Re: asking your advice about bug
On 02/17/2014 06:50 PM, Roman Gareev wrote: Hi Tobias, thanks for the answer! Hi Roman, sorry for missing this mail. I think that the segfault is being caused by NULL arguments being passed to compute_deps by loop_level_carries_dependences. This causes an assignment of NULL values to the following parameters of compute_deps: must_raw_no_source, may_raw_no_source, must_war_no_source, may_war_no_source, must_waw_no_source, may_waw_no_source. They are then passed to subtract_commutative_associative_deps and dereferenced in the following statements: *must_raw_no_source = isl_union_map_subtract (*must_raw_no_source, x_must_raw_no_source); *may_raw_no_source = isl_union_map_subtract (*may_raw_no_source, x_may_raw_no_source); *must_war_no_source = isl_union_map_subtract (*must_war_no_source, x_must_war_no_source); *may_war_no_source = isl_union_map_subtract (*may_war_no_source, x_may_war_no_source); *must_waw_no_source = isl_union_map_subtract (*must_waw_no_source, x_must_waw_no_source); *may_waw_no_source = isl_union_map_subtract (*may_waw_no_source, x_may_waw_no_source); This is the reason for the segfault. (All functions mentioned above are located in gcc/graphite-dependences.c) Interesting analysis. I think that this can be solved by adding NULL checks in subtract_commutative_associative_deps for the following variables: must_raw_no_source, may_raw_no_source, must_war_no_source, may_war_no_source, must_waw_no_source, may_waw_no_source. I've implemented this in the patch, which can be found below. Yes, this would be a 'solution'. However, I am in fact surprised that those variables are NULL at all. Do you have an idea why this is the case? Understanding this would help to determine whether the patch you propose is actually the right solution or whether it is just hiding an earlier bug. Cheers, Tobias
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi, On Fri, 21 Feb 2014, Paul E. McKenney wrote: And with "conservative" I mean everything is a source of a dependency, and hence can't be removed, reordered or otherwise fiddled with, and that includes code sequences where no atomic objects are anywhere in sight [1]. In the light of that, the only realistic way (meaning not having to disable optimization everywhere) to implement consume as currently specified is to map it to acquire. At which point it becomes pointless. No, only memory_order_consume loads and [[carries_dependency]] function arguments are sources of dependency chains. I don't see [[carries_dependency]] in the C11 final draft (yeah, I should get a real copy, I know, but let's assume it's the same language as the standard). Therefore, yes, only consume loads are sources of dependencies. The problem with the definition of the carries-a-dependency relation is not the sources, but rather where it stops. It's transitively closed over "the value of evaluation A is used as an operand of evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations can contain function calls, so if there's _any_ chance that an operand of an evaluation might even indirectly use something resulting from a consume load, then that evaluation must be compiled in a way that does not break dependency chains. I don't see a way to generally assume that e.g. the value of a function argument cannot possibly result from a consume load, therefore the compiler must assume that all function arguments _can_ result from such loads, and so must disable all depchain-breaking optimizations (of which there are many). [1] Simple example of the type of transformation that would be disallowed: int getzero (int i) { return i - i; } This needs to be as follows: [[carries_dependency]] int getzero (int i [[carries_dependency]]) { return i - i; } Otherwise dependencies won't get carried through it. 
So, with the above, do you agree that in the absence of any other magic (see below) the compiler is not allowed to transform my initial getzero() (without the carries_dependency markers) implementation into return 0; because of the C11 rules for carries-a-dependency? If so, do you then also agree that the specification of carries-a-dependency is somewhat, err, shall we say, overbroad? depchains don't matter, could _then_ optimize it to zero. But that's insane, especially considering that it's hard to detect whether a given context doesn't care for depchains; after all, the depchain relation is constructed exactly so that it bleeds into nearly everything. So we would most of the time have to assume that the ultimate context will be depchain-aware and therefore disable many transformations. Any function that does not contain a memory_order_consume load and that doesn't have any arguments marked [[carries_dependency]] can be optimized just as before. And as such a marker doesn't exist, we must conservatively assume that it's on _all_ parameters, so I'll stand by my claim. Then inlining getzero would merely add another # j.dep = i.dep relation, so depchains are still there, but the value optimization can happen before inlining. Having to do something like that I'd find disgusting, and would rather rewrite consume into acquire :) Or make the depchain relation somehow realistically implementable. I was actually OK with arithmetic cancellation breaking the dependency chains. Others on the committee felt otherwise, and I figured that (1) I wouldn't be writing that kind of function anyway and (2) they knew more about writing compilers than I. I would still be OK saying that things like i-i, i*0, i%1, i&0, i|~0 and so on just break the dependency chain. Exactly. I can see the problem that people had with that, though. There are very many ways to write concealed zeros (or, more generally, neutral elements of the function in question). My getzero() function is one (it could e.g. 
be an assembler implementation). The allowance to break dependency chains would have to apply to such cancellations as well, so one can't simply itemize all cases in which cancellation is allowed. Rather, it would have had to argue about something like value dependency, a la: evaluation B depends on A if there exist at least two different values A1 and A2 (results from A) for which evaluation B (with otherwise identical operands) yields different values B1 and B2. Alas, it doesn't, except if you want to understand the term "the value of A is used as an operand of B" in that way. Even then you'd still have the second case of the depchain definition, via intermediate (not even atomic) memory stores and loads, making two evaluations ordered per carries-a-dependency. And even that understanding of "is used" wouldn't be enough, because there are cases where the cancellation happens in steps, and where it interacts with the third clause (transitivity): Assume this: a = something() // evaluation
Non-temporal move
I can see a storent pattern in the x86 machine descriptions (in sse.md), but the internals documentation doesn't mention it. Should we add a description of it to the internals documentation? Regards Ganesh
RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: All, Imagination Technologies would like to introduce the design of an O32 ABI extension for MIPS to allow it to be used in conjunction with MIPS FPUs having 64-bit floating-point registers. This is a wide-reaching design that involves changes to all components of the MIPS toolchain; it is being posted to GCC first and will progress on to other tools. This ABI extension is compatible with the existing O32 ABI definition and will not require the introduction of new build variants (multilibs). The design document is relatively large and has been placed on the MIPS Compiler Team wiki to facilitate review: http://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinking Looks good to me. It'll be interesting to see whether making the odd-numbered call-saved-in-fr0 registers available for frx pays off or whether it ends up being better to avoid them. Indeed, I suspect they should be avoided except for leaf functions. You would have to be pretty desperate for a register to use the caller-and-callee save registers! I understand the need to deprecate the current -mgp32 -mfp64 behaviour. I don't think we should deprecate -mfp64 itself though. Instead, why not keep -mfp32 as meaning FR0, -mfp64 as meaning FR1, and add -mfpxx for modeless? So rather than deprecating the -mgp32 -mfp64 combination and adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form, in which the odd-numbered registers are call-clobbered, rather than the old form, in which they were call-saved. Extreme caution is the only reason why the design avoided changing fp64 behaviour (basically in case anyone had serious objections). If you would be happy with a change of behaviour for -mgp32 -mfp64 then that is a great start. AIUI the old form never really worked reliably due to things like newlib's setjmp not preserving the odd-numbered registers, so it doesn't seem worth keeping around. 
Also, the old form is identified by the GNU attribute (4, 4), so it'd be easy for the linker to reject links between the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing fp64 as currently defined, but having found this new solution, such fixes are no longer important. The lack of support for gp32 fp64 in Linux is a further reason to permit redefining it. Would you be happy to retain the same builtin defines for FP64 if changing its behaviour (i.e. __mips_fpr=64)? The corresponding asm would then be .set fp=xx. Either way, a new .set option would be better than a specific .fr directive because it gives you access to the option stack (.set push/.set pop). I'm not sure about: "If an assembly directive is seen prior to the start of the text section then this modifies the default mode for the module." This isn't how any of the existing options work and I think the inconsistency would be confusing. It also means that if the first function in a file happens to force a local mode (e.g. because it's an ifunc implementation) then you'd have to remember to write: .fr x .fr 1 so that the first sets the mode for the module and the second sets it for the first function. The different treatment of the two lines wouldn't be obvious at first glance. How about instead having a separate directive that explicitly sets the global value of an option? I.e. something like .module, taking the same options as .set. Better names welcome. :-) Use of a different directive to actually affect the overall mode of a module sounds like a good plan, and it avoids the weird behaviour. The only thing specifically needed is that the assembly file records the mode it was written for. Getting the wrong command-line option would otherwise lead to unusual runtime failures. We have been/are still discussing this point, so it's no surprise you have commented on it too. I'll wait for any further comments on this area and update accordingly. 
The scheme allows an ifunc to request a mode and effectively gives the choice to the firstcomer. Every other ifunc then has to live with the choice. I don't think that's a good idea, since the order that ifuncs are called isn't well-defined or under direct user control. Since ifuncs would have to live with a choice made by other ifuncs, in practice they must all be prepared to deal with FR=1 if linked into a fully-modeless or FR1 program, and to deal with FR=0 if linked into a fully-modeless or FR0 program. So IMO the dynamic linker should simply set FR=1 for modeless programs if the hardware supports it and set it to 0 for modeless programs otherwise, like you say in the first paragraph of 9.4. The ifunc interaction should possibly be moved to a different proposal. We could reduce this down to a simple statement that interaction with ifunc needs to be
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds torva...@linux-foundation.org wrote: Let me think about it some more, but my gut feel is that just tweaking the definition of what "ordered" means is sufficient. So to go back to the suggested ordering rules (ignoring the "restrict" part, which is just to clarify that ordering through other means to get to the object doesn't matter), I suggested: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to" and I think the solution is to just say that this ordering acts as a fence. It doesn't say exactly *where* the fence is, but it says that there is *some* fence between the load of the pointer and any/all accesses to the object through that pointer. I'm wrong. That doesn't work. At all. There is no ordering except through the pointer chain. So I think saying just that, and nothing else (no magic fences, no nothing) is the right thing: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to, directly or indirectly through a chain of pointers" The thing is, anything but a chain of pointers (and maybe relaxing it to indexes in tables in addition to pointers) doesn't really work. The current standard tries to break it at obvious points that can lose the data dependency (either by turning it into a control dependency, or by just dropping the value, like the left-hand side of a comma expression), but the fact is, it's broken. It's broken not just because the value can be lost in other ways (i.e. the "p-p" example), it's broken because the value can be turned into a control dependency in so many other ways too. Compilers regularly turn arithmetic ops with logical comparisons into branches. 
So an expression like a = !!ptr carries a dependency in the current C standard, but it's entirely possible that a compiler ends up turning it into a compare-and-branch rather than a compare-and-set-conditional, depending on just exactly how a ends up being used. That's true even on an architecture like ARM that has a lot of conditional instructions (there are way less if you compile for Thumb, for example, but compilers also do things like if there are more than N predicated instructions I'll just turn it into a branch-over instead). So I think the C standard needs to just explicitly say that you can walk a chain of pointers (with that possible indexes in arrays extension), and nothing more. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 4:57 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Sun, Feb 23, 2014 at 11:31 AM, Linus Torvalds torva...@linux-foundation.org wrote: Let me think about it some more, but my gut feel is that just tweaking the definition of what "ordered" means is sufficient. So to go back to the suggested ordering rules (ignoring the "restrict" part, which is just to clarify that ordering through other means to get to the object doesn't matter), I suggested: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to" and I think the solution is to just say that this ordering acts as a fence. It doesn't say exactly *where* the fence is, but it says that there is *some* fence between the load of the pointer and any/all accesses to the object through that pointer. I'm wrong. That doesn't work. At all. There is no ordering except through the pointer chain. So I think saying just that, and nothing else (no magic fences, no nothing) is the right thing: "the consume ordering guarantees the ordering between that atomic read and the accesses to the object that the pointer points to, directly or indirectly through a chain of pointers" To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? Eh ... Just jumping in to throw in my weird 2 cents. Richard.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:27 AM, Richard Biener richard.guent...@gmail.com wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So in your example, ptr = atomic_read(&p, CONSUME); would indeed order against the subsequent access of the chain through *that* pointer (the whole "restrict" thing that I left out as a separate thing, which was probably a mistake), but certainly not against any integer pointer, and certainly not against any aliasing pointer chains. So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q') _and_ '**ptr' (getting 'i'), but nothing else - not even the aliasing access of dereferencing 'i' directly. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:37 AM, Linus Torvalds torva...@linux-foundation.org wrote: So yes, the atomic_read() would be ordered wrt '*ptr' (getting 'q') _and_ '**ptr' (getting 'i'), but nothing else - not even the aliasing access of dereferencing 'i' directly. Btw, what CPU architects and memory-ordering guys tend to do in documentation is give a number of litmus-test pseudo-code sequences to show the effects and intent of the language. I think giving those kinds of litmus tests for both the "this is ordered" and the "this is not ordered" cases like the above would be a great clarification. Partly because the language is going to be somewhat legalistic and thus hard to wrap your mind around, and partly to really hit home the *intent* of the language, which I think is actually fairly clear to both compiler writers and to programmers. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return ptr->here->there->next; } B *p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptr-deref chain business would only be an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case; assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Ciao, Michael.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote: Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return ptr->here->there->next; } B *p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptr-deref chain business would only be an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case; assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Good point, I left that out of my list. Adding it: 13. By default, pointer chains do not propagate into or out of functions. 
In implementations having attributes, a [[carries_dependency]] may be used to mark a function argument or return as passing a pointer chain into or out of that function. If a function does not contain memory_order_consume loads and also does not contain [[carries_dependency]] attributes, then that function may be compiled using any desired dependency-breaking optimizations. The ordering effects are implementation defined when a given pointer chain passes into or out of a function through a parameter or return not marked with a [[carries_dependency]] attribute. Note that this last paragraph differs from the current standard, which would require ordering regardless. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz m...@suse.de wrote: So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). It's supposed to be ordered wrt the first load (the consuming one), yes. So, what happens if the pointer deref chain is partly hidden in some functions: No problem. The thing is, the ordering is actually handled by the CPU in all relevant cases. So the compiler doesn't actually need to *do* anything. All this legalistic stuff is just to describe the semantics and the guarantees. The problem is two cases: (a) alpha (which doesn't really order any accesses at all, not even dependent loads), but for a compiler alpha is actually trivial: just add an rmb instruction after the load, and you can't really do anything else (there are a few optimizations you can do wrt the rmb, but they are very specific and simple). So (a) is a problem, but the solution is actually really simple, and gives very *strong* guarantees: on alpha, a consume ends up being basically the same as a read barrier after the load, with only very minimal room for optimization. (b) ARM and powerpc and similar architectures, which guarantee the data dependency as long as it is an *actual* data dependency and never becomes a control dependency. On ARM and powerpc, control dependencies do *not* order accesses (the reasons boil down to essentially: branch prediction breaks the dependency, and instructions that come after the branch can be happily executed before the branch). But it's almost impossible to describe that in the standard, since compilers can (and very much do) turn a control dependency into a data dependency and vice versa. 
So the current standard tries to describe that control vs data dependency, and tries to limit it to a data dependency. It fails. It fails for multiple reasons - it doesn't allow for trivial optimizations that just remove the data dependency, and it also doesn't allow for various trivial cases where the compiler *does* turn the data dependency into a control dependency. So I really really think that the current C standard language is broken. Unfixably broken. I'm trying to change the syntactic data dependency that the current standard uses into something that is clearer and correct. The chain of pointers thing is still obviously a data dependency, but by limiting it to pointers, it simplifies the language, clarifies the meaning, avoids all syntactic tricks (ie p-p is clearly a syntactic dependency on p, but does *not* involve in any way following the pointer) and makes it basically impossible for the compiler to break the dependency without doing value prediction, and since value prediction has to be disallowed anyway, that's a feature, not a bug. Linus
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 02:55:07PM +0100, Michael Matz wrote: Hi, On Fri, 21 Feb 2014, Paul E. McKenney wrote: And with "conservative" I mean everything is a source of a dependency, and hence can't be removed, reordered or otherwise fiddled with, and that includes code sequences where no atomic objects are anywhere in sight [1]. In the light of that, the only realistic way (meaning not having to disable optimization everywhere) to implement consume as currently specified is to map it to acquire. At which point it becomes pointless. No, only memory_order_consume loads and [[carries_dependency]] function arguments are sources of dependency chains. I don't see [[carries_dependency]] in the C11 final draft (yeah, I should get a real copy, I know, but let's assume it's the same language as the standard). Therefore, yes, only consume loads are sources of dependencies. The problem with the definition of the carries-a-dependency relation is not the sources, but rather where it stops. It's transitively closed over "the value of evaluation A is used as an operand of evaluation B", with very few exceptions as per 5.1.2.4#14. Evaluations can contain function calls, so if there's _any_ chance that an operand of an evaluation might even indirectly use something resulting from a consume load, then that evaluation must be compiled in a way that does not break dependency chains. I don't see a way to generally assume that e.g. the value of a function argument cannot possibly result from a consume load, therefore the compiler must assume that all function arguments _can_ result from such loads, and so must disable all depchain-breaking optimizations (of which there are many). [1] Simple example of the type of transformation that would be disallowed: int getzero (int i) { return i - i; } This needs to be as follows: [[carries_dependency]] int getzero (int i [[carries_dependency]]) { return i - i; } Otherwise dependencies won't get carried through it. 
So, with the above, do you agree that in the absence of any other magic (see below) the compiler is not allowed to transform my initial getzero() (without the carries_dependency markers) implementation into return 0; because of the C11 rules for carries-a-dependency? If so, do you then also agree that the specification of carries-a-dependency is somewhat, err, shall we say, overbroad? From what I can see, overbroad. The problem is that the C++11 standard defines how carries-dependency interacts with function calls and returns in 7.6.4, which describes the [[carries_dependency]] attribute. For example, 7.6.4p6 says: "Function g's second parameter has a carries_dependency attribute, but its first parameter does not. Therefore, function h's first call to g carries a dependency into g, but its second call does not. The implementation might need to insert a fence prior to the second call to g." When C11 declined to take attributes, they also left out the part saying how carries-dependency interacts with functions. :-/ Might be fixed by now; checking up on it. One could argue that the bit about emitting fence instructions at function calls and returns is implied by the as-if rule even without this wording, but... depchains don't matter, could _then_ optimize it to zero. But that's insane, especially considering that it's hard to detect whether a given context doesn't care for depchains; after all, the depchain relation is constructed exactly so that it bleeds into nearly everything. So we would most of the time have to assume that the ultimate context will be depchain-aware and therefore disable many transformations. Any function that does not contain a memory_order_consume load and that doesn't have any arguments marked [[carries_dependency]] can be optimized just as before. And as such a marker doesn't exist, we must conservatively assume that it's on _all_ parameters, so I'll stand by my claim. 
Or that you have to emit a fence instruction when a dependency chain enters or leaves a function in cases where not all callers/callees are visible to the compiler. My preference is that the ordering properties of a carries-dependency chain are implementation defined at the point where it enters or leaves a function without the marker, but others strongly disagreed. ;-) Then inlining getzero would merely add another # j.dep = i.dep relation, so depchains are still there, but the value optimization can happen before inlining. Having to do something like that I'd find disgusting, and would rather rewrite consume into acquire :) Or make the depchain relation somehow realistically implementable. I was actually OK with arithmetic cancellation breaking the dependency chains. Others on the committee felt otherwise, and I figured that (1) I wouldn't be writing that kind of function anyway and (2) they knew
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 09:28:56AM -0800, Paul E. McKenney wrote: On Mon, Feb 24, 2014 at 05:55:50PM +0100, Michael Matz wrote: Hi, On Mon, 24 Feb 2014, Linus Torvalds wrote: To me that reads like int i; int *q = &i; int **p = &q; atomic_XXX (&p, CONSUME); orders against accesses '*p', '**p', '*q' and 'i'. Thus it seems they want to say that it orders against aliased storage - but then go further and include "indirectly through a chain of pointers"?! Thus an atomic read of an int * orders against any 'int' memory operation but not against 'float' memory operations? No, it's not about type at all, and the "chain of pointers" can be much more complex than that, since the "int *" can point to within an object that contains other things than just that int (the int can be part of a structure that then has pointers to other structures etc). So, let me try to poke holes into your definition or increase my understanding :) . You said "chain of pointers" (dereferences, I assume), e.g. if p is the result of a consume load, then access to p->here->there->next->prev->stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). So, what happens if the pointer deref chain is partly hidden in some functions: A * adjustptr (B *ptr) { return ptr->here->there->next; } B *p = atomic_XXX (&somewhere, consume); adjustptr(p)->prev->stuff = bla; As far as I understood you, this whole ptr-deref chain business would only be an optimization opportunity, right? So if the compiler can't be sure how p is actually used (as in my function-using case; assume adjustptr is defined in another unit), then the consume load would simply be transformed into an acquire (or whatever, with some barrier I mean)? Only _if_ the compiler sees all obvious uses of p (indirectly through pointer derefs) can it, yeah, do what with the consume load? Good point, I left that out of my list. Adding it: 13. By default, pointer chains do not propagate into or out of functions. 
In implementations having attributes, a [[carries_dependency]] attribute may be used to mark a function argument or return value as passing a pointer chain into or out of that function. If a function does not contain memory_order_consume loads and also does not contain [[carries_dependency]] attributes, then that function may be compiled using any desired dependency-breaking optimizations. The ordering effects are implementation defined when a given pointer chain passes into or out of a function through a parameter or return value not marked with a [[carries_dependency]] attribute. Note that this last paragraph differs from the current standard, which would require ordering regardless. And there is also kill_dependency(), which needs to be added to the list in #8 of operators that take a chained pointer and return something that is not a chained pointer. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 09:38:46AM -0800, Linus Torvalds wrote: On Mon, Feb 24, 2014 at 8:55 AM, Michael Matz m...@suse.de wrote: So, let me try to poke holes into your definition or increase my understanding :) . You said chain of pointers(dereferences I assume), e.g. if p is result of consume load, then access to p-here-there-next-prev-stuff is supposed to be ordered with that load (or only when that last load/store itself is also an atomic load or store?). It's supposed to be ordered wrt the first load (the consuming one), yes. So, what happens if the pointer deref chain is partly hidden in some functions: No problem. The thing is, the ordering is actually handled by the CPU in all relevant cases. So the compiler doesn't actually need to *do* anything. All this legalistic stuff is just to describe the semantics and the guarantees. The problem is two cases: (a) alpha (which doesn't really order any accesses at all, not even dependent loads), but for a compiler alpha is actually trivial: just add a rmb instruction after the load, and you can't really do anything else (there's a few optimizations you can do wrt the rmb, but they are very specific and simple). So (a) is a problem, but the solution is actually really simple, and gives very *strong* guarantees: on alpha, a consume ends up being basically the same as a read barrier after the load, with only very minimal room for optimization. (b) ARM and powerpc and similar architectures, that guarantee the data dependency as long as it is an *actual* data dependency, and never becomes a control dependency. On ARM and powerpc, control dependencies do *not* order accesses (the reasons boil down to essentially: branch prediction breaks the dependency, and instructions that come after the branch can be happily executed before the branch). But it's almost impossible to describe that in the standard, since compilers can (and very much do) turn a control dependency into a data dependency and vice versa. 
So the current standard tries to describe that control vs data dependency, and tries to limit it to a data dependency. It fails. It fails for multiple reasons - it doesn't allow for trivial optimizations that just remove the data dependency, and it also doesn't allow for various trivial cases where the compiler *does* turn the data dependency into a control dependency. So I really really think that the current C standard language is broken. Unfixably broken. I'm trying to change the syntactic data dependency that the current standard uses into something that is clearer and correct. The chain of pointers thing is still obviously a data dependency, but by limiting it to pointers, it simplifies the language, clarifies the meaning, avoids all syntactic tricks (ie p-p is clearly a syntactic dependency on p, but does *not* involve in any way following the pointer) and makes it basically impossible for the compiler to break the dependency without doing value prediction, and since value prediction has to be disallowed anyway, that's a feature, not a bug. OK, good point, please ignore my added thirteenth item in the list. Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer results in another chained pointer in that same pointer chain. No. You cannot define it this way. Taking the value of a pointer and doing a bitwise operation that throws away all the bits (or even *most* of the bits) results in the compiler easily being able to turn the chain into a non-chain. The obvious example being "val & 0", but things like "val & 1" are in practice also something that compilers easily turn into control dependencies instead of data dependencies. So you can talk about things like "aligning the pointer value to object boundaries" etc, but it really cannot and *must* not be about the syntactic operations. The same goes for adding and subtracting an integer. The *syntax* doesn't matter. It's about remaining information. Doing "p-(int)p" or "p+(-(int)p)" doesn't leave any information despite being "subtracting and adding an integer" at a syntactic level. Syntax is meaningless. Really. 8. Applying any of the following operators to a chained pointer results in something that is not a chained pointer: (), sizeof, !, *, /, %, <<, >>, <, >, <=, >=, ==, !=, &&, and ||. Parenthesis? I'm assuming that you mean calling through the chained pointer. Also, I think all of /, * and % are perfectly fine, and might be used for that "aligning the pointer" operation that is fine. Linus
Re: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Matthew Fortune matthew.fort...@imgtec.com writes: Richard Sandiford rdsandif...@googlemail.com writes: I understand the need to deprecate the current -mgp32 -mfp64 behaviour. I don't think we should deprecate -mfp64 itself though. Instead, why not keep -mfp32 as meaning FR0, -mfp64 meaning FR1 and add -mfpxx for modeless? So rather than deprecating the -mgp32 -mfp64 combination and adding -mfr, we'd just make -mgp32 -mfp64 generate the new FR1 form in which the odd-numbered registers are call-clobbered rather than the old form in which they were call-saved. Extreme caution is the only reason why the design avoided changing -mfp64 behaviour (basically in case anyone had a serious objection). If you would be happy with a change of behaviour for -mgp32 -mfp64 then that is a great start. Yeah, my first impression is that keeping the current interface would be much better than adding a new set of options. AIUI the old form never really worked reliably due to things like newlib's setjmp not preserving the odd-numbered registers, so it doesn't seem worth keeping around. Also, the old form is identified by the GNU attribute (4, 4), so it'd be easy for the linker to reject links between the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing -mfp64 as currently defined, but having found this new solution such fixes are no longer important. The lack of support for -mgp32 -mfp64 in Linux is a further reason to permit redefining it. Would you be happy to retain the same built-in defines for -mfp64 if changing its behaviour (i.e. __mips_fpr=64)? I think that should be OK. I suppose a natural follow-on question is what __mips_fpr should be for -mfpxx. Maybe just 0? If we want to be extra cautious we could define a second set of macros alongside the old ones. The scheme allows an ifunc to request a mode and effectively gives the choice to the first-comer. Every other ifunc then has to live with the choice.
I don't think that's a good idea, since the order that ifuncs are called isn't well-defined or under direct user control. Since ifuncs would have to live with a choice made by other ifuncs, in practice they must all be prepared to deal with FR=1 if linked into a fully-modeless or FR1 program, and to deal with FR=0 if linked into a fully-modeless or FR0 program. So IMO the dynamic linker should simply set FR=1 for modeless programs if the hardware supports it and set it to 0 for modeless programs otherwise, like you say in the first paragraph of 9.4. The ifunc interaction should possibly be moved to a different proposal. We could reduce this down to a simple statement that interaction with ifunc needs to be considered when finalising MIPS ifunc support in general. Sounds good. You allow the mode to be changed mid-execution if a new FR0 or FR1 object is loaded. Is it really worth supporting that though? It has the same problem as the ifuncs: once you've dlopen()ed an object, you fix the mode for the whole program, even after the dlclose(). Unless we know of specific cases where this is needed, maybe it would be safer to fix the mode before execution based on DT_NEEDED libraries and allow the mode of modeless programs to be overridden by an environment variable. Scanning the entire set of DT_NEEDED libraries would achieve most of what full dynamic mode switching gives us; it is essentially the first stage of the dynamic mode switching described in the proposal anyway. However, I am concerned about excluding dlopen()ed objects from mode selection (not so worried about excluding ifunc, which could just fix the mode before resolving the first one). One specific concern is for Android, where I believe we have the situation that native applications are loaded as (a form of) shared library. This means a mode requirement can be introduced late on.
In an Android environment it is unlikely to be acceptable to have to do something special to load an application that happens to have a specific mode requirement so dynamic selection is useful. This is more of a transitional problem than anything but making it a smooth process is quite important. I'm also not sure that there is much more effort required for a dynamic linker to take account of dlopen()ed objects in addition to DT_NEEDED, changes are needed in this code regardless. As far as GNU/Linux goes, if we do end up with a function in something like a modeless libm that is implemented as an FR-aware ifunc, that would force the choice to be made early anyway. So we have this very specific case where everything in the initial process is modeless, no ifuncs take advantage of the FR setting, and a dlopen()ed object was compiled as fr0 rather than modeless. I agree it's possible but it seems unlikely. I know nothing about the way Android loading works though. :-) Could you describe it in more detail? Is it significantly different from glibc's
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 10:14:01AM -0800, Linus Torvalds wrote: On Mon, Feb 24, 2014 at 9:21 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer results in another chained pointer in that same pointer chain. No. You cannot define it this way. Taking the value of a pointer and doing a bitwise operation that throws away all the bits (or even *most* of the bits) results in the compiler easily being able to turn the chain into a non-chain. The obvious example being "val & 0", but things like "val & 1" are in practice also something that compilers easily turn into control dependencies instead of data dependencies. Indeed, most of the bits need to remain for this to work. So you can talk about things like "aligning the pointer value to object boundaries" etc, but it really cannot and *must* not be about the syntactic operations. The same goes for adding and subtracting an integer. The *syntax* doesn't matter. It's about remaining information. Doing "p-(int)p" or "p+(-(int)p)" doesn't leave any information despite being "subtracting and adding an integer" at a syntactic level. Syntax is meaningless. Really. Good points. How about the following replacements? 3. Adding or subtracting an integer to/from a chained pointer results in another chained pointer in that same pointer chain. The results of addition and subtraction operations that cancel the chained pointer's value (for example, "p-(long)p" where p is a pointer to char) are implementation defined. 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer for the purposes of alignment and pointer translation results in another chained pointer in that same pointer chain. Other uses of bitwise operators on chained pointers (for example, "p|~0") are implementation defined. 8.
Applying any of the following operators to a chained pointer results in something that is not a chained pointer: (), sizeof, !, *, /, %, <<, >>, <, >, <=, >=, ==, !=, &&, and ||. Parenthesis? I'm assuming that you mean calling through the chained pointer. Yes, good point. Of course, parentheses for grouping just pass the value through without affecting the chained-ness. Also, I think all of /, * and % are perfectly fine, and might be used for that "aligning the pointer" operation that is fine. Something like this? char *p; p = p - (unsigned long)p % 8; I was thinking of this as subtraction -- the p gets moduloed by 8, which loses the chained-pointer designation. But that is OK because that designation gets folded back in by the subtraction. Am I missing a use case? That leaves things like this one: p = (p / 8) * 8; I cannot think of any other legitimate use for / and *. Here is an updated #8 and a new 8a: 8. Applying any of the following operators to a chained pointer results in something that is not a chained pointer: function call (), sizeof, !, %, <<, >>, <, >, <=, >=, ==, !=, &&, ||, and kill_dependency(). 8a. Dividing a chained pointer by an integer and multiplying it by that same integer (for example, to align that pointer) results in a chained pointer in that same pointer chain. The ordering effects of other uses of infix * and / on chained pointers are implementation defined. Does that capture it? Thanx, Paul
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 10:53 AM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: Good points. How about the following replacements? 3. Adding or subtracting an integer to/from a chained pointer results in another chained pointer in that same pointer chain. The results of addition and subtraction operations that cancel the chained pointer's value (for example, "p-(long)p" where p is a pointer to char) are implementation defined. 4. Bitwise operators (&, |, ^, and I suppose also ~) applied to a chained pointer and an integer for the purposes of alignment and pointer translation results in another chained pointer in that same pointer chain. Other uses of bitwise operators on chained pointers (for example, "p|~0") are implementation defined. Quite frankly, I think all of this language that is about the actual operations is irrelevant and wrong. It's not going to help compiler writers, and it sure isn't going to help users that read this. Why not just talk about value chains, and say that any operations that restrict the value range severely end up breaking the chain. There is no point in listing the operations individually, because every single operation *can* restrict things. Listing individual operations and dependencies is just fundamentally wrong. For example, let's look at this obvious case: int q, *p = atomic_read(pp, consume); .. nothing modifies 'p' .. q = *p; and there are literally *zero* operations that modify the value chain, so obviously the two operations are ordered, right? Wrong. What if the "nothing modifies 'p'" part looks like this: if (p != myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever So my earlier suggestion tried to avoid this by having the "restrict" thing, so the above wouldn't work.
But your (and the current C standard's) attempt to define this with some kind of syntactic dependency-carrying chain will _inevitably_ get this wrong, and/or be too horribly complex to actually be useful. Seriously, don't do it. I claim that all your attempts to do this crazy syntactic "these operations maintain the chained pointers" thing are broken. The fact that you have changed "carries a dependency" to "chained pointer" changes NOTHING. So just give it up. It's a fundamentally broken model. It's *wrong*, but even more importantly, it's not even *useful*, since it ends up being too complicated for a compiler writer or a programmer to understand. I really really really think you need to do this at a higher conceptual level, and get away from all this idiotic "these operations maintain the chain" crap. Because there *is* no such list. Quite frankly, any standards text that has that [[carries_dependency]] or [[kill_dependency]] or whatever attribute is broken. It's broken because the whole concept is TOTALLY ALIEN to the compiler writer or the programmer. It makes no sense. It's purely legalistic language that has zero reason to exist. It's non-intuitive for everybody. And *any* language that talks about the individual operations only encourages people to play legalistic games that actually defeat the whole purpose (namely that all relevant CPUs are going to implement that consume ordering guarantee natively, with no extra code generation rules AT ALL). So any time you talk about some random detail of some operation, somebody is going to come up with a trick that defeats things. So don't do it. There is absolutely ZERO difference between any of the arithmetic operations, be they bitwise, additive, multiplicative, shifts, whatever. The *only* thing that matters for all of them is whether they are value-preserving, or whether they drop so much information that the compiler might decide to use a control dependency instead. That's true for every single one of them.
Similarly, actual true control dependencies that limit the problem space sufficiently that the actual pointer value no longer has significant information in it (see the above example) are also things that remove information to the point that only a control dependency remains. Even when the value itself is not modified in any way at all. Linus
[GSoC 2014] OpenMP runtime improvements
Hi, I am interested in contributing to the OpenMP project as part of GSoC 2014. I am mailing to discuss ideas, to see if someone is willing to mentor me, and, if possible, to get a test patch in before the end of the application process so as to verify my qualification! Looking forward to your feedback! Thanks, Pranith
build breakage in libsanitizer
Hey all, I've just tried building Trunk from branches/google/main/ and got the following failure in libsanitizer: In file included from /mnt/test/rvbd-root-gcc49/usr/include/features.h:356:0, from /mnt/test/rvbd-root-gcc49/usr/include/arpa/inet.h:22, from ../../../../gcc49-google-main/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc:20: /mnt/test/rvbd-root-gcc49/usr/include/sys/timex.h:145:31: error: expected initializer before 'throw' __asm__ ("ntp_gettimex") __THROW; ^ This is an intel-to-intel cross-compiler that is being built against linux-2.6.32.27 headers and glibc-2.12 (with a few redhat patches). The system header in question contains the following: #if defined __GNUC__ && __GNUC__ >= 2 extern int ntp_gettime (struct ntptimeval *__ntv) __asm__ ("ntp_gettimex") __THROW; #else extern int ntp_gettimex (struct ntptimeval *__ntv) __THROW; # define ntp_gettime ntp_gettimex #endif This same header works against gcc 4.8 (or is not used). Could someone clarify what libsanitizer needs here and what gcc dislikes, please? Which should I patch? Thanks in advance! Oleg.
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: What if the "nothing modifies 'p'" part looks like this: if (p != myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever Yes, the compiler could do that. But it would still be required to carry a dependency from the memory_order_consume read to the *p, But that's *BS*. You didn't actually listen to the main issue. Paul, why do you insist on this carries-a-dependency crap? It's broken. If you don't believe me, then believe the compiler person who already piped up and told you so. The "carries a dependency" model is broken. Get over it. No sane compiler will ever distinguish two different registers that have the same value from each other. No sane compiler will ever say "ok, register r1 has the exact same value as register r2, but r2 carries the dependency, so I need to make sure to pass r2 to that function or use it as a base pointer". And nobody sane should *expect* a compiler to distinguish two registers with the same value that way. So the whole model is broken. I gave an alternate model (the "restrict"), and you didn't seem to understand the really fundamental difference. It's not a language difference, it's a conceptual difference. In the broken "carries a dependency" model, you have to fight all those aliases that can have the same value, and it is not a fight you can win. We've had the "p-p" examples, we've had the "p&0" examples, but the fact is, that "p==myvariable" example IS EXACTLY THE SAME THING. All three of those things: p-p, p&0, and p==myvariable mean that any compiler worth its salt now knows that p carries no information, and will optimize it away. So please stop arguing against that. Whenever you argue against that simple fact, you are arguing against sane compilers.
So *accept* the fact that some operations (and I guarantee that there are more of those than you can think of, and you can create them with various tricks using pretty much *any* feature in the C language) essentially take the data information away. And just accept the fact that then the ordering goes away too. So give up on "carries a dependency". Because there will be cases where that dependency *isn't* carried. The language of the standard needs to get *away* from the broken model, because otherwise the standard is broken. I suggest we instead talk about litmus tests and why certain code sequences are ordered, and others are not. So the code sequence I already mentioned is *not* ordered: Litmus test 1: p = atomic_read(pp, consume); if (p == variable) return p->val; is *NOT* ordered, because the compiler can trivially turn this into "return variable.val", and break the data dependency. This is true *regardless* of any "carries a dependency" language, because that language is insane, and doesn't work when the different pieces here may be in different compilation units. BUT: Litmus test 2: p = atomic_read(pp, consume); if (p != variable) return p->val; *IS* ordered, because while it looks syntactically a lot like Litmus test 1, there is no sane way a compiler can use the knowledge that p is not a pointer to a particular location to break the data dependency. There is no way in hell that any sane "carries a dependency" model can get the simple tests above right. So give up on it already. "Carries a dependency" cannot work. It's a bad model. You're trying to describe the problem using the wrong tools. Note that my "restrict+pointer to object" language actually got this *right*. The "restrict" part made Litmus test 1 not ordered, because the "p == variable" success case means that the pointer wasn't restricted, so the pre-requisite for ordering didn't exist. See? The "carries a dependency" is a broken model for this, but there are _other_ models that can work.
You tried to rewrite my model into carries a dependency. That *CANNOT* work. It's like trying to rewrite quantum physics into the Greek model of the four elements. They are not compatible models, and one of them can be shown to not work. Linus
RE: [RFC] Introducing MIPS O32 ABI Extension for FR0 and FR1 Interlinking
Richard Sandiford rdsandif...@googlemail.com writes: AIUI the old form never really worked reliably due to things like newlib's setjmp not preserving the odd-numbered registers, so it doesn't seem worth keeping around. Also, the old form is identified by the GNU attribute (4, 4), so it'd be easy for the linker to reject links between the old and the new form. That is true. You will have noticed a number of changes over recent months to start fixing -mfp64 as currently defined, but having found this new solution such fixes are no longer important. The lack of support for -mgp32 -mfp64 in Linux is a further reason to permit redefining it. Would you be happy to retain the same built-in defines for -mfp64 if changing its behaviour (i.e. __mips_fpr=64)? I think that should be OK. I suppose a natural follow-on question is what __mips_fpr should be for -mfpxx. Maybe just 0? I'm doing just that in my experimental implementation of all this. If we want to be extra cautious we could define a second set of macros alongside the old ones. You allow the mode to be changed mid-execution if a new FR0 or FR1 object is loaded. Is it really worth supporting that though? It has the same problem as the ifuncs: once you've dlopen()ed an object, you fix the mode for the whole program, even after the dlclose(). Unless we know of specific cases where this is needed, maybe it would be safer to fix the mode before execution based on DT_NEEDED libraries and allow the mode of modeless programs to be overridden by an environment variable. Scanning the entire set of DT_NEEDED libraries would achieve most of what full dynamic mode switching gives us; it is essentially the first stage of the dynamic mode switching described in the proposal anyway. However, I am concerned about excluding dlopen()ed objects from mode selection (not so worried about excluding ifunc, which could just fix the mode before resolving the first one).
One specific concern is for Android where I believe we have the situation where native applications are loaded as (a form of) shared library. This means a mode requirement can be introduced late on. In an Android environment it is unlikely to be acceptable to have to do something special to load an application that happens to have a specific mode requirement so dynamic selection is useful. This is more of a transitional problem than anything but making it a smooth process is quite important. I'm also not sure that there is much more effort required for a dynamic linker to take account of dlopen()ed objects in addition to DT_NEEDED, changes are needed in this code regardless. As far as GNU/Linux goes, if we do end up with a function in something like a modeless libm that is implemented as an FR-aware ifunc, that would force the choice to be made early anyway. So we have this very specific case where everything in the initial process is modeless, no ifuncs take advantage of the FR setting, and a dlopen()ed object was compiled as fr0 rather than modeless. I agree it's possible but it seems unlikely. A reasonable point. I know nothing about the way Android loading works though. :-) Could you describe it in more detail? Is it significantly different from glibc's dynamic loader running a PIE? I am working from fragments of information on this aspect still so I need to get more clarification from Android developers. My current understanding is that native parts of applications are actually shared libraries and form part of, but not necessarily the entry to, an application. Since such a shared library can't be 'required' by anything it must be loaded explicitly. I'll get clarification but the potential need for dynamic mode switching in Android need not affect the decision that GNU/Linux takes. If we do end up using ELF flags then maybe adding two new EF_MIPS_ABI enums would be better. 
It's more likely to be trapped by old loaders and avoids eating up those precious remaining bits. Sounds reasonable, but I'm still trying to determine how this information can be propagated from loader to dynamic loader. The dynamic loader has access to the ELF headers so I didn't think it would need any help. As I understand it the dynamic loader only has specific access to the program headers of the executable, not the ELF headers. There is no question that the dynamic loader has access to DSO ELF headers but we need the start point too. You didn't say specifically how a static program's crt code would know whether it was linked as modeless or in a specific FR mode. Maybe the linker could define a special hidden symbol? Why do you say crt rather than dlopen? The mode requirement should only matter if you want to change it, and dlopen should be able to access information in the same way that a dynamic linker would. It may seem redundant but perhaps we end up having to mark an executable with
Re: [LM-32] Code generation for address loading
FX MOREL fxmorel@gmail.com writes: Hi everyone, I am developing on a custom design using the LatticeMico32 architecture and I use gcc 4.5.1 to compile C code for this arch. In this architecture, loading an address 0x always takes two assembly instructions to fetch the address, because immediates are 16 bits wide: mvhi r1, 0x ori r1, r1, 0x ... lw r2, r1 In my situation, nearly all the symbols are located in the same 64kB region and their addresses share the same hi-part, so I am trying to minimize the overhead of always using two instructions when only one is needed. [additional details deleted] Because the symbol mapping phase is done during linking, I have little chance of knowing the future symbol address at code generation, but is there some way I could make this work? If I assign the symbol to a dedicated section (with the __attribute__ ((section())) directive), is there a way to know its section during code generation? I understand that I am asking for very 'dangerous' advice, but again, this will only be a custom optimization for a custom design. Have you considered doing this through custom GNU linker relaxation work? I would try this before hacking away at the compiler. AG Thank you. F-X Morel
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 03:35:04PM -0800, Linus Torvalds wrote: On Mon, Feb 24, 2014 at 2:37 PM, Paul E. McKenney paul...@linux.vnet.ibm.com wrote: What if the "nothing modifies 'p'" part looks like this: if (p != myvariable) return; and now any sane compiler will happily optimize "q = *p" into "q = myvariable", and we're all done - nothing invalid was ever Yes, the compiler could do that. But it would still be required to carry a dependency from the memory_order_consume read to the *p, But that's *BS*. You didn't actually listen to the main issue. Paul, why do you insist on this carries-a-dependency crap? Sigh. Read on... It's broken. If you don't believe me, then believe the compiler person who already piped up and told you so. The "carries a dependency" model is broken. Get over it. No sane compiler will ever distinguish two different registers that have the same value from each other. No sane compiler will ever say "ok, register r1 has the exact same value as register r2, but r2 carries the dependency, so I need to make sure to pass r2 to that function or use it as a base pointer". And nobody sane should *expect* a compiler to distinguish two registers with the same value that way. So the whole model is broken. I gave an alternate model (the "restrict"), and you didn't seem to understand the really fundamental difference. It's not a language difference, it's a conceptual difference. In the broken "carries a dependency" model, you have to fight all those aliases that can have the same value, and it is not a fight you can win. We've had the "p-p" examples, we've had the "p&0" examples, but the fact is, that "p==myvariable" example IS EXACTLY THE SAME THING. All three of those things: p-p, p&0, and p==myvariable mean that any compiler worth its salt now knows that p carries no information, and will optimize it away. So please stop arguing against that. Whenever you argue against that simple fact, you are arguing against sane compilers. So let me see if I understand your reasoning.
My best guess is that it goes something like this: 1. The Linux kernel contains code that passes pointers from rcu_dereference() through external functions. 2. Code in the Linux kernel expects the normal RCU ordering guarantees to be in effect even when external functions are involved. 3. When compiling one of these external functions, the C compiler has no way of knowing about these RCU ordering guarantees. 4. The C compiler might therefore apply any and all optimizations to these external functions. 5. This in turn implies that the only way to prohibit any given optimization from being applied to the results obtained from rcu_dereference() is to prohibit that optimization globally. 6. We have to be very careful what optimizations are globally prohibited, because a poor choice could result in unacceptable performance degradation. 7. Therefore, the only operations that can be counted on to maintain the needed RCU orderings are those where the compiler really doesn't have any choice, in other words, where any reasonable way of computing the result will necessarily maintain the needed ordering. Did I get this right, or am I confused? So *accept* the fact that some operations (and I guarantee that there are more of those than you can think of, and you can create them with various tricks using pretty much *any* feature in the C language) essentially take the data information away. And just accept the fact that then the ordering goes away too. Actually, the fact that there are more potential optimizations than I can think of is a big reason for my insistence on the carries-a-dependency crap. My lack of optimization omniscience makes me very nervous about relying on there never ever being a reasonable way of computing a given result without preserving the ordering. So give up on "carries a dependency". Because there will be cases where that dependency *isn't* carried.
The language of the standard needs to get *away* from the broken model, because otherwise the standard is broken. I suggest we instead talk about litmus tests and why certain code sequences are ordered, and others are not. OK... So the code sequence I already mentioned is *not* ordered:

Litmus test 1:

    p = atomic_read(pp, consume);
    if (p == &variable)
        return p->val;

is *NOT* ordered, because the compiler can trivially turn this into "return variable.val", and break the data dependency. Right, given your model, the compiler is free to produce code that doesn't order the load from pp against the load from p->val. This is true *regardless* of any "carries a dependency" language, because that language is insane, and doesn't work when the different pieces here may be in different compilation units. Indeed, it
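Litmus test 1 can be spelled out as a minimal C11 sketch. The names obj, pp and variable are invented for this illustration, and atomic_read(pp, consume) is written out as an explicit consume load; the second function shows the value-substitution the compiler is allowed to perform after the equality test:

```c
#include <stdatomic.h>

struct obj { int val; };
struct obj variable;        /* the global the litmus test compares against */
_Atomic(struct obj *) pp;   /* published pointer; names are illustrative   */

/* Litmus test 1 as written: the final load goes through p. */
int reader_as_written(void)
{
    struct obj *p = atomic_load_explicit(&pp, memory_order_consume);
    if (p == &variable)
        return p->val;      /* the dependent load the test relies on */
    return -1;
}

/* What the compiler may legitimately make of it: after the comparison
   it knows p == &variable, so the load need not use p at all, and the
   data dependency on the consume load is gone. */
int reader_after_optimization(void)
{
    struct obj *p = atomic_load_explicit(&pp, memory_order_consume);
    if (p == &variable)
        return variable.val;   /* no longer depends on p */
    return -1;
}
```

Single-threaded the two functions are indistinguishable, which is exactly why the compiler is free to rewrite one into the other; only the cross-CPU ordering differs.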
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 24, 2014 at 3:35 PM, Linus Torvalds torva...@linux-foundation.org wrote: Litmus test 1: p = atomic_read(pp, consume); if (p == &variable) return p->val; is *NOT* ordered Btw, don't get me wrong. I don't _like_ it not being ordered, and I actually did spend some time thinking about my earlier proposal on strengthening the 'consume' ordering. I have for the last several years been 100% convinced that the Intel memory ordering is the right thing, and that people who like weak memory ordering are wrong and should try to avoid reproducing if at all possible. But given that we have memory orderings like power and ARM, I don't actually see a sane way to get a good strong ordering. You can teach compilers about cases like the above when they actually see all the code and they could poison the value chain etc. But it would be fairly painful, and once you cross object files (or even just functions in the same compilation unit, for that matter), it goes from painful to just ridiculously not worth it. So I think the C semantics should mirror what the hardware gives us - and do so even in the face of reasonable optimizations - not try to do something else that requires compilers to treat consume very differently. If people made me king of the world, I'd outlaw weak memory ordering. You can re-order as much as you want in hardware with speculation etc, but you should always *check* your speculation and make it *look* like you did everything in order. Which is pretty much the Intel memory ordering (ignoring the write buffering). Linus
[Bug ipa/60327] [4.9 Regression] xalanbmk and dealII ICE in ipa-inline-analysis.c:3555
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60327 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||lto CC||hubicka at gcc dot gnu.org Target Milestone|--- |4.9.0
[Bug ipa/60327] New: [4.9 Regression] xalanbmk and dealII ICE in ipa-inline-analysis.c:3555
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60327 Bug ID: 60327 Summary: [4.9 Regression] xalanbmk and dealII ICE in ipa-inline-analysis.c:3555 Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Both xalanbmk and dealII ICE when built with -Ofast -funroll-loops -fpeel-loops -march=native -flto -fwhole-program -flto-partition=none (AMD Fam10) like lto1: internal compiler error: Segmentation fault 0x7ea3bf crash_signal /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/toplev.c:337 0x68d839 inline_update_overall_summary(cgraph_node*) /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/ipa-inline-analysis.c:3555 0x6a7843 walk_polymorphic_call_targets /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/ipa.c:229 0x6a7843 symtab_remove_unreachable_nodes(bool, _IO_FILE*) /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/ipa.c:400 0x744edb execute_todo /gcc/spec/sb-barbella.arch.suse.de-ai-64/gcc/gcc/passes.c:1896 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. lto-wrapper: /gcc/spec/sb-barbella.arch.suse.de-ai-64/x86_64/install-hack/bin/g++ returned 1 exit status /usr/bin/ld: fatal error: lto-wrapper failed collect2: error: ld returned 1 exit status specmake: *** [dealII] Error 1
[Bug ipa/60325] [4.9 Regression] ICE in ipa_modify_formal_parameters, at ipa-prop.c compiling g++.dg/cilk-plus/CK/lambda_spawns.cc with LTO-profiledbootstrap build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60325 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||lto Target Milestone|--- |4.9.0 Summary|ICE in |[4.9 Regression] ICE in |ipa_modify_formal_parameter |ipa_modify_formal_parameter |s, at ipa-prop.c compiling |s, at ipa-prop.c compiling |g++.dg/cilk-plus/CK/lambda_ |g++.dg/cilk-plus/CK/lambda_ |spawns.cc with |spawns.cc with |LTO-profiledbootstrap build |LTO-profiledbootstrap build
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||lto Known to fail||4.5.4, 4.6.4, 4.7.3 --- Comment #1 from Richard Biener rguenth at gcc dot gnu.org --- Hmm, I can't reproduce this with 4.8 or trunk but with 4.5, 4.6 and 4.7.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #2 from Richard Biener rguenth at gcc dot gnu.org --- 4.7 and lower is expected to show this behavior due to the bug that 'c++' (the increment of a char) is not properly implemented as c = (char)((int)c + 1) and thus we think that overflow is undefined. 4.8 and above has that fixed and thus shows different, working behavior.
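The lowering described above can be sketched in a few lines of C (increment_char is an illustrative name, not from the PR): the increment of a char happens after promotion to int, so the arithmetic itself can never overflow, and treating it as undefined signed overflow is wrong.

```c
#include <limits.h>

/* 'c++' on a (signed) char is effectively c = (char)((int)c + 1):
   c is promoted to int, incremented there, and converted back.  The
   int addition cannot overflow for any char value, so only the
   conversion back to char remains, and that is implementation-defined
   (modulo wrap on the usual targets), not undefined behavior. */
signed char increment_char(signed char c)
{
    int promoted = (int)c + 1;     /* always representable in int */
    return (signed char)promoted;  /* implementation-defined narrowing */
}
```

On mainstream two's-complement targets increment_char(SCHAR_MAX) wraps to SCHAR_MIN without any undefined behavior being involved.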
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #3 from Zhendong Su su at cs dot ucdavis.edu --- (In reply to Richard Biener from comment #1) Hmm, I can't reproduce this with 4.8 or trunk but with 4.5, 4.6 and 4.7. Richard, it still fails for me. Did you use LTO?

$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 4.9.0 20140223 (experimental) [trunk revision 208062] (GCC)
$
$ gcc-trunk -O0 -c foo.c
$ gcc-trunk -O0 -c main.c
$ gcc-trunk -Os foo.o main.o
$ a.out
$
$ gcc-trunk -flto -O0 -c foo.c
$ gcc-trunk -flto -O0 -c main.c
$ gcc-trunk -flto -Os foo.o main.o
$ a.out
^C
$
$ gcc-4.8 -flto -O0 -c foo.c
$ gcc-4.8 -flto -O0 -c main.c
$ gcc-4.8 -flto -Os foo.o main.o
$ a.out
^C
$
[Bug fortran/59198] [4.7/4.8/4.9 Regression] ICE on cyclically dependent polymorphic types
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59198 --- Comment #10 from paul.richard.thomas at gmail dot com --- A further small remark: when the explicit interface for obs1_int is turned into a subroutine, everything works perfectly. I am homing in on this as being the source of the trouble; I suspect that the function pointer is not receiving the DEC_SIZE information. I will look tonight. On 23 February 2014 21:49, pault at gcc dot gnu.org gcc-bugzi...@gcc.gnu.org wrote: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59198 --- Comment #9 from Paul Thomas pault at gcc dot gnu.org --- Hi Tobias, I need to walk away from this one for 24 hours. I have established this chain:

(i) We start building decay_t;
(ii) During which we have to build decay_gen_t (from trans-types.c:2456);
(iii) Followed by decay_term_t;
(iv) Which has a decay_t as its only component;
(v) Since this is in the process of being built, what is returned is the backend_decl without any of the fields. Thus the size cannot be determined;
(vi) For reasons that I cannot see, since this component is a pointer, this indeterminate size propagates back to the size of the decay_gen_t component in decay_t; and
(vii) This, I suppose but have not confirmed, clobbers the initialisation of the vtable.

This latter is surmise, on the basis that changing the 'term' field to a pointer still causes the size problem but the ICE goes away. The programme even executes! I cannot see why there is a problem in estimating the size, since the relevant components are either allocatable or pointers - thus the size can be determined. Cheers Paul -- You are receiving this mail because: You are on the CC list for the bug.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Zhendong Su from comment #3) (In reply to Richard Biener from comment #1) Hmm, I can't reproduce this with 4.8 or trunk but with 4.5, 4.6 and 4.7. Richard, it still fails for me. Did you use LTO? Yes, I did. For me -O[s23] -flto -v -save-temps -fdump-tree-all results in a ccXYZ.ltrans0.o.169t.optimized file like

;; Function main (main, funcdef_no=2, decl_uid=2401, symbol_order=0) (executed once)

main ()
{
  <bb 2>:
  return 0;

}

$ gcc-trunk -v
Using built-in specs.
COLLECT_GCC=gcc-trunk
COLLECT_LTO_WRAPPER=/usr/local/gcc-trunk/libexec/gcc/x86_64-unknown-linux-gnu/4.9.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-trunk/configure --prefix=/usr/local/gcc-trunk --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
gcc version 4.9.0 20140223 (experimental) [trunk revision 208062] (GCC)
$
$ gcc-trunk -O0 -c foo.c
$ gcc-trunk -O0 -c main.c
$ gcc-trunk -Os foo.o main.o
$ a.out
$
$ gcc-trunk -flto -O0 -c foo.c
$ gcc-trunk -flto -O0 -c main.c
$ gcc-trunk -flto -Os foo.o main.o
$ a.out
^C
$
$ gcc-4.8 -flto -O0 -c foo.c
$ gcc-4.8 -flto -O0 -c main.c
$ gcc-4.8 -flto -Os foo.o main.o
$ a.out
^C
$
[Bug fortran/60128] [4.8/4.9 Regression] Wrong ouput using en edit descriptor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60128 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P4 Target Milestone|--- |4.8.3
[Bug tree-optimization/45791] Missed devirtualization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45791 Matthijs Kooijman matthijs at stdin dot nl changed: What|Removed |Added CC||matthijs at stdin dot nl --- Comment #15 from Matthijs Kooijman matthijs at stdin dot nl --- I ran into another variant of this problem, which I reduced to the following testcase. I found the problem on 4.8.2, but it is already fixed in trunk / gcc-4.9 (Debian 4.9-20140218-1). Still, it might be useful to have the testcase here for reference.

class Base { };
class Sub : public Base {
public:
  virtual void bar();
};

Sub foo;
Sub * const pointer = &foo;
Sub* function() { return &foo; }

int main() {
  // Gcc 4.8.2 devirtualizes this:
  pointer->bar();
  // but not this:
  function()->bar();
}
[Bug libstdc++/60326] Incorrect type from std::make_unsigned<wchar_t>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60326 Jonathan Wakely redi at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-24 Ever confirmed|0 |1 --- Comment #2 from Jonathan Wakely redi at gcc dot gnu.org --- I think we're just missing wchar_t specializations, and also for char16_t and char32_t

#include <type_traits>
using namespace std;
using wchar_signed = make_signed<wchar_t>::type;
using wchar_unsigned = make_unsigned<wchar_t>::type;
static_assert( !is_same<wchar_signed, wchar_unsigned>::value, "" );
static_assert( !is_same<char16_t, make_signed<char16_t>::type>::value, "" );
static_assert( !is_same<char32_t, make_signed<char32_t>::type>::value, "" );
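As a hedged sketch of what such a specialization has to deliver (uint_of_size and make_unsigned_wchar are invented names for this illustration, not the actual libstdc++ machinery): map wchar_t to an unsigned integer type of the same width.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Pick an unsigned integer type by byte width; wchar_t is 2 bytes on
// Windows and 4 bytes on the usual Unix targets.
template <std::size_t N> struct uint_of_size;
template <> struct uint_of_size<2> { using type = std::uint16_t; };
template <> struct uint_of_size<4> { using type = std::uint32_t; };

// What make_unsigned<wchar_t> must produce: same width, unsigned.
struct make_unsigned_wchar {
    using type = uint_of_size<sizeof(wchar_t)>::type;
};

static_assert(sizeof(make_unsigned_wchar::type) == sizeof(wchar_t),
              "same width as wchar_t");
static_assert(std::is_unsigned<make_unsigned_wchar::type>::value,
              "result must be unsigned");
```

The real library additionally has to handle char16_t and char32_t, and the mirror-image make_signed direction, in the same way.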
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 --- Comment #5 from Dominique d'Humieres dominiq at lps dot ens.fr --- The PR is fixed by the patch in comment 2 without regression, see http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01688.html. Thanks for the quick fix.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Blocks||60243 --- Comment #2 from Richard Biener rguenth at gcc dot gnu.org --- We have only 1 billion (!) calls to estimate_calls_size_and_time (and 2 billion recursions in that function - I suppose callgrind/kcachegrind overflowed the entry counter even ...). Also 3 billion calls to evaluate_predicate. Sth is getting seriously out-of-hands here ;) That is, it's ultimately called from do_estimate_edge_size (but only 36 calls here - still a _lot_ for this testcase). Looks related to what I found in PR60243.
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 --- Comment #6 from Andrey Belevantsev abel at gcc dot gnu.org --- (In reply to Dominique d'Humieres from comment #5) The PR is fixed by the patch in comment 2 without regression, see http://gcc.gnu.org/ml/gcc-testresults/2014-02/msg01688.html. Thanks for the quick fix. Thank you, I will commit the patch then once the ia64 testing will finish.
[Bug rtl-optimization/60317] [4.9 Regression] find_hard_regno_for compile time hog in libvpx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60317 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org --- This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process)
[Bug rtl-optimization/60317] [4.9 Regression] find_hard_regno_for compile time hog in libvpx
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60317 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Richard Biener from comment #3) This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process) Err - wrong bug
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org --- This whole thing updating keys and such should use a proper lattice of per-cgraph and per-edge node sizes/times which can be updated with a less ad-hoc algorithm than the current one which easily and completely explodes ... :/ (and the 'cache' is completely ineffective during the update process)
[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266 --- Comment #4 from Martin Jambor jamborm at gcc dot gnu.org --- Author: jamborm Date: Mon Feb 24 12:39:52 2014 New Revision: 208067 URL: http://gcc.gnu.org/viewcvs?rev=208067root=gccview=rev Log: 2014-02-24 Martin Jambor mjam...@suse.cz PR ipa/60266 * ipa-cp.c (propagate_constants_accross_call): Bail out early if there are no parameter descriptors. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-cp.c
[Bug ipa/60266] [4.9 Regression] ICE: in ipa_get_parm_lattices, at ipa-cp.c:261 during LibreOffice LTO build
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60266 Martin Jambor jamborm at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #5 from Martin Jambor jamborm at gcc dot gnu.org --- Fixed.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #4 from Richard Biener rguenth at gcc dot gnu.org --- When calling do_estimate_edge_size to compute the effect on caller size when inlining an edge we call estimate_node_size_and_time which eventually recurses down to estimate_calls_size_and_time (why!? call edges in the callee are irrelevant when inlining the call into the caller!). Doesn't this just want to add(?) e->call_stmt_size/time? At the moment estimate_calls_size_and_time recurses to estimate_edge_size_and_time ... and I don't see _any_ prevention of running in cgraph cycles here. (and the cache isn't populated before computing an edge's size/time is). In fact,

Index: gcc/ipa-inline-analysis.c
===================================================================
--- gcc/ipa-inline-analysis.c	(revision 207960)
+++ gcc/ipa-inline-analysis.c	(working copy)
@@ -3011,21 +3011,11 @@ estimate_calls_size_and_time (struct cgr
       struct inline_edge_summary *es = inline_edge_summary (e);
       if (!es->predicate
 	  || evaluate_predicate (es->predicate, possible_truths))
-	{
-	  if (e->inline_failed)
-	    {
-	      /* Predicates of calls shall not use NOT_CHANGED codes,
-		 so we do not need to compute probabilities.  */
-	      estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE,
-					   known_vals, known_binfos,
-					   known_aggs, hints);
-	    }
-	  else
-	    estimate_calls_size_and_time (e->callee, size, time, hints,
-					  possible_truths,
-					  known_vals, known_binfos,
-					  known_aggs);
-	}
+	/* Predicates of calls shall not use NOT_CHANGED codes,
+	   so we do not need to compute probabilities.  */
+	estimate_edge_size_and_time (e, size, time, REG_BR_PROB_BASE,
+				     known_vals, known_binfos,
+				     known_aggs, hints);
     }
   for (e = node->indirect_calls; e; e = e->next_callee)
     {

fixes this and I cannot make sense of calling estimate_calls_size_and_time for the callee of an edge that we are not going to inline (or that is already inlined? I still find those if (e->inline_failed) checks odd).
If it's supposed to account for inline bodies in the caller then we should have updated the inline_summary () of the caller, not have to recurse here - and we _do_ seem to (inline_merge_summary).
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #5 from Richard Biener rguenth at gcc dot gnu.org --- Hmm, ok - it is supposed to only account for the extra call edges in the inlined bodies. The actual issue seems to be

Deciding on inlining of small functions. Starting with size 114.
Enqueueing calls in Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/14.
Enqueueing calls in Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13.
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/9 Known to be false: not inlined, op1 != 2, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0
...
Considering Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/9 with 27 size to be inlined into Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 in t.C:21
Estimated growth after inlined into all is +27 insns.
Estimated badness is -1073741827, frequency 0.16.
Badness calculation for Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/10
size growth -3, time 0 inline hints: in_scc declared_inline
-1073741827: Growth -3 <= 0
Processing frequency Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]
Called by Test<scale>::Test(Scale) [with Scale scale = (Scale)3u] that is normal or hot
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 Known to be false: not inlined, op1 != 0, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/3, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/6, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/9 Known to be false: not inlined, op1 != 2, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/10, badness -1073741827
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
enqueuing call Test<scale>::Test(Scale) [with Scale scale = (Scale)2u]/30 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/14, badness -1073741826
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)3u]/13 Known to be false: not inlined, op1 != 3, op1 changed size:0 time:0
Inlined into Test<scale>::Test(Scale) [with Scale scale = (Scale)3u] which now has time 13 and size 24, net change of -3.
New minimal size reached: 111
Estimating body: Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 Known to be false: not inlined, op1 != 1, op1 changed size:0 time:0
Considering Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/5 with 27 size to be inlined into Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 in t.C:20
Estimated growth after inlined into all is +27 insns.
Estimated badness is -1073741827, frequency 0.12.
Badness calculation for Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 -> Test<scale>::Test(Scale) [with Scale scale = (Scale)1u]/6
size growth -3, time 0 inline hints: declared_inline
-1073741827: Growth -3 <= 0
...

so we are inlining all over the place but don't really arrive at a point where no further useful inlining is left and inlining never has a growth effect in our theory and we continue to think that we shrink the overall unit by further inlining. Meanwhile the cgraph is full of millions of calls (but estimated to be never reached?): Considering Test<scale>::Test(Scale) [with Scale scale = (Scale)0u]/2 with 24 size to be inlined into Test<scale>::Test(Scale)
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- Btw, the smaller testcase (E4 case commented) shows exactly the same behavior, we just seem to be exponential so only adding E4 makes it really bad.
[Bug libstdc++/59894] Force use of the default new/delete
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59894 --- Comment #4 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to Marc Glisse from comment #0) PR 59893 considers a different path using LTO to inline at link time the definition from libsupc++. Note that doing both at the same time: 1) provide an inline version of new 2) LTO-link with libsupc++ might interact badly (or not, I haven't checked). We should test it and if needed warn about it in the documentation or preferably find a workaround.
[Bug ipa/60315] [4.8/4.9 Regression] template constructor switch optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315 --- Comment #7 from Richard Biener rguenth at gcc dot gnu.org --- Note that we seem to fail to update BB predicates for switch stmts.

size:0.00, time:0.00, predicate:(true)
size:3.00, time:2.00, predicate:(not inlined)
size:2.00, time:2.00, predicate:(op1 changed)
size:8.00, time:3.20, predicate:(op1 changed) && (op1 != 2)

I cannot interpret the size 2 case (what is "op1 changed"?), but in that case we actually shrink compared to not inlining. As "op1 != 2" makes it more restrictive it's odd that that increases the metrics. As inliner I would inline all the (op1 changed) cases, thus in the above case op1 == 2. We seem to inline fully until hitting the case where only recursive edges are left. Even for only two switch cases we inline 5(!) calls into Test<E1>. Ah, the switch isn't handled by the predicates because

BB 3 predicate:(op1 != 1)
s.1_4 = (int) s_2(D);  freq:0.80 size: 0 time: 0
switch (s.1_4) <default: <L8>, case 0: <L1>, case 1: <L2>>

it is not unmodified_parm_or_parm_agg_item (). The parameter is unsigned int so the cast is not value-preserving. Of course in general we can't rely on proper predicates so we need to avoid exploding reliably.
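The shape under discussion can be sketched as follows (the names are invented for illustration, not the PR's testcase); the point is that switching on (int)s rather than s itself is what defeats the parameter tracking:

```cpp
// The switch operates on (int)s.  Since s is unsigned, the cast is not
// value-preserving (values above INT_MAX change), so the inline
// predicate machinery cannot treat the switch operand as the
// unmodified parameter op1 and gives up on the BB predicates.
inline int dispatch(unsigned s)
{
    switch (static_cast<int>(s)) {
    case 0:  return 10;
    case 1:  return 20;
    default: return -1;
    }
}
```

Switching on s directly (or taking an int parameter) would keep the operand recognizable as the unmodified parameter.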
[Bug preprocessor/58580] [4.8 Regression] preprocessor goes OOM with warning for zero literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58580 Dodji Seketeli dodji at gcc dot gnu.org changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #12 from Dodji Seketeli dodji at gcc dot gnu.org --- This fall-out seems fixed now on trunk by commit r207046. Sorry for the inconvenience.
[Bug preprocessor/58580] [4.8 Regression] preprocessor goes OOM with warning for zero literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58580 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #13 from Richard Biener rguenth at gcc dot gnu.org --- Still broken on the branch as far as I can see.
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 Marc Glisse glisse at gcc dot gnu.org changed: What|Removed |Added CC||paolo.carlini at oracle dot com --- Comment #14 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to Marc Glisse from comment #7) The current patch breaks g++.dg/ext/vla13.C (PR 54583), but nothing else covered by the testsuite, so it is tempting to see if there are other ways to fix PR 54583. Paolo, do you have an opinion on this PR?
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 --- Comment #5 from Zhendong Su su at cs dot ucdavis.edu --- Did you separately compile the two files at -O0 and link at -Os, like below? $ gcc-trunk -flto -O0 -c foo.c $ gcc-trunk -flto -O0 -c main.c $ gcc-trunk -flto -Os foo.o main.o
[Bug rtl-optimization/50677] volatile forces load into register
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 --- Comment #4 from H.J. Lu hjl.tools at gmail dot com --- Combine generates

Trying 6, 7 -> 8:
Failed to match this instruction:
(set (mem/v:SI (reg/v/f:DI 85 [ i ]) [2 *i_2(D)+0 S4 A32])
    (plus:SI (mem/v:SI (reg/v/f:DI 85 [ i ]) [2 *i_2(D)+0 S4 A32])
        (const_int 1 [0x1])))

from

(insn 6 3 7 2 (set (reg:SI 83 [ D.1752 ])
        (mem/v:SI (reg/v/f:DI 85 [ i ]) [2 *i_2(D)+0 S4 A32])) x.i:1 90 {*movsi_internal}
     (nil))
(insn 7 6 8 2 (parallel [
            (set (reg:SI 84 [ D.1752 ])
                (plus:SI (reg:SI 83 [ D.1752 ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) x.i:1 266 {*addsi_1}
     (expr_list:REG_DEAD (reg:SI 83 [ D.1752 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

Why doesn't combine include (clobber (reg:CC 17 flags))?
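For reference, the kind of source that produces an RTL dump of this shape is a volatile read-modify-write (this is a guess at the reduced testcase, not taken from the PR):

```c
/* A volatile increment: the load (insn 6), the add (insn 7) and the
   store (insn 8) are emitted as separate insns, and combine refuses to
   fuse them into a single memory-destination add because volatile
   memory is excluded from recognition during combine. */
void bump(volatile int *i)
{
    (*i)++;
}
```

Without the volatile qualifier the same source would normally combine into a single addl-to-memory instruction on x86.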
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 --- Comment #15 from Paolo Carlini paolo.carlini at oracle dot com --- I don't think you simply want a better fix for 54583, because for the testcase in #Comment 13 the new conditional setting TREE_NO_WARNING isn't used. Otherwise, I think it would be easy to tighten it via array_of_runtime_bound_p.
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 --- Comment #16 from Marc Glisse glisse at gcc dot gnu.org --- (In reply to Paolo Carlini from comment #15) I don't think you simply want a better fix for 54583, because for the testcase in #Comment 13 the new conditional setting TREE_NO_WARNING isn't used. Otherwise, I think it would be easy to tighten it via array_of_runtime_bound_p. The issue isn't with setting the bit but reading it. If you look at the patch, I remove a test for TREE_NO_WARNING (expr). This breaks 54583 because the TREE_NO_WARNING bit is then ignored.
[Bug rtl-optimization/50677] volatile forces load into register
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 --- Comment #5 from Andrew Pinski pinskia at gcc dot gnu.org --- (In reply to H.J. Lu from comment #4) Why doesn't combine include (clobber (reg:CC 17 flags))? It has nothing to do with the clobber. Inside combine_instructions there is a call to init_recog_no_volatile which forces volatile memory not be recognized. The main reason is because combine does not check for volatile memory issues before doing the combine so it was easier to just disable recognizability of volatile memory.
[Bug c++/58950] Missing statement has no effect
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58950 --- Comment #17 from Paolo Carlini paolo.carlini at oracle dot com --- Yes, I know that. What I'm saying is that other code may want to see that TREE_NO_WARNING honored, the issue doesn't have much to do with 54583 per se. In my personal opinion removing a TREE_NO_WARNING check is in general a pretty risky thing to do, because unfortunately we have only that generic bit and we use it in many different circumstances.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2014-02-24 Ever confirmed|0 |1 --- Comment #6 from Richard Biener rguenth at gcc dot gnu.org --- (In reply to Zhendong Su from comment #5) Did you separately compile the two files at -O0 and link at -Os, like below? $ gcc-trunk -flto -O0 -c foo.c $ gcc-trunk -flto -O0 -c main.c $ gcc-trunk -flto -Os foo.o main.o Ah, no. The issue here is that the fix for that bug I mention triggers on TYPE_OVERFLOW_UNDEFINED, but with -O[01] we have -fno-strict-overflow enabled and thus we lower it in a bogus way while with -O[s23] we have -fstrict-overflow. This is a IL semantic change that is not actually contained in the IL ... Anyway, confirmed.
[Bug lto/60319] wrong code (that hangs) by LTO at -Os and above on x86_64-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60319 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #7 from Richard Biener rguenth at gcc dot gnu.org --- Note that such bugs may occur generally when mixing -O[01] with -O[s23] ... for a similar case, -f[no-]strict-aliasing we get away with streaming get_alias_set () == 0. Thus I think we have to conservatively merge -fstrict-overflow, similar to how we treat -ffp-contract. I have a patch.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #1 from Jason Merrill jason at gcc dot gnu.org --- Fixed.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 --- Comment #2 from Jason Merrill jason at gcc dot gnu.org --- Author: jason Date: Mon Feb 24 18:47:20 2014 New Revision: 208092 URL: http://gcc.gnu.org/viewcvs?rev=208092root=gccview=rev Log: PR c++/60312 * parser.c (cp_parser_template_type_arg): Check for invalid 'auto'. Added: trunk/gcc/testsuite/g++.dg/cpp1y/auto-neg1.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/parser.c
[Bug c++/60146] [4.8/4.9 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org
[Bug c++/60328] New: [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60328 Bug ID: 60328 Summary: [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: reagentoo at gmail dot com The following valid code snippet (compiled with -std=c++11) is rejected by GCC 4.9.0 (20130909) and triggers an ICE in GCC 4.8.2:

template <class _T, class... _Rest>
struct Foo
{
    template <class _TT, class... _RR>
    using Bar = Foo<_TT, _RR...>;

    using Normal = Foo<_Rest...>;
    using Fail = Bar<_Rest...>;
};

GCC 4.8.2 output:

internal compiler error: Segmentation fault
     using Fail = Bar<_Rest...>;
                               ^
Please submit a full bug report, with preprocessed source if appropriate. See https://bugs.gentoo.org/ for instructions.

GCC 4.9.0 output:

8: error: pack expansion argument for non-pack parameter ‘_TT’ of alias template ‘template<class _TT, class ... _RR> using Bar = Foo<_TT, _RR ...>’
     using Fail = Bar<_Rest...>;
                               ^
[Bug other/60329] New: Fix Typo
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60329 Bug ID: 60329 Summary: Fix Typo Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: alangiderick at gmail dot com Created attachment 32206 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32206&action=edit Fix Typo Hello, I fixed a typo in a comment in the call.h header file.
[Bug other/60330] New: Licensed an unlicensed file
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60330 Bug ID: 60330 Summary: Licensed an unlicensed file Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: alangiderick at gmail dot com Created attachment 32207 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32207&action=edit Licensed a file Hi, I added the license to a file that had it missing at the beginning of the code. Thanks
[Bug c++/60331] New: ICE with OpenMP #pragma omp declare reduction in template class
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60331 Bug ID: 60331 Summary: ICE with OpenMP #pragma omp declare reduction in template class Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code, openmp Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: reichelt at gcc dot gnu.org The following valid(?) code snippet (compiled with -std=c++11 -fopenmp) triggers an ICE on trunk:

===
template<typename> struct A
{
  #pragma omp declare reduction (x : int : omp_out += omp_in) initializer (omp_priv = omp_priv)
};
===

bug.cc: In static member function 'static void A<template-parameter-1-1>::omp declare reduction x~i(int)': bug.cc:3:87: sorry, unimplemented: unexpected AST of kind decl_expr #pragma omp declare reduction (x : int : omp_out += omp_in) initializer (omp_priv = omp_priv) ^ bug.cc:3:87: internal compiler error: in potential_constant_expression_1, at cp/semantics.c:10553 0x746309 potential_constant_expression_1 ../../gcc/gcc/cp/semantics.c:10553 0x6105a8 fold_non_dependent_expr_sfinae(tree_node*, int) ../../gcc/gcc/cp/pt.c:5111 0x610619 build_non_dependent_expr(tree_node*) ../../gcc/gcc/cp/pt.c:21306 0x72b0e0 finish_expr_stmt(tree_node*) ../../gcc/gcc/cp/semantics.c:688 0x6d2629 cp_parser_omp_declare_reduction_exprs ../../gcc/gcc/cp/parser.c:30638 0x6d404a cp_parser_late_parsing_for_member ../../gcc/gcc/cp/parser.c:23503 0x6ae0bc cp_parser_class_specifier_1 ../../gcc/gcc/cp/parser.c:19451 0x6ae0bc cp_parser_class_specifier ../../gcc/gcc/cp/parser.c:19482 0x6ae0bc cp_parser_type_specifier ../../gcc/gcc/cp/parser.c:14305 0x6c6ec0 cp_parser_decl_specifier_seq ../../gcc/gcc/cp/parser.c:11547 0x6ccb63 cp_parser_single_declaration ../../gcc/gcc/cp/parser.c:23082 0x6cd044 cp_parser_template_declaration_after_export ../../gcc/gcc/cp/parser.c:22958 0x6d83e9 cp_parser_declaration ../../gcc/gcc/cp/parser.c:10947 0x6d6ed8 cp_parser_declaration_seq_opt ../../gcc/gcc/cp/parser.c:10869 0x6d877a cp_parser_translation_unit
../../gcc/gcc/cp/parser.c:4014 0x6d877a c_parse_file() ../../gcc/gcc/cp/parser.c:31582 0x7f79f3 c_common_parse_file() ../../gcc/gcc/c-family/c-opts.c:1060 Please submit a full bug report, [etc.] Without -std=c++11 or without the template, the code compiles fine.
[Bug c++/60332] New: [c++1y] ICE with auto in function-pointer cast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60332 Bug ID: 60332 Summary: [c++1y] ICE with auto in function-pointer cast Product: gcc Version: 4.9.0 Status: UNCONFIRMED Keywords: ice-on-valid-code, lto Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: reichelt at gcc dot gnu.org The following valid(?) code snippet (compiled with -std=c++1y -flto) triggers an ICE on trunk: = void foo(); auto f = (auto(*)())(foo); = bug.cc:3:27: internal compiler error: tree code 'template_type_parm' is not supported in LTO streams auto f = (auto(*)())(foo); ^ 0xaba08d DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1300 0xab941f DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:476 0xab941f DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1208 0xab941f DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:476 0xab941f DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1208 0xab941f DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:476 0xab941f DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1208 0xabb727 lto_output_tree(output_block*, tree_node*, bool, bool) ../../gcc/gcc/lto-streamer-out.c:1390 0xab5aef write_global_stream ../../gcc/gcc/lto-streamer-out.c:2100 0xabd99e lto_output_decl_state_streams ../../gcc/gcc/lto-streamer-out.c:2144 0xabd99e produce_asm_for_decls() ../../gcc/gcc/lto-streamer-out.c:2429 0xaffe4f write_lto ../../gcc/gcc/passes.c:2297 0xb02ec0 ipa_write_summaries_1 ../../gcc/gcc/passes.c:2356 0xb02ec0 ipa_write_summaries() ../../gcc/gcc/passes.c:2413 0x891cf7 ipa_passes ../../gcc/gcc/cgraphunit.c:2078 0x891cf7 compile() ../../gcc/gcc/cgraphunit.c:2174 0x892224 finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2329 0x68deee cp_write_global_declarations() ../../gcc/gcc/cp/decl2.c:4449 Please submit a full bug report, [etc.]
[Bug c++/37140] type inherited from base class not recognized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37140 --- Comment #15 from fabien at gcc dot gnu.org --- Author: fabien Date: Mon Feb 24 20:27:34 2014 New Revision: 208093 URL: http://gcc.gnu.org/viewcvs?rev=208093&root=gcc&view=rev Log: 2014-02-24 Fabien Chene fab...@gcc.gnu.org PR c++/37140 * parser.c (cp_parser_nonclass_name): Call strip_using_decl and move the code handling dependent USING_DECLs... * name-lookup.c (strip_using_decl): ...Here. 2014-02-24 Fabien Chene fab...@gcc.gnu.org PR c++/37140 * g++.dg/template/using27.C: New. * g++.dg/template/using28.C: New. * g++.dg/template/using29.C: New. Added: branches/gcc-4_8-branch/gcc/testsuite/g++.dg/template/using27.C branches/gcc-4_8-branch/gcc/testsuite/g++.dg/template/using28.C branches/gcc-4_8-branch/gcc/testsuite/g++.dg/template/using29.C Modified: branches/gcc-4_8-branch/gcc/cp/ChangeLog branches/gcc-4_8-branch/gcc/cp/name-lookup.c branches/gcc-4_8-branch/gcc/cp/parser.c branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 Adam Butcher abutcher at gcc dot gnu.org changed: What|Removed |Added CC||abutcher at gcc dot gnu.org --- Comment #3 from Adam Butcher abutcher at gcc dot gnu.org --- I think this might have fixed PR c++/60311 too.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 --- Comment #4 from Volker Reichelt reichelt at gcc dot gnu.org --- I think this might have fixed PR c++/60311 too. Alas not, that one still crashes for me.
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #3 from H.J. Lu hjl.tools at gmail dot com --- lto_wpa_write_files has

  for (i = 0; i < n_sets; i++)
    {
      ...
      stream_out (temp_filename, part->encoder, i == n_sets - 1);
      ...
    }

n_sets is 32 when bootstrapping GCC. With a parallel build, we may build cc1, cc1plus, f951, cc1obj at the same time. If the machine is under heavy load, we may start 4*32 == 128 lto1-wpa-stream processes at the same time with -flto=jobserver. This patch:

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index c676d79..4023036 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3219,9 +3219,7 @@ do_whole_program_analysis (void)
   lto_parallelism = 1;

   /* TODO: jobserver communicatoin is not supported, yet.  */
-  if (!strcmp (flag_wpa, "jobserver"))
-    lto_parallelism = -1;
-  else
+  if (strcmp (flag_wpa, "jobserver"))
     {
       lto_parallelism = atoi (flag_wpa);
       if (lto_parallelism <= 0)

limits the lto1-wpa-stream process count to 1 if -flto=jobserver is used.
[Bug c++/60312] [4.9 Regression] [c++1y] ICE using auto as template parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60312 --- Comment #5 from Adam Butcher abutcher at gcc dot gnu.org --- Actually strike that, my [local] changes relating to PR c++/60065 (http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01437.html) seem to have changed the behavior.
[Bug libstdc++/60333] New: type_traits make_signed, make_unsigned missing support for long long enumerations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60333 Bug ID: 60333 Summary: type_traits make_signed, make_unsigned missing support for long long enumerations Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: harald at gigawatt dot nl

#include <type_traits>

enum E { e = 0x1 };
static_assert(sizeof(std::make_signed<E>::type) == sizeof(E), "");
static_assert(sizeof(std::make_unsigned<E>::type) == sizeof(E), "");

This fails on x86, because make_signed and make_unsigned never return a larger type than (un)signed long.
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #4 from Jan Hubicka hubicka at ucw dot cz --- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #3 from H.J. Lu hjl.tools at gmail dot com --- lto_wpa_write_files has

  for (i = 0; i < n_sets; i++)
    {
      ...
      stream_out (temp_filename, part->encoder, i == n_sets - 1);
      ...
    }

n_sets is 32 when bootstrapping GCC. With a parallel build, we may build cc1, cc1plus, f951, cc1obj at the same time. If the machine is under heavy load, we may start 4*32 == 128 lto1-wpa-stream processes at the same time with -flto=jobserver. This patch:

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index c676d79..4023036 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3219,9 +3219,7 @@ do_whole_program_analysis (void)
   lto_parallelism = 1;

   /* TODO: jobserver communicatoin is not supported, yet.  */
-  if (!strcmp (flag_wpa, "jobserver"))
-    lto_parallelism = -1;
-  else
+  if (strcmp (flag_wpa, "jobserver"))
     {
       lto_parallelism = atoi (flag_wpa);
       if (lto_parallelism <= 0)

limits the lto1-wpa-stream process count to 1 if -flto=jobserver is used.

I think this is a better variant:

$ svn diff ~/trunk/gcc/lto/lto.c
Index: /aux/hubicka/trunk/gcc/lto/lto.c
===================================================================
--- /aux/hubicka/trunk/gcc/lto/lto.c    (revision 207702)
+++ /aux/hubicka/trunk/gcc/lto/lto.c    (working copy)
@@ -2491,7 +2491,7 @@ stream_out (char *temp_filename, lto_sym
 #ifdef HAVE_WORKING_FORK
   static int nruns;

-  if (!lto_parallelism || lto_parallelism == 1)
+  if (lto_parallelism <= 1)
     {
       do_stream_out (temp_filename, encoder);
       return;

I basically wanted to have a simple jobserver client with lto_parallelism=-1, but I did not have time to implement it. (After glancing over GNU Make's implementation it seems actually a bit non-trivial.) I am adding Paul D. Smith into CC, since he wrote an article on the implementation. Paul, we would like GCC to actually use the jobserver to limit the number of processes it forks internally for streaming. Is there an elegant and simple solution to get this implemented?
I was thinking that if connecting to the jobserver is hard, we can just produce a makefile with rules waiting for the read to finish and use it to get tokens into the GCC streamer, but that seems somewhat kludgy. Jan
[Bug c++/60146] [4.8/4.9 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 --- Comment #5 from Jason Merrill jason at gcc dot gnu.org --- Author: jason Date: Mon Feb 24 22:17:43 2014 New Revision: 208094 URL: http://gcc.gnu.org/viewcvs?rev=208094&root=gcc&view=rev Log: PR c++/60146 * pt.c (tsubst_omp_for_iterator): Don't let substitution of the DECL_EXPR initialize a non-class iterator. Added: trunk/gcc/testsuite/g++.dg/gomp/for-20.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/pt.c
[Bug c++/60146] [4.8 Regression] ICE when compiling this code with -fopenmp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60146 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Summary|[4.8/4.9 Regression] ICE|[4.8 Regression] ICE when |when compiling this code|compiling this code with |with -fopenmp |-fopenmp --- Comment #6 from Jason Merrill jason at gcc dot gnu.org --- Fixed in 4.9 so far.
[Bug c++/60328] [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60328 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jason at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #1 from Jason Merrill jason at gcc dot gnu.org --- GCC 4.9 implements the tentative resolution of DR 1430. http://open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1430 *** This bug has been marked as a duplicate of bug 51239 ***
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #5 from H.J. Lu hjl.tools at gmail dot com --- (In reply to Jan Hubicka from comment #4) I think this is a better variant:

$ svn diff ~/trunk/gcc/lto/lto.c
Index: /aux/hubicka/trunk/gcc/lto/lto.c
===================================================================
--- /aux/hubicka/trunk/gcc/lto/lto.c    (revision 207702)
+++ /aux/hubicka/trunk/gcc/lto/lto.c    (working copy)
@@ -2491,7 +2491,7 @@ stream_out (char *temp_filename, lto_sym
 #ifdef HAVE_WORKING_FORK
   static int nruns;

-  if (!lto_parallelism || lto_parallelism == 1)
+  if (lto_parallelism <= 1)
     {
       do_stream_out (temp_filename, encoder);
       return;

I basically wanted to have a simple jobserver client with lto_parallelism=-1, but I did not have time to implement it. (After glancing over GNU Make's implementation it seems actually a bit non-trivial.) This works for me. Can you check it in?
[Bug c++/51239] [DR 1430] ICE with variadic template alias
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51239 Jason Merrill jason at gcc dot gnu.org changed: What|Removed |Added CC||reagentoo at gmail dot com --- Comment #7 from Jason Merrill jason at gcc dot gnu.org --- *** Bug 60328 has been marked as a duplicate of this bug. ***
[Bug lto/60295] [4.9 Regression] Too many lto1-wpa-stream processes are forked
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60295 --- Comment #6 from Jan Hubicka hubicka at gcc dot gnu.org --- Author: hubicka Date: Mon Feb 24 22:58:44 2014 New Revision: 208097 URL: http://gcc.gnu.org/viewcvs?rev=208097&root=gcc&view=rev Log: PR lto/60295 * lto.c (stream_out): Avoid parallel streaming with -flto=jobserver until we are able to throttle it down reasonably. Modified: trunk/gcc/lto/ChangeLog trunk/gcc/lto/lto.c
[Bug fortran/60334] New: Segmentation fault on character pointer assignments
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60334 Bug ID: 60334 Summary: Segmentation fault on character pointer assignments Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: antony at cosmologist dot info This compiles OK:

program tester
  implicit none
  character(LEN=:), pointer :: Y
  character(LEN=0), target :: Empty_String = ''

  Y => test()
  print *, Y

contains

  function test() result(P)
    character(LEN=:), pointer :: P

    P => Empty_String
  end function

end program

but compiled with gfortran -Og, when run it gives

Program received signal 11 (SIGSEGV): Segmentation fault.

Backtrace for this error:
+ [0xb776c400]
+ /lib/i386-linux-gnu/libc.so.6(+0x13afcb) [0xb7529fcb]
+ in the main program from file TestClass.f90
+ /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb74084d3]
*** glibc detected *** ./a.out: free(): invalid pointer: 0x09be0898 ***

Other compiler options don't so reliably crash, but still probably invalid code (I think). It's not specific to the string having zero length.
[Bug c++/60335] New: confused by earlier errors, bailing out
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60335 Bug ID: 60335 Summary: confused by earlier errors, bailing out Product: gcc Version: 4.7.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: vanyacpp at gmail dot com struct baz0 { int baz1(void bar0, struct bar0 {} bar3); };
[Bug c++/60065] [c++1y] ICE with auto parameter pack
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60065 --- Comment #3 from Adam Butcher abutcher at gcc dot gnu.org --- Author: abutcher Date: Tue Feb 25 03:47:24 2014 New Revision: 208106 URL: http://gcc.gnu.org/viewcvs?rev=208106&root=gcc&view=rev Log: Fix PR c++/60065. PR c++/60065 * parser.c (cp_parser_direct_declarator): Don't save and restore num_template_parameter_lists around call to cp_parser_parameter_declaration_list. (function_being_declared_is_template_p): New predicate. (cp_parser_parameter_declaration_list): Use function_being_declared_is_template_p as predicate for inspecting current function template parameter list length rather than num_template_parameter_lists. PR c++/60065 * g++.dg/cpp1y/pr60065.C: New testcase. Added: trunk/gcc/testsuite/g++.dg/cpp1y/pr60065.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/parser.c
[Bug c++/60328] [4.8/4.9 Regression] [c++11] ICE/Rejection with specialization in variadic template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60328 --- Comment #2 from reagentoo at gmail dot com --- (In reply to Jason Merrill from comment #1) GCC 4.9 implements the tentative resolution of DR 1430. http://open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1430 *** This bug has been marked as a duplicate of bug 51239 *** But this test-case compiles normally with Clang.
[Bug rtl-optimization/49847] [4.7/4.8/4.9 Regression] NULL deref in fold_rtx (prev_insn_cc0 == NULL)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847 --- Comment #34 from Jeffrey A. Law law at redhat dot com --- OK. Then I suggest two immediate things to do. 1. Fix the documentation for cc0 targets to indicate that the setter/user no longer have to be consecutive, particularly in the presence of flag_trapping_math. 2. Fault in fixes. While a review of every bit of HAVE_cc0 code is warranted, I'm not terribly inclined as HAVE_cc0 targets simply aren't that important anymore.
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 --- Comment #7 from Andrey Belevantsev abel at gcc dot gnu.org --- Author: abel Date: Tue Feb 25 06:35:09 2014 New Revision: 208109 URL: http://gcc.gnu.org/viewcvs?rev=208109&root=gcc&view=rev Log: PR rtl-optimization/60292 * sel-sched.c (fill_vec_av_set): Do not reset target availability bit for the fence instruction. Modified: trunk/gcc/ChangeLog trunk/gcc/sel-sched.c
[Bug rtl-optimization/60155] ICE: in get_pressure_class_and_nregs at gcse.c:3438
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60155 Jeffrey A. Law law at redhat dot com changed: What|Removed |Added CC||law at redhat dot com --- Comment #6 from Jeffrey A. Law law at redhat dot com --- Well, given that gcse merely moves evaluations to other blocks where evaluation of the expression is always anticipated, there's no inherent reason why we can't gcse something that might trap. It really feels like this is papering over the real problem, namely that get_pressure_class_and_nregs simply doesn't handle things that are passed through hash_scan_set, in particular cases where there's a single set inside a PARALLEL.
[Bug c++/60311] [c++1y] ICE with pointer-to-function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60311 --- Comment #1 from Adam Butcher abutcher at gcc dot gnu.org --- Author: abutcher Date: Tue Feb 25 06:44:53 2014 New Revision: 208111 URL: http://gcc.gnu.org/viewcvs?rev=208111&root=gcc&view=rev Log: Fix PR c++/60311. PR c++/60311 * parser.c (function_being_declared_is_template_p): Return false when processing a template parameter list. (cp_parser_parameter_declaration_clause): Don't set auto_is_implicit_function_template_parm_p when processing a template parameter list. PR c++/60311 * g++.dg/cpp1y/pr60311.C: New testcase. Added: trunk/gcc/testsuite/g++.dg/cpp1y/pr60311.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/parser.c
[Bug c++/60305] ICE constexpr array of functions in template
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60305 Daniel Krügler daniel.kruegler at googlemail dot com changed: What|Removed |Added CC||daniel.kruegler@googlemail. ||com --- Comment #1 from Daniel Krügler daniel.kruegler at googlemail dot com --- Seems to be fixed in 4.9.0 head. Note that your func function template is broken, because it is declared as returning void, but obviously returns a non-void result.
[Bug middle-end/60292] [4.9 Regression] ICE: in fill_vec_av_set, at sel-sched.c:3863 with -m64 after r206174 on powerpc-apple-darwin9.8.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60292 Andrey Belevantsev abel at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Andrey Belevantsev abel at gcc dot gnu.org --- Fixed on trunk. No need to backport as the initial assert patch was committed to trunk only.
Re: Fix caller-save.c:add_used_regs_1 handling of pseudos
Steven Bosscher stevenb@gmail.com writes: On Sun, Feb 23, 2014 at 10:14 PM, Richard Sandiford wrote: I noticed in passing that this 4.7 cleanup: http://article.gmane.org/gmane.comp.gcc.patches/224853 ... so that nothing happens for pseudos. I've no idea whether this makes a difference in practice or not but it seems safer to restore the old behaviour. Tested on mipsisa64-sde-elf rather than x86_64-linux-gnu since it only affects reload targets. OK to install? If it's worked since GCC 4.7, why restore that code? OK, fair enough. I'll withdraw the patch. Thanks, Richard
New Serbian PO file for 'cpplib' (version 4.9-b20140202)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Serbian team of translators. The file is available at: http://translationproject.org/latest/cpplib/sr.po (This file, 'cpplib-4.9-b20140202.sr.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Contents of PO file 'cpplib-4.9-b20140202.sr.po'
cpplib-4.9-b20140202.sr.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] Fix a typo in sparseset_pop
On Mon, Feb 24, 2014 at 6:52 AM, Carrot Wei car...@google.com wrote: Hi The following patch fixes an obvious wrong index used to access the dense array. The patch has passed the bootstrap and regression tests on x86-64. OK for trunk? Ok. Thanks, Richard. thanks Carrot 2014-02-23 Guozhi Wei car...@google.com * sparseset.h (sparseset_pop): Fix the wrong index.

Index: sparseset.h
===================================================================
--- sparseset.h (revision 208039)
+++ sparseset.h (working copy)
@@ -177,7 +177,7 @@
   gcc_checking_assert (mem != 0);

   s->members = mem - 1;
-  return s->dense[mem];
+  return s->dense[s->members];
 }

 static inline void
Re: builtin fe[gs]etround
On Sun, Feb 23, 2014 at 12:09 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, a natural first step to optimize changes of rounding modes seems to be making these 2 functions builtins. I don't know exactly how far optimizations will be able to go (the fact that fesetround can fail complicates things a lot). What is included here: 1) fegetround is pure. 2) Neither function aliases (use or clobber) any memory. I expect this is likely not true on all platforms, some probably store the rounding mode in a global variable that is accessible through other means (though mixing direct accesses with calls to fe*etround seems a questionable style). Any opinion or advice here? Regtested on x86_64-linux-gnu, certainly not for 4.9. Hohumm ... before making any of these functions less of a barrier than they are (at least for loads and stores), shouldn't we think of, and fix, the lack of any dependences between FP status word changes and actual arithmetic instructions? In fact, using 'pure' or 'not use/clobber memory' here is exactly walking on shaking grounds. Simply because we lack of a way to say that this stmt uses/clobbers the FP state (fegetround would be 'const' when following your logic in 2)). Otherwise, what is it worth optimizing^breaking things even more than we do now? [not that I have an answer for the FP state dependency that I like] Thanks, Richard. 2014-02-23 Marc Glisse marc.gli...@inria.fr gcc/ * builtins.def (BUILT_IN_FEGETROUND, BUILT_IN_FESETROUND): Add. * tree-ssa-alias.c (ref_maybe_used_by_call_p_1, call_may_clobber_ref_p_1): Handle them. gcc/testsuite/ * gcc.dg/tree-ssa/fegsetround.c: New file. 
-- Marc Glisse Index: gcc/builtins.def === --- gcc/builtins.def(revision 208045) +++ gcc/builtins.def(working copy) @@ -276,20 +276,22 @@ DEF_C99_BUILTIN(BUILT_IN_EXPM1F, DEF_C99_BUILTIN(BUILT_IN_EXPM1L, expm1l, BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_LIB_BUILTIN(BUILT_IN_FABS, fabs, BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSF, fabsf, BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSL, fabsl, BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_FABSD32, fabsd32, BT_FN_DFLOAT32_DFLOAT32, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_FABSD64, fabsd64, BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_FABSD128, fabsd128, BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FDIM, fdim, BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_C99_BUILTIN(BUILT_IN_FDIMF, fdimf, BT_FN_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO) DEF_C99_BUILTIN(BUILT_IN_FDIML, fdiml, BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO) +DEF_C99_BUILTIN(BUILT_IN_FEGETROUND, fegetround, BT_FN_INT, ATTR_PURE_NOTHROW_LEAF_LIST) +DEF_C99_BUILTIN(BUILT_IN_FESETROUND, fesetround, BT_FN_INT_INT, ATTR_NOTHROW_LEAF_LIST) DEF_LIB_BUILTIN(BUILT_IN_FLOOR, floor, BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FLOORF, floorf, BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_C90RES_BUILTIN (BUILT_IN_FLOORL, floorl, BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMA, fma, BT_FN_DOUBLE_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING) DEF_C99_BUILTIN(BUILT_IN_FMAF, fmaf, BT_FN_FLOAT_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING) DEF_C99_BUILTIN(BUILT_IN_FMAL, fmal, BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING) DEF_C99_BUILTIN(BUILT_IN_FMAX, fmax, BT_FN_DOUBLE_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMAXF, fmaxf, BT_FN_FLOAT_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMAXL, fmaxl, BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_C99_BUILTIN(BUILT_IN_FMIN, fmin, BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)

Index: gcc/testsuite/gcc.dg/tree-ssa/fegsetround.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/fegsetround.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/fegsetround.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -O -fdump-tree-optimized" } */
+
+#include <fenv.h>
+
+int a;
+int f ()
+{
+  a = 42;
+  // don't read a
+  int x = fegetround ();
+  fesetround (x + 1);
+  a = 0;
+  return a;
+}
+int g ()
+{
+  a = 0;
+  // don't write a
+  int x = fegetround ();
+  fesetround (x + 1);
+  return a;
+}
+int h ()
+{
+  // pure
+  return fegetround () -
Re: [v3] complex functions with expression template reals
Hi, On 02/23/2014 04:11 PM, Marc Glisse wrote: On Sun, 23 Feb 2014, Paolo Carlini wrote: On 02/23/2014 11:32 AM, Marc Glisse wrote: Hello, looking at this question: http://stackoverflow.com/q/21737186/1918193 I was surprised to see that libstdc++'s std::complex basically just works with user-defined types, even weird expression template ones, although that's not a supported use afaik. The only functions that fail seem to be exp and pow, both because they call polar with two arguments that have different (expression) types. I am not proposing to make this a supported use, but the cost of this small patch seems very low, and if it makes a couple users happy... Regtested with no problem on x86_64-linux-gnu, ok for stage 1? I would even be in favor of applying it now. Can we figure out simple (ie, not relying on boost...) testcases too? I didn't try std::complex<std::valarray<X>>, maybe... Otherwise, you need a type T with all the (real) math functions defined, and where every operation returns a different type (implicitly convertible to T). And then you want to call all the complex functions. That seems doable, but way bigger than I'm willing to go for this feature. If you want to take over, be my guest ;-) Another option would be just using boost/multiprecision/mpfr.hpp when available. In general, I think it makes sense to have a minimum of infrastructure enabling tests checking interoperability with boost. If only we had a check_v3_target_header { args } it would be most of it, but it doesn't seem we do?!? Anyway I guess we can take care of that post 4.9.0 and commit the straightforward code tweak now. Jon? Paolo. PS: Resending message, yesterday had issues with html
[PATCH ARM]: Fix more -mapcs-frame failures
This patch improves the one sent previously (http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01159.html), to fix a few more failures in the testsuite that could arise with shrink-wrap and -fexceptions. To recall, the problem that it fixes is that with -mapcs-frame:

- the epilogue pops as

	sub sp, fp, #12		@ does not set FRAME_RELATED_P
	ldmia sp, {fp, sp, lr}	@ XXX assert def_cfa->reg is FP instead of SP

- with vrp this is worse, we have

	fldmfdd ip!, {d8}	@ FRAME_RELATED_P
	sub sp, fp, #20
	...
	ldmfd sp, {r3, r4, fp, sp, pc}	@ XXX assert def_cfa->reg is IP instead of SP

Fixed by inserting a REG_CFA_DEF_CFA note, fixing the arm_unwind_emit machinery and setting FRAME_RELATED_P. The comment says:

/* The INSN is generated in epilogue. It is set as RTX_FRAME_RELATED_P to get correct dwarf information for shrink-wrap. We should not emit unwind information for it because these are used either for pretend arguments or notes to adjust sp and restore registers from stack. */

The testsuite score improves without regression (improvements from -g and -fexceptions tests):

=== gcc Summary for arm-sim//-mapcs-frame ===
# of expected passes		77545
# of unexpected failures	31
# of unexpected successes	2
# of expected failures		172
# of unsupported tests		1336

=== g++ Summary for arm-sim//-mapcs-frame ===
# of expected passes		50116
# of unexpected failures	9
# of unexpected successes	3
# of expected failures		280
# of unsupported tests		1229

instead of

=== gcc Summary for arm-sim//-mapcs-frame ===
# of expected passes		77106
# of unexpected failures	500
# of unexpected successes	2
# of expected failures		172
# of unresolved testcases	111
# of unsupported tests		1336

=== g++ Summary for arm-sim//-mapcs-frame ===
# of expected passes		50021
# of unexpected failures	136
# of unexpected successes	3
# of expected failures		280
# of unsupported tests		1229

Comments? OK for trunk?
Many thanks,

2014-02-18  Christian Bruel  christian.br...@st.com

	PR target/60264
	* config/arm/arm.c (arm_emit_vfp_multi_reg_pop): Emit a
	REG_CFA_DEF_CFA note.
	(arm_expand_epilogue_apcs_frame): Call arm_add_cfa_adjust_cfa_note.
	(arm_unwind_emit): Allow REG_CFA_DEF_CFA.

2014-02-18  Christian Bruel  christian.br...@st.com

	PR target/60264
	* gcc.target/arm/pr60264.c
	* gcc.target/arm/pr60264-2.c

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 207942)
+++ gcc/config/arm/arm.c	(working copy)
@@ -19909,8 +19909,15 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num
   par = emit_insn (par);
   REG_NOTES (par) = dwarf;
 
-  arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
-			       base_reg, base_reg);
+  /* Make sure cfa doesn't leave with IP_REGNUM to allow unwinding from FP.  */
+  if (TARGET_VFP && REGNO (base_reg) == IP_REGNUM)
+    {
+      RTX_FRAME_RELATED_P (par) = 1;
+      add_reg_note (par, REG_CFA_DEF_CFA, hard_frame_pointer_rtx);
+    }
+  else
+    arm_add_cfa_adjust_cfa_note (par, 2 * UNITS_PER_WORD * num_regs,
+				 base_reg, base_reg);
 }
 
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
@@ -27098,15 +27105,19 @@ arm_expand_epilogue_apcs_frame (bool really_return
   if (TARGET_HARD_FLOAT && TARGET_VFP)
     {
       int start_reg;
+      rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);
 
       /* The offset is from IP_REGNUM.  */
       int saved_size = arm_get_vfp_saved_size ();
       if (saved_size > 0)
	{
+	  rtx insn;
	  floats_from_frame += saved_size;
-	  emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
-				 hard_frame_pointer_rtx,
-				 GEN_INT (-floats_from_frame)));
+	  insn = emit_insn (gen_addsi3 (ip_rtx,
+					hard_frame_pointer_rtx,
+					GEN_INT (-floats_from_frame)));
+	  arm_add_cfa_adjust_cfa_note (insn, -floats_from_frame,
+				       ip_rtx, hard_frame_pointer_rtx);
	}
 
       /* Generate VFP register multi-pop.
  */
@@ -27179,11 +27190,15 @@ arm_expand_epilogue_apcs_frame (bool really_return
       num_regs = bit_count (saved_regs_mask);
       if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
	{
+	  rtx insn;
	  emit_insn (gen_blockage ());
	  /* Unwind the stack to just below the saved registers.  */
-	  emit_insn (gen_addsi3 (stack_pointer_rtx,
-				 hard_frame_pointer_rtx,
-				 GEN_INT (- 4 * num_regs)));
+	  insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
+					hard_frame_pointer_rtx,
+					GEN_INT (- 4 * num_regs)));
+
+	  arm_add_cfa_adjust_cfa_note (insn, - 4 * num_regs,
+				       stack_pointer_rtx, hard_frame_pointer_rtx);
	}
Re: builtin fe[gs]etround
On Mon, Feb 24, 2014 at 10:02 AM, Richard Biener richard.guent...@gmail.com wrote:
On Sun, Feb 23, 2014 at 12:09 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

a natural first step to optimize changes of rounding modes seems to be making these two functions builtins. I don't know exactly how far optimizations will be able to go (the fact that fesetround can fail complicates things a lot). What is included here:

1) fegetround is pure.

2) Neither function aliases (uses or clobbers) any memory. I expect this is likely not true on all platforms; some probably store the rounding mode in a global variable that is accessible through other means (though mixing direct accesses with calls to fe*etround seems a questionable style).

Any opinion or advice here? Regtested on x86_64-linux-gnu, certainly not for 4.9.

Hohumm ... before making any of these functions less of a barrier than they are (at least for loads and stores), shouldn't we think of, and fix, the lack of any dependences between FP status word changes and actual arithmetic instructions? In fact, using 'pure' or 'not use/clobber memory' here is exactly walking on shaky ground, simply because we lack a way to say that this stmt uses/clobbers the FP state (fegetround would be 'const' when following your logic in 2)). Otherwise, what is it worth optimizing^Wbreaking things even more than we do now? [not that I have an answer for the FP state dependency that I like]

Just to elaborate on the two obvious options:

1) represent all arithmetic with builtins, using an extra explicit FP state argument, and set / query that state with the FP manipulation / query functions (also with every call)

2) use sth similar to virtual operands - conveniently the vuse/vdef members are present even for unary, binary and ternary assigns (you'd only use the vuse field here).
Issues arise with calls (which might consume/clobber FP state) - there the vop fields are already used, so you'd need to add an extra use (easy) and a def (ugh). Eventually people wanted to get multiple defs for the simple stmts (assigns and calls) back, for stuff like modeling CPU flags explicitly (the overflow flag, for example). And FP ISAs now have support for per-stmt rounding mode flags (and element masks for vector instructions). Thus eventually this may be a good reason to support extra (but less efficient to get at / modify?) SSA(!) uses and defs on these stmt kinds. But it needs to be well designed, so as to not throw away the speedups and simplicity we gained when removing general support for multiple defs. (It should be obvious that I lean towards 2) but am not very happy with the consequences for the gimple data structures.)

Richard.

Thanks,
Richard.
Re: [ARM] [Trivial] Fix shortening of field name extend.
*ping*, CCing Jakub.

Thanks,
James

On Wed, Feb 12, 2014 at 12:43:10PM +, Ramana Radhakrishnan wrote:
On 02/12/14 12:19, James Greenhalgh wrote:

Hi,

In aarch-common-protos.h we define a field in alu_cost_table: extnd

On its own this is an upsetting optimization of the English language, but this trouble is compounded by the comment attached to this field throughout the cost tables themselves:

/* Extend.  */

This patch fixes the spelling of extend to match that in the comments. I've checked that AArch64 and AArch32 build with this patch applied. OK for trunk/stage-1 (I don't mind which)?

I am happy for this to go in now - Jakub?

regards
Ramana

Thanks,
James

---
2014-02-12  James Greenhalgh  james.greenha...@arm.com

	* config/arm/aarch-common-protos.h (alu_cost_table): Fix spelling of extend.
	* config/arm/arm.c (arm_new_rtx_costs): Fix spelling of extend.

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.