Re: Ideas for Google Summer of Code
So we can do Intel, ATI and NVIDIA GPU backends. NVIDIA already has an implementation of OpenCL working. http://www.nvidia.com/object/cuda_opencl.html. Would there be any sharing involved with them?? If you mean between backends, 1) do not underestimate the time needed to write a new GCC backend; 2) probably nothing can be shared (see for example the zero-sharing between PPU and SPU). I think that your 2) and 3) projects are way more viable. I am working on my proposal now and I will post it to this list before final submission (I've got to hurry, they are due April 3rd). I will mainly focus on this pdf: http://www.khronos.org/developers/library/overview/opencl_overview.pdf, Note that this *is* different from the backends you mentioned above, and as I said I think it is more viable. Make sure that what you propose is implementable without an OpenCL C compiler (I think it is), or discuss to what extent the library will be functional. For example, do you need another runtime library implementing the intrinsics used by kernels? Thanks, Paolo
[cond-optab] update after first round of testing (results for all targets)
I've finished the first round of testing on all targets and will be sending patches soon. Overall, I think the results are quite satisfying. For the current bunch of files:

I get the same code on the following targets: m32c crx mmix xstormy16 fr30 v850 m32r iq2000 picochip mcore spu ia64 m68hc11 alpha frv e500* arm
* I'm treating e500 as a different target than powerpc

I get the same code except for unordered comparisons, which are improved, on the following targets: mips sparc

I get the same code except for small scheduling changes with some option combinations on the following targets: m68k i386 rs6000

I get the same code with small improvements in instruction selection or delay slot scheduling on the following targets: vax avr cris h8300

I get slightly better code because of better optimization (especially if-conversion) on the following targets: arc xtensa mn10300 score bfin

I get large improvements on the following target: pdp11

I get overall a slight decrease in code quality, which is however offset by patches to expand that I've already posted, on the following targets: pa s390

I have not yet converted sh. I'll do so today.

Next step is simulator testing for targets that, well, have a simulator. I'll be posting the target conversion patches soon. Indications about the options that I tested will be found there. The intention is to merge early in stage1 either as a series of commits or just one or anything in the middle. As usual, if people want me to switch to a public branch, just tell me. Paolo
Re: GCC 4.4.0 Status Report (2009-03-13)
On Sun, Mar 22, 2009 at 15:41, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: I must admit that this interpretation is quite new to me. It certainly wasn't when EGCS reunited with gcc. I disagree. reuniting with GCC means reuniting with the FSF. ... but not raising a white flag. Paolo
Re: GCC 4.4.0 Status Report (2009-03-13)
Then you had the wrong understanding. The FSF has ALWAYS had the right to overrule technical decisions on ANY of their projects. The point is that this is a right they very rarely exercise. Of course; it's just that I (and others) don't see why they should do it in this case. Delaying a *branch* is different from, say, using a proprietary version control or bug tracking system. Paolo
Re: GCC 4.4.0 Status Report (2009-03-13)
Btw, I cannot find anything related to this discussion (about whether and what power the FSF has to force their maintainers to do anything) in the official FSF documentation (http://www.gnu.org/prep/maintain/). Well, as the copyright owner and the appointer of maintainers it is pretty obvious that the FSF *can* do whatever they want. Obviously, since they are intelligent people they will usually just ask you to follow the GNU project guidelines (including the strong suggestions about using C). Paolo
Re: Proposed gfortran development branch
Note that merging the branch will be painful (as in, please dissect the branch into the individual patches again to make bisecting the trunk SVN possible). Also the SC vetoed these kind of 'integration' branches in the past (to not encourage starting an effective stage1 on a branch). I think that gfortran is managed in a sufficiently different way than the rest of GCC (e.g. they started having reviewers long before the rest of GCC did, and they appoint their own maintainers practically autonomously) that I don't think there is a reason to care. What you see in practice is Novell and Google doing stage1 work, and the volunteer gfortraners stuck. Besides, the rule against integration branches, while being extremely well founded, is bound to become obsolete. GCC developers could start using distributed version control and publishing their work on git.or.cz or github -- and after they pull from each other, all of the distributed repositories will be integration branches. Should the SC prohibit developing GCC with Mercurial or git? Paolo
Re: GCC 4.4.0 Status Report (2009-03-13)
I don't understand this. Why does the SC have little power in this matter? Surely you could decide to ship GCC 4.4 with the old license, as the official GCC maintainer? But you *choose* not to use this power (perhaps for good reasons, but I'm unconvinced). The GCC maintainers work on behalf of the FSF and in some matters defer to the FSF. It's that simple. Yes, but it's not written anywhere that release and especially branching policies are among these matters. Personally, what I'd like to see is a clear justification of why a license change motivated by plugins needs to be on a branch that will never have plugins. There has been already a plugin branch or two for a while, obviously with the old license, and the FSF said nothing. The only reason I can see would be to prevent merges *to* a plugin branch from including new 4.5 features. Paolo
Re: Automatic Parallelization Graphite - future plans
The most visible ongoing effort is the conversion from target macros to target hooks (which is incomplete). The goal was to allow hot swapping of backends. This is still the most obvious, most complete, and least unappealing (from a technical POV) approach IMHO. But Kaveh showed at one point that the compile time penalty of even just the partial conversion done so far is a few percentage points (somewhere between 3% and 5%, I don't recall the details). And also it's not nice and easy work so nobody is working on it actively AFAIK. It occurred to me at some point that using an indirect function call is useless. It would be much better to have, instead of the current targetm.foo syntax, something like TARGET(foo); this would expand to target_foo and be further remapped to the target hooks via aliases. Just by swapping targ* files you could choose whether to use function pointers if the target does not support aliases (or in the future if multiple backends are desired), or regular functions in the other case. Another problem is the mess of GO_IF_LEGITIMATE_ADDRESS and REG_OK_FOR_{BASE,INDEX}_P. These should be expressed as RTL constructs and constraints in my opinion. I had started a little work on that but never got very far. Paolo
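A minimal sketch of the TARGET(foo) idea, under stated assumptions: the hook name address_cost, its toy cost model, the USE_HOOK_TABLE switch and the cost_of_pair caller are all made up for illustration, and the symbol-alias remapping mentioned above is not modeled (token pasting stands in for it). The point is only that callers are written once against TARGET() and the build decides whether that means an indirect call through a table or a direct call.

```c
#include <assert.h>

/* Hypothetical hook implementation; the name and the cost model are
   invented for this sketch and are not GCC's.  */
int target_address_cost(int addr)
{
    assert(addr >= 0);              /* toy model: addresses are non-negative */
    return addr > 255 ? 2 : 1;
}

#ifdef USE_HOOK_TABLE
/* Indirect flavour: hooks live in a table of function pointers,
   in the style of targetm.foo.  */
struct target_hooks { int (*address_cost)(int); };
struct target_hooks targetm = { target_address_cost };
#define TARGET(name) (targetm.name)
#else
/* Direct flavour: TARGET(foo) pastes to target_foo, an ordinary
   function that can be called directly (and inlined).  */
#define TARGET(name) target_##name
#endif

/* A caller written once against TARGET(), whichever flavour is active.  */
int cost_of_pair(int a, int b)
{
    return TARGET(address_cost)(a) + TARGET(address_cost)(b);
}
```

Building with -DUSE_HOOK_TABLE selects the function-pointer flavour; without it every TARGET(...) use is a plain call.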
[cond-optab] update
I now went through all backends except sh and made the required changes. So far all I tested is that gcc compiles with one target per port. :-) Plus, i386-linux bootstraps and regtests okay. Right now I aim at 100% identical assembly, maybe I'll have to relax that. Besides obvious register allocation differences (which did not happen for i386 on simple testcases, so it is possible to avoid them), I'm not sure I can achieve that on cc0 targets because of the tst patterns, but probably combine can be taught to try them if it is not already doing it. Each port took no more than 30-45 minutes to convert. It is very mechanical: you basically duplicate the cmp patterns into cbranch and cstore patterns and eliminate all occurrences of the *_compare_op variables from the emitters. Then you go through mov*cc and add*cc patterns, and replace *_compare_op there too (with elements of the comparison passed in operand 1). Then you zap all code you do not need. The positive surprises: PA was already very clean. bfin was very different from the others but easy. The only ports for which I substantially rewrote some of the code in a non-mechanical way are m32r and sparc, and mcore somewhat. The only ports that grew are cris, h8300 and i386. Overall over 5000 lines were deleted. 
Here is the diffstat:

config/picochip 1 file changed, 1 insertion(+), 112 deletions(-)
config/fr30 3 files changed, 7 insertions(+), 185 deletions(-)
config/score 8 files changed, 18 insertions(+), 152 deletions(-)
config/crx 3 files changed, 22 insertions(+), 97 deletions(-)
config/cris 1 file changed, 29 insertions(+), 17 deletions(-)
config/bfin 4 files changed, 31 insertions(+), 234 deletions(-)
config/m68hc11 3 files changed, 33 insertions(+), 242 deletions(-)
config/stormy16 4 files changed, 34 insertions(+), 99 deletions(-)
config/iq2000 4 files changed, 39 insertions(+), 272 deletions(-)
config/arc 3 files changed, 41 insertions(+), 318 deletions(-)
config/v850 1 file changed, 42 insertions(+), 199 deletions(-)
config/m32c 4 files changed, 50 insertions(+), 150 deletions(-)
config/pa 4 files changed, 53 insertions(+), 520 deletions(-)
config/mn10300 1 file changed, 54 insertions(+), 110 deletions(-)
config/frv 3 files changed, 55 insertions(+), 197 deletions(-)
config/xtensa 3 files changed, 58 insertions(+), 75 deletions(-)
config/mmix 4 files changed, 63 insertions(+), 261 deletions(-)
config/mcore 3 files changed, 67 insertions(+), 307 deletions(-)
config/pdp11 3 files changed, 68 insertions(+), 447 deletions(-)
config/vax 4 files changed, 81 insertions(+), 56 deletions(-)
config/avr 1 file changed, 82 insertions(+), 155 deletions(-)
config/h8300 3 files changed, 87 insertions(+), 73 deletions(-)
config/mips 5 files changed, 93 insertions(+), 148 deletions(-)
config/spu 3 files changed, 95 insertions(+), 181 deletions(-)
config/s390 4 files changed, 96 insertions(+), 126 deletions(-)
config/arm 4 files changed, 97 insertions(+), 350 deletions(-)
config/alpha 4 files changed, 103 insertions(+), 277 deletions(-)
config/rs6000 5 files changed, 126 insertions(+), 346 deletions(-)
config/m32r 4 files changed, 155 insertions(+), 405 deletions(-)
config/m68k 4 files changed, 156 insertions(+), 273 deletions(-)
config/ia64 4 files changed, 190 insertions(+), 243 deletions(-)
config/i386 4 files changed, 286 insertions(+), 207 deletions(-)
config/sparc 4 files changed, 327 insertions(+), 861 deletions(-)

Overall: 128 files changed, 2990 insertions(+), 8261 deletions(-)

Paolo
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Martin Guy wrote: On 3/14/09, Paolo Bonzini bonz...@gnu.org wrote: Hans-Peter Nilsson wrote: The answer to the question is no, but I'd guess the more useful answer is yes, for different definitions of truncate. Ok, after my patches you will be able to teach GCC about this definition of truncate. I expect it's a bit too extreme an example, but I've just found (to my horror) that the MaverickCrunch FPU truncates all its shift counts to 6-bit signed (-32 (right) to +31 (left)), including on 64-bit integers, which is not very helpful to compile for. ...unless it happens to come easy to handle "shift count is truncated to less than size of word" in your new framework

Uhm, well, no. :-) This could already be handled by faking a 63 bit truncation and using a splitter to expand those into something like this (I only know integer ARM assembly, so I'm making this up):

    AND R1, R0, #31
    MOV R2, R2, SHIFT R1
    ANDS R1, R0, #32
    MOVNE R2, R2, SHIFT #31
    MOVNE R2, R2, SHIFT #1

or

    ANDS R1, R0, #32
    MOVNE R2, R2, SHIFT #-32
    SUB R1, R1, R0 ; R1 = (x >= 32 ? 32 - x : -x)
    MOV R2, R2, SHIFT R1

(which requires a scratch register, so it cannot be done postreload... this might be a problem). But my new stuff won't change anything. Paolo
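To make the "6-bit signed" shift count concrete, here is a small C model of the behaviour as described in the message above. It is only a sketch: the function names are invented, and treating the right shift as a logical (zero-filling) shift is an assumption, since the message does not say which kind the hardware performs.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 6 bits of COUNT as a signed value in [-32, 31].  */
int sext6(unsigned count)
{
    int low = (int)(count & 63u);
    return low >= 32 ? low - 64 : low;
}

/* Model of a shifter taking a 6-bit signed count on a 64-bit value:
   non-negative counts shift left, negative counts shift right.
   The right shift is logical here, which is an assumption.  */
uint64_t shift_6bit_signed(uint64_t x, unsigned count)
{
    int n = sext6(count);
    assert(n >= -32 && n <= 31);
    return n >= 0 ? x << n : x >> -n;
}
```

In this model a requested left shift by 32 becomes a right shift by 32, which is why a 64-bit shift by a count of 32 or more needs a multi-instruction sequence like the ones sketched above.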
Re: GCC 4.4.0 Status Report (2009-03-13)
NightStrike wrote: On Fri, Mar 13, 2009 at 1:58 PM, Joseph S. Myers jos...@codesourcery.com wrote: Given the SC request we need to stay in Stage 4 rather than trying to work around it. What if GCC went back to stage 3 until the issue is resolved, thus opening the door for a number of stage3-type patches that don't affect 1) licensing and 2) plugin frameworks, but are merely bug fixes which would have long been shaken out by now. No, not at all. The only benefit we're having from this is that GCC 4.4 should be quite stable already in GCC 4.4.0, let's not destroy this one too. Paolo
Re: Dose gcc provide any function to build def-use chain in RTL form
villa gogh wrote: hi now i'm trying to construct def-use chain after the PASS_LEAF_REGS. for the ssa form structure has been destroyed during the former passes. I have found that gcc provides a way to build the def-use chain in the PASS_REGRENAME, but it only contains the defs and uses all in one basic block. No, don't look at those. Instead look at fwprop.c which uses use-def chains -- DU chains are the same but they are computed with df_chain_add_problem (DF_DU_CHAIN); instead of df_chain_add_problem (DF_UD_CHAIN); before df_analyze. fwprop accesses use-def chains by using DF_REF_CHAIN (use); def-use chains are the same but the DF_REF_CHAIN macro is used with a def argument instead. Paolo
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Hans-Peter Nilsson wrote: Date: Fri, 13 Mar 2009 12:34:49 +0100 From: Paolo Bonzini bonz...@gnu.org I would like to know whether for avr,bfin,cris,frv,h8300,pdp11,rs6000 (which define SHIFT_COUNT_TRUNCATED as 0) and for mcore,sh,vax (which do not define it at all) it is right that shift counts are never truncated. The answer to the question is no, but I'd guess the more useful answer is yes, for different definitions of truncate. Ok, after my patches you will be able to teach GCC about this definition of truncate. Paolo
help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
These are all the !SHIFT_COUNT_TRUNCATED targets. For 4.5 I would like to improve our RTL canonicalization so that no out-of-range shifts are ever in the RTL representation. This in turn means that the description given by SHIFT_COUNT_TRUNCATED must be exact. Right now !SHIFT_COUNT_TRUNCATED means "I don't know", I want it to mean "it is never truncated". I would like to know whether for avr,bfin,cris,frv,h8300,pdp11,rs6000 (which define SHIFT_COUNT_TRUNCATED as 0) and for mcore,sh,vax (which do not define it at all) it is right that shift counts are never truncated. In addition, for arm and m68k I'd like to know whether bitfield instructions truncate the bit position the same as shifts (8 bits for arm, 6 bits for m68k). This information is particularly important for targets that do not have a simulator in src. Thanks in advance! Paolo
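A rough sketch of what canonicalizing an out-of-range shift count against a truncation mask could look like, modeled in plain C rather than on RTL. The helper names are invented; using 0 to stand for "never truncated" and returning 0 for a left shift whose count is still out of range are assumptions of this sketch, not something the message specifies.

```c
#include <assert.h>
#include <stdint.h>

/* MASK models a per-mode truncation mask: a nonzero mask means the
   hardware ANDs the count with it; 0 stands for "never truncated"
   in this sketch.  */
unsigned canonical_count(unsigned count, unsigned mask)
{
    return mask ? (count & mask) : count;
}

/* 32-bit left shift with the count canonicalized first.  A count that
   is still >= 32 afterwards yields 0 here, which is an assumption about
   a non-truncating target.  */
uint32_t shl32(uint32_t x, unsigned count, unsigned mask)
{
    unsigned n = canonical_count(count, mask);
    assert(mask == 0 || n <= mask);
    return n < 32 ? x << n : 0;
}
```

With a 5-bit mask (31) a count of 33 canonicalizes to 1; with an 8-bit mask (255), as quoted above for arm, a count of 40 stays 40 and remains out of range for a 32-bit value.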
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Ian Lance Taylor wrote: Paolo Bonzini bonz...@gnu.org writes: This in turn means that the description given by SHIFT_COUNT_TRUNCATED must be exact. Right now !SHIFT_COUNT_TRUNCATED means "I don't know", I want it to mean "it is never truncated". You need to do more work to make that happen, as SHIFT_COUNT_TRUNCATED applies to both the shift instructions and the bitfield instructions. On some processors one or the other is truncated; SHIFT_COUNT_TRUNCATED may currently only be set to 1 if both are truncated. (E.g., I believe that m68k truncates shifts but not bitfield instructions.) Yes, I've also split TARGET_SHIFT_TRUNCATION_MASK and TARGET_EXTRACT_TRUNCATION_MASK, but for the latter a conservative default can be used since it's used only in one optimization in combine. [trimmed CC list] Paolo
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
The Blackfin does not truncate shift counts. The documentation specifies that e.g. for Dx = Dy instructions, shift counts greater than 31 produce a result of zero. Other shift instructions use a sign extended part of the shift count to shift either left or right. "I don't know" is probably the best answer we can give the compiler.

In my plan, the truncation of shifts is used to canonicalize RTL created with out of range shift counts. This is useful because such out of range RTL can appear because of unrolling or inlining. Then the answer should be based on this: would a typical C programmer expect a left and a right shift from this:

int f(int a) { return 0x4000 << a; }
int x, y;
int main() { x = f(1); y = f(-1); }

If the C program above can be reasonably considered undefined with Blackfin, saying shifts are not truncated is okay. This is because the variable left/right shifts can still be described as rtl like

(set A (if_then_else (lt B (const_int 0))
                     (lshiftrt A (minus (const_int 0) B))
                     (lshift A B)))

so that the actual arguments of LSHIFT/LSHIFTRT are positive. Paolo
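The if_then_else RTL above corresponds roughly to the following C helper. This is a sketch only: the function name is invented, unsigned arithmetic is used so that both shifts are well defined in C, and the range restriction on b is an assumption made to keep the example free of undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Shift A left by B when B is non-negative, right by -B otherwise,
   so that each underlying shift only ever sees a non-negative count.  */
uint32_t shift_by_signed(uint32_t a, int b)
{
    assert(b > -32 && b < 32);      /* keep both shifts in range */
    return b < 0 ? a >> -b : a << b;
}
```

For example, shift_by_signed(0x4000, 1) gives 0x8000 and shift_by_signed(0x4000, -1) gives 0x2000, which is the "left and right shift" reading of the test program above.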
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
/* Immediate shift counts are truncated by the output routines (or was it the assembler?). Shift counts in a register are truncated by SH. Note that the native compiler puts too large (> 32) immediate shift counts into a register and shifts by the register, letting the SH decide what to do instead of doing that itself. */
/* ??? The library routines in lib1funcs.asm truncate the shift count. However, the SH3 has hardware shifts that do not truncate exactly as gcc expects - the sign bit is significant - so it appears that we need to leave this zero for correct SH3 code. */

So you have that in the RTL stream we should canonicalize a << 32 to a, but a << (b & 31) is not the same as a << b? Also, how is it that "the sign bit is significant"? Does it determine whether the value is left- or right-shifted? Finally, is SH2A the same as SH3? Thanks! Paolo
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Hm. In fold-const.c we try to make sure to produce the same result as the target would for constant-folding shifts. Thus, Paolo, I think what fold-const.c does is what we should assume for !SHIFT_COUNT_TRUNCATED. No?

Unfortunately it is not so simple. fold-const.c is actually wrong, as witnessed by this program

static inline int f (int s) { return 2 << s; }
int main () { printf ("%d\n", f(33)); }

which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu. This might mean that it is easier than I thought (i.e. that all the subtleties of the targets could be ignored), but I want to play it safe and actually take the opportunity to fix the above problem (my current patch does fix it). Paolo
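The 4 versus 0 discrepancy can be reproduced without relying on undefined behaviour by modeling the two evaluations explicitly. This is a sketch with invented function names: one function mimics a shift instruction that keeps only the low 5 bits of the count, the other takes the count literally. That the -O2 result comes from the second kind of evaluation is an assumption of this sketch, not something the message states.

```c
#include <assert.h>
#include <stdint.h>

/* What a shift instruction does if it keeps only the low 5 bits
   of the count.  */
uint32_t shl_hw_5bit(uint32_t x, unsigned s)
{
    return x << (s & 31u);
}

/* What an evaluator does if it takes the count literally: every bit
   is shifted out once the count reaches the width.  */
uint32_t shl_literal(uint32_t x, unsigned s)
{
    return s < 32 ? x << s : 0;
}
```

Shifting 2 left by 33 gives 4 under the first model and 0 under the second, while both agree for every in-range count.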
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Hm. In fold-const.c we try to make sure to produce the same result as the target would for constant-folding shifts. Thus, Paolo, I think what fold-const.c does is what we should assume for !SHIFT_COUNT_TRUNCATED. No?

Unfortunately it is not so simple. fold-const.c is actually wrong, as witnessed by this program

static inline int f (int s) { return 2 << s; }
int main () { printf ("%d\n", f(33)); }

which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu.

But this is because i?86 doesn't define SHIFT_COUNT_TRUNCATED, no?

Yes, so fold-const.c is *not* modeling the target in this case. But on the other hand, this means we can get by with documenting the effect of a conservative truncation mask: no wrong code bugs, just differences between optimization levels for undefined programs. I'll check that the optimizations done based on the truncation mask are all conservative or can be made so. So, I'd still need the information for arm and m68k, because that information is about the bitfield instructions. For rs6000 it would be nice to see what they do for 64-bit (for 32-bit I know that PowerPCs truncate to 6 bits, not 5). But for the other architectures, we can be conservative. Paolo
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Note, one thing I encountered when doing the SSE5 work at AMD, is SHIFT_COUNT_TRUNCATED really needs a mode argument (and ideally should be moved into the gcc_target structure). In fact I'm reusing the TARGET_SHIFT_TRUNCATION_MASK element that is already there and accepts a mode. Paolo
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
1) As multiple people said, it *was* a regression bug fix. It actually fixed two regressions. (That it fixed the second was discovered only after I committed it). I'm sorry that it caused problems for you (even though it's actually lucky for GCC), but I can't help saying that it might have been the other way and it might have improved your weather forecasting app by 10-20% or more, as it did on one or two SPEC benchmarks.

2) I apologize for the bad quality of the patch. But I tested it on bootstrap, SPEC2000, and of course the testcase, and it had no problems. This testing takes more than 24 hours on my machine and it is more than is usually requested--and I did it for correctness, not for performance testing. In addition, all but one of the fixes that H.J. made (and for which I have to thank him) were unrecognizable insns due to a misunderstanding of how peephole2 worked; I thought it recognized the produced instructions, instead apparently it's the only optimization in GCC where this does not happen. I have a patch to fix this in GCC 4.5.

3) I look forward to seeing the result of the tests H.J. asked you to do, so that at least we can find which peephole2 is responsible. If you want to revert it now, go ahead. I don't see any problems with that and I can approve the reversal of my patch. I'll try again for 4.5 and propose the patch for 4.5.1. Alternatively, let's use these 48 hours constructively to finish the above test, look at the code and try and understand the cause of the failure. I'll do my part by looking at the code *now*.

4) I would have appreciated being CCed on the message.

That said, I run a weather forecasting system 4 times daily to test it out. We can only thank you for this. Please keep the same attitude towards the people that try to improve the compiler. Paolo
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
In addition, all but one of the fixes that H.J. made (and for which I have to thank him) were unrecognizable insns due to a misunderstanding of how peephole2 worked I stand corrected; *all* of the fixes. The patch hadn't had a correctness problem until your message, only ice-on-valids. This does not make the patch better, but I like to set things straight. Paolo
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
Toon Moene wrote: H.J. Lu wrote: If you can provide a testcase, I can take a look. If it isn't easy to find a testcase, please disable the second pattern:

(define_peephole2
  [(set (match_operand 0 "register_operand" "")
        (match_operand 1 "register_operand" ""))
   (set (match_dup 0)
        (match_operator 3 "commutative_operator"
          [(match_dup 0)
           (match_operand 2 "memory_operand" "")]))]
  "operands[0] != operands[1]
   && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
       || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
  [(set (match_dup 0) (match_dup 2))
   (set (match_dup 0)
        (match_op_dup 3 [(match_dup 0) (match_dup 1)]))]
  "")

to see if it makes a difference. Thanks. Test case is hard, but this is easy to try. Expect an answer from me tomorrow (e.g. 12 UTC).

In case it does *not* make a difference, please try this patch:

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 144464)
+++ config/i386/i386.md (working copy)
@@ -20788,12 +20788,12 @@
 ;; refers to the destination of the load!
 
 (define_peephole2
-  [(set (match_operand:SI 0 "register_operand" "")
-        (match_operand:SI 1 "register_operand" ""))
+  [(set (match_operand:P 0 "register_operand" "")
+        (match_operand:P 1 "register_operand" ""))
    (parallel [(set (match_dup 0)
-                   (match_operator:SI 3 "commutative_operator"
+                   (match_operator:P 3 "commutative_operator"
                      [(match_dup 0)
-                      (match_operand:SI 2 "memory_operand" "")]))
+                      (match_operand:P 2 "memory_operand" "")]))
               (clobber (reg:CC FLAGS_REG))])]
   "operands[0] != operands[1]
    && GENERAL_REGNO_P (REGNO (operands[0]))

Thanks, Paolo
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
Attached you'll find the (preprocessed) source of the routine that printed the Infinity's (of course, I cannot be completely certain that it actually resulted in the wrong code, but at least it might be studied to see if it helps to find the culprit). No, this function is sane (the peephole *is* called a lot by this function, but all is in due order). I looked at the dumps and assembly for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected. Interestingly enough, you *should* expect a speedup when this is resolved... The next guess then is that the RHXU and RHYV arrays are wrong. From these, ZHXY is computed, and ZHXY is multiplied into each of the outputs. Can you send the routine that computes those, or is it too big? Paolo (*) it would have helped to know the compilation flags and target, of course.
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
Toon Moene wrote: Paolo Bonzini wrote: Attached you'll find the (preprocessed) source of the routine that printed the Infinity's (of course, I cannot be completely certain that it actually resulted in the wrong code, but at least it might be studied to see if it helps to find the culprit). No, this function is sane (the peephole *is* called a lot by this function, but all is in due order). I looked at the dumps and assembly for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected. Yeah, it was probably too much to hope for.

No, you were right, and that's great. -ffast-math makes a difference, because it enables more vectorization. It goes like this:

(insn 494 493 495 44 statin.f:703 (set (reg:SF 371)
        (vec_select:SF (reg:V4SF 367)
            (parallel [
                    (const_int 0 [0x0])
                ]))) 1408 {*vec_extractv4sf_0} (expr_list:REG_DEAD (reg:V4SF 367)
        (nil)))

registers 371 and 367 are coalesced into xmm0. Then the vec_select is split to just

(set (reg:SF 21 [orig: 371]) (reg:SF 21 [orig: 367]))

and these are indeed !=, but they have the same hard register number so the peephole should not apply in this case. Here is a minimized testcase:

subroutine statin(x,y,pstratr,pconvecr,zhxy,zhxhy,ztmp)
integer :: x,y
real pstratr(x,y),pconvecr(x,y),zhxy(x,y)
real ztmp(4)
do j = 1,y
  do i = 1,x-2
    zttotrainr = zttotrainr + (pstratr(i,j) + pconvecr(i,j))*zhxy(i,j)
    ztstratr = ztstratr + pstratr(i,j)
    ztconvecr = ztconvecr + pconvecr(i,j)
    ztsenf = ztsenf + zhxy(i,j)
    ztlatf = ztlatf + zhxy(i,j)
    ztcldtop = ztcldtop + zhxy(i,j)
  enddo
enddo
ztmp(1)=zttotrainr
ztmp(2)=ztstratr
ztmp(3)=ztconvecr
ztmp(4)=ztsenf*ztlatf*ztcldtop
end

The following patch should fix it, you're welcome to run it through HIRLAM. I'm bootstrapping it in the meanwhile.
Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 144464)
+++ gcc/config/i386/i386.md (working copy)
@@ -20795,7 +20795,7 @@
                      [(match_dup 0)
                       (match_operand:SI 2 "memory_operand" "")]))
               (clobber (reg:CC FLAGS_REG))])]
-  "operands[0] != operands[1]
+  "!rtx_equal_p (operands[0], operands[1])
    && GENERAL_REGNO_P (REGNO (operands[0]))
    && GENERAL_REGNO_P (REGNO (operands[1]))"
   [(set (match_dup 0) (match_dup 4))
@@ -20811,7 +20811,7 @@
         (match_operator 3 "commutative_operator"
           [(match_dup 0)
            (match_operand 2 "memory_operand" "")]))]
-  "operands[0] != operands[1]
+  "!rtx_equal_p (operands[0], operands[1])
    && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
        || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
   [(set (match_dup 0) (match_dup 2))

Paolo
Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.
Will REGNO (operands[0]) == REGNO (operands[1]) work here? Yes. I wanted to be conservative in case one day subregs or who knows what are allowed. I'll defer to maintainers or other people (Steven?), either way is fine by me. Paolo
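The difference between the three possible tests (pointer comparison, register-number comparison, structural comparison) can be illustrated with a toy model. Everything below is hypothetical: struct toy_reg and the three helpers are stand-ins invented for this sketch and are not GCC's rtx, REGNO or rtx_equal_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a REG expression: a register number plus a mode tag.
   Not GCC's real data structure.  */
struct toy_reg { unsigned regno; int mode; };

/* Pointer identity, the analogue of comparing the operands with !=.  */
bool same_object(const struct toy_reg *a, const struct toy_reg *b)
{
    assert(a && b);
    return a == b;
}

/* Register-number comparison, the analogue of comparing REGNOs.  */
bool same_regno(const struct toy_reg *a, const struct toy_reg *b)
{
    assert(a && b);
    return a->regno == b->regno;
}

/* Structural comparison, the analogue of a deep equality test.  */
bool structurally_equal(const struct toy_reg *a, const struct toy_reg *b)
{
    assert(a && b);
    return a->regno == b->regno && a->mode == b->mode;
}
```

Two separately created objects that both describe register 21 in the same mode are different under the first test and equal under the other two, which mirrors the situation in the failing peephole.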
cond-optab patch series
Hi, I'll be posting soon a series of patches labeled [cond-optab]. The aim of the series is to have all ports use cbranch+cstore+cmov optabs instead of cmp/bcc/scc/movcc. As a starter, the first patches I'll post will be cleaning up and centralizing the generation of cmp, scc and bcc opcodes. The reasons are as follows:

1) more maintainability, less code duplication. The preliminary series I'll send removes 2 lines for each added line.

2) more flexibility in RTL generation of jumps. As a result...

3) less md-code, more machine independent code. Ability to make all the branch selection code written for i386 work on ARM and SPARC too.

Unless there is demand, I don't plan to put this on a branch. Reviews and bootstraps are welcome though. Paolo
Re: No address_cost calls when inlining ?
I want the version of foo because the store with an address as destination is costly on my architecture, which is why I defined TARGET_ADDRESS_COST and added a cost when I get this scenario. However, in the compilation of this code, it seems that, when the function is inlined, the address_cost function does not seem to be called anymore. Any ideas why ? This is (a variant of) PR33699. Paolo
Re: [patch][4.5] Make regmove cfglayout-safe
Paolo Bonzini wrote:

I also wondered about this. I think the original idea is that splits can call into dojump.c. A more likely possibility is -fnon-call-exceptions. Of course this is the main cause. But splitting one jump to multiple jumps is supported and actually even documented. It will happen for example in this testcase:

    int f(float x)
    {
      if (x != x)
        return 5;
      else
        abort ();
    }

on i386 which produces

    fucomip %st(0), %st
    jp      .L8
    je      .L6

It is possible to change this to an expander in the i386 md of course. I don't think any other backend is relying on it, but I will make a more thorough check if I end up submitting something like the attached patch.

Paolo

2009-03-10  Paolo Bonzini  <bonz...@gnu.org>

        * lower-subreg.c (decompose_multiword_subregs): Extract code...
        * cfgbuild.c (rtl_split_blocks_for_eh): ... here.
        * basic-block.h (rtl_split_blocks_for_eh): Declare it.
        * recog.c (split_insn): Return bool. Check that the splitter
        produces no barriers and no labels.
        (split_all_insns): Use the result. Call rtl_split_blocks_for_eh
        instead of find_many_sub_basic_blocks.
        * reload1.c (fixup_abnormal_edges): Use it.
        * passes.c (init_optimization_passes): Move cfglayout mode further
        down.

Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (branch combine-cfglayout)
+++ gcc/passes.c        (working copy)
@@ -757,8 +757,8 @@ init_optimization_passes (void)
       NEXT_PASS (pass_if_after_combine);
       NEXT_PASS (pass_partition_blocks);
       NEXT_PASS (pass_regmove);
-      NEXT_PASS (pass_outof_cfg_layout_mode);
       NEXT_PASS (pass_split_all_insns);
+      NEXT_PASS (pass_outof_cfg_layout_mode);
       NEXT_PASS (pass_lower_subreg2);
       NEXT_PASS (pass_df_initialize_no_opt);
       NEXT_PASS (pass_stack_ptr_mod);
Index: gcc/recog.c
===================================================================
--- gcc/recog.c (branch combine-cfglayout)
+++ gcc/recog.c (working copy)
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "hard-reg-set.h"
+#include "except.h"
 #include "recog.h"
 #include "regs.h"
 #include "addresses.h"
@@ -71,7 +72,6 @@ get_attr_enabled (rtx insn ATTRIBUTE_UNU
 
 static void validate_replace_rtx_1 (rtx *, rtx, rtx, rtx, bool);
 static void validate_replace_src_1 (rtx *, void *);
-static rtx split_insn (rtx);
 
 /* Nonzero means allow operands to be volatile.
    This should be 0 if you are generating rtl, such as if you are calling
@@ -2671,19 +2671,23 @@ reg_fits_class_p (rtx operand, enum reg_
 }
 
 /* Split single instruction. Helper function for split_all_insns and
-   split_all_insns_noflow. Return last insn in the sequence if successful,
-   or NULL if unsuccessful. */
+   split_all_insns_noflow. Return whether new control flow insns
+   were added. */
 
-static rtx
+static bool
 split_insn (rtx insn)
 {
   /* Split insns here to get max fine-grain parallelism. */
   rtx first = PREV_INSN (insn);
   rtx last = try_split (PATTERN (insn), insn, 1);
   rtx insn_set, last_set, note;
+  bool new_cfi = false;
+  bool was_cfi;
 
   if (last == insn)
-    return NULL_RTX;
+    return false;
+
+  was_cfi = control_flow_insn_p (insn);
 
   /* If the original instruction was a single set that was known to be
      equivalent to a constant, see if we can say the same about the last
@@ -2706,22 +2710,25 @@ split_insn (rtx insn)
   /* try_split returns the NOTE that INSN became. */
   SET_INSN_DELETED (insn);
 
-  /* ??? Coddle to md files that generate subregs in post-reload
-     splitters instead of computing the proper hard register. */
-  if (reload_completed && first != last)
+  while (first != last)
     {
       first = NEXT_INSN (first);
-      for (;;)
+      gcc_assert (!BARRIER_P (first) && !LABEL_P (first));
+
+      /* ??? Coddle to md files that generate subregs in post-reload
+         splitters instead of computing the proper hard register. */
+      if (reload_completed && INSN_P (first))
+        cleanup_subreg_operands (first);
+      if ((first != last || !was_cfi)
+          && control_flow_insn_p (first))
         {
-          if (INSN_P (first))
-            cleanup_subreg_operands (first);
-          if (first == last)
-            break;
-          first = NEXT_INSN (first);
+          gcc_assert (flag_non_call_exceptions
+                      && can_throw_internal (first));
+          new_cfi = true;
         }
     }
-  return last;
+  return new_cfi;
 }
 
 /* Split all insns in the function. If UPD_LIFE, update life info after. */
@@ -2730,12 +2737,10 @@ void
 split_all_insns (void)
 {
   sbitmap blocks;
-  bool changed;
   basic_block bb;
 
   blocks = sbitmap_alloc (last_basic_block);
   sbitmap_zero (blocks);
-  changed = false;
 
   FOR_EACH_BB_REVERSE (bb)
     {
@@ -2753,41 +2758,17 @@ split_all_insns (void
Re: -mfpmath=sse,387 is experimental ?
Timothy Madden wrote:
> Hello
>
> Is -mfpmath=both for i386 and x86-64 still experimental in gcc 4.3, as stated in the online manual page?

Yes. It might (*might*) be better in GCC 4.4 thanks to the new register allocator, but it's unlikely that the manual page will be changed before the release.

Paolo
Re: GCC-only software
> Well, the problem is that I don't know where to find the unofficial documentation, so it is hard to figure out the questions to be asked.

Well, the unofficial documentation is the source code. :-)

Paolo
Re: Setting -frounding-math by default
Sylvain Pion wrote: Andrew Thomas Pinski wrote: The fact is that Roger's patch introduced a regression (this word should be clear enough here), in that some users now have their old code broken, and they are forced to add the -frounding-math option (after having lost some time finding out about this non-trivial issue). This is a long term hindrance. Actually before Roger's patch the default is the same. Just there was no way to turn it off. Actually, there are 2 things controlled by -frounding-math: 1) constant propagation of FP operations 2) generic transformations like (-a)*b -> -(a*b) I think 2) is taken care of by -fassociative-math, or it should at least. Paolo
Re: Setting -frounding-math by default
I think 2) is taken care of by -fassociative-math, or it should at least. I don't think it is (I haven't checked), and I don't see why it should. This transformation has nothing to do with associativity : unless I'm mistaken, it is always valid when rounding is to the nearest or towards zero. (-a) * b = -(a * b) is definitely reassociation (-a is -1 * a); no reassociation has to be valid in any rounding mode, which means two things: 1) it can be done even when other rounding-mode-dependent optimizations are disabled via flag_rounding_math (good); 2) it would also enable other optimization that you might not want (bad). Paolo
Re: bitwise dataflow
> 1. Dataflow framework to propagate bitwise register properties. (Integrated with the current dataflow framework.)
> 2. Forward bitwise dataflow analysis: constant bit propagation.
> 3. Backward bitwise dataflow analysis: dead bit propagation.
> 4. Target applications: improve dce and see. (Others?)
>
> For each instruction I in the function body
>   For each register R in instruction I
>     def_constant_bits(I, R) = collect constants from AND/OR/... operations.

There's already nonzero_bits (i.e. maybe nonzero) and num_sign_bit_copies in rtlanal.c. You can add to this one_bits and it should be enough to do the simplifications you want.

You can get initial info from those routines, do the dataflow. Then there are rtx_hooks members to get a REG's nonzero bits/# of sign bit copies (and you can add one for one_bits): just set them to a function in your pass that returns info from the dataflow.

Then you can walk through all the functions, recursively simplifying the RHS of each set (you can look at propagate_rtx in fwprop.c for an example of simplifying the RHS). The code in simplify-rtx.c will take care of using the nonzero_bits (et al.) information; other optimizations can be added there.

Do not forget to check the cost of the replacement, otherwise your pass might end up doing constant propagation (for constants, all zero and one bits are known!!!).

Paolo
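The known-bits bookkeeping discussed in this message can be sketched outside of GCC. What follows is a minimal model under stated assumptions, not GCC code: `struct known_bits` and the `kb_*` functions are names invented for this illustration. The `zeros` mask plays the role of the complement of nonzero_bits, and `ones` plays the role of the proposed one_bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-register bit knowledge (not a GCC type).
   A bit set in ZEROS is known to be 0; a bit set in ONES is known to
   be 1.  No bit may be set in both masks.  */
struct known_bits
{
  uint32_t zeros;
  uint32_t ones;
};

/* Nothing is known about the value.  */
struct known_bits
kb_unknown (void)
{
  struct known_bits r = { 0, 0 };
  return r;
}

/* Every bit of a constant is known.  */
struct known_bits
kb_constant (uint32_t c)
{
  struct known_bits r = { ~c, c };
  return r;
}

/* Transfer function for A & B: a result bit is 0 if it is 0 in either
   operand, and 1 only if it is 1 in both.  */
struct known_bits
kb_and (struct known_bits a, struct known_bits b)
{
  struct known_bits r = { a.zeros | b.zeros, a.ones & b.ones };
  assert ((r.zeros & r.ones) == 0);
  return r;
}

/* Transfer function for A | B: the dual of kb_and.  */
struct known_bits
kb_or (struct known_bits a, struct known_bits b)
{
  struct known_bits r = { a.zeros & b.zeros, a.ones | b.ones };
  assert ((r.zeros & r.ones) == 0);
  return r;
}

/* Meet at a control-flow join: keep only facts true on both paths.  */
struct known_bits
kb_meet (struct known_bits a, struct known_bits b)
{
  struct known_bits r = { a.zeros & b.zeros, a.ones & b.ones };
  return r;
}

/* True if every bit is known, i.e. the value is really a constant.  */
bool
kb_is_constant (struct known_bits k)
{
  return (k.zeros | k.ones) == UINT32_MAX;
}
```

`kb_is_constant` marks the case the message warns about: once all bits are known, a replacement amounts to constant propagation, so a real pass would need to check its cost.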
Re: New no-undefined-overflow branch
> So while trapping variants can certainly be introduced it looks like this task may be more difficult.

I don't think you need to introduce trapping tree codes. You can introduce them directly in the front-end as

    s = x +nv y
    (((s ^ x) & (s ^ y)) < 0) ? trap () : s

    d = x -nv y
    (((d ^ x) & (x ^ y)) < 0) ? trap () : d

    (b == INT_MIN ? trap () : -nv b)

    (int)((long long) a * (long long) b) == a *nv b ? trap () : a *nv b

Making sure they are compiled efficiently is another story, but especially for the sake of LTO I think this is the way to go.

Paolo
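The sign-bit tests for addition, subtraction and negation can be tried in plain C. This is a minimal sketch under two assumptions: the wrapping result is computed in unsigned arithmetic (so the C code itself has no undefined behaviour, relying on GCC's modulo conversion back to int), and each function returns an overflow flag instead of calling trap () so that it can be exercised. The names `wrapping_add`, `wrapping_sub`, `checked_add`, `checked_sub` and `checked_neg` are invented for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrapping add done in unsigned arithmetic.  The conversion back to
   int is implementation-defined; GCC reduces modulo 2^N.  */
int
wrapping_add (int x, int y)
{
  return (int) ((unsigned int) x + (unsigned int) y);
}

/* Wrapping subtract, same approach.  */
int
wrapping_sub (int x, int y)
{
  return (int) ((unsigned int) x - (unsigned int) y);
}

/* Store the wrapped X + Y in *RES.  Return true on overflow, which
   happened iff the sum's sign differs from the sign of both addends.  */
bool
checked_add (int x, int y, int *res)
{
  int s = wrapping_add (x, y);
  assert (res);
  *res = s;
  return ((s ^ x) & (s ^ y)) < 0;
}

/* Store the wrapped X - Y in *RES.  Return true on overflow, which
   happened iff X and Y have different signs and the difference's
   sign differs from X's.  */
bool
checked_sub (int x, int y, int *res)
{
  int d = wrapping_sub (x, y);
  assert (res);
  *res = d;
  return ((d ^ x) & (x ^ y)) < 0;
}

/* Negation overflows only for INT_MIN.  */
bool
checked_neg (int b, int *res)
{
  assert (res);
  if (b == INT_MIN)
    {
      *res = b;
      return true;
    }
  *res = -b;
  return false;
}
```

A front end following the scheme above would emit a call to trap () where these functions return true.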
Re: New no-undefined-overflow branch
Richard Guenther wrote:
> On Fri, Mar 6, 2009 at 3:29 PM, Paolo Bonzini <bonz...@gnu.org> wrote:
>>> So while trapping variants can certainly be introduced it looks like this task may be more difficult.
>>
>> I don't think you need to introduce trapping tree codes. You can introduce them directly in the front-end as
>>
>>     s = x +nv y
>
> I think this should be
>
>     s = x + y
>     (((s ^ x) & (s ^ y)) < 0) ? trap () : s
>
> otherwise the compiler can assume that for the following check the addition did not overflow.

Ah yeah I've not yet looked at the patches and I did not know which one was which. I actually wrote x + y first and then went back to carefully check them. :-P

>> Making sure they are compiled efficiently is another story, but especially for the sake of LTO I think this is the way to go.
>
> I agree. Btw, for the addition case we generate
>
>         leal    (%rsi,%rdi), %eax
>         xorl    %eax, %esi
>         xorl    %eax, %edi
>         testl   %edi, %esi
>         jns     .L2
>         .value  0x0b0f
> .L2:
>         rep
>         ret
>
> which isn't too bad.

Well, for x86 it requires the addends to die. This is unfortunately four insns, and combine has a limit of three, but maybe you could make combine recognize the check and turn it to an addv pattern (with the add result unused!); and then CSE or maybe combine as well would, well, eliminate the duplicate ADD...

If this does not work, on ARM you can also hope for something like this:

        ADD     R0, R1, R2
        XORS    R0, R2, R3
        XORSMI  R1, R2, R3
        SWIMI   #trap

But hey, whatever you get, it's anyway faster than a libcall. :-)

Of course there are better choices for x+CONSTANT; using (b == INT_MIN ? trap () : -b) for negation is one example.

Paolo
Re: New no-undefined-overflow branch
Joseph S. Myers wrote:
> On Fri, 6 Mar 2009, Paolo Bonzini wrote:
>> I don't think you need to introduce trapping tree codes. You can introduce them directly in the front-end as
>
> Multiple front ends want the same thing. This is why it would be better to introduce the codes in GENERIC and have the language-independent gimplifier contain the code to lower them, even if they don't become part of GIMPLE.

I see your point. What I'm worried about is that these codes would be tested more lightly and, until folding is a middle-end thing only, the risk of unwanted optimization on -ftrapv code would be high.

You can have common code shared by front-ends. They could apply it at GENERICization time (Fortran, Ada) or directly while parsing (C, C++).

>> (int)((long long) a * (long long) b) == a *nv b ? trap () : a *nv b
>
> This is not a solution for trapping multiplication in the widest supported type.

There's always range checking, I was pointing out optimization possibilities; the above one can be optimized like

    (h,l) = a*b
    if (h != l >> 31) trap ();  // signed shift

Paolo
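The (h,l) test for multiplication can be written out for 32-bit operands. This is a sketch only: `checked_mul` is a name invented here, it returns an overflow flag rather than calling trap (), and it relies on GCC's behaviour for converting out-of-range values to a signed type and for right-shifting negative values (both implementation-defined in ISO C). As the quoted objection notes, it needs a type twice as wide as the operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store the low 32 bits of A * B in *RES and return true if the full
   product does not fit in 32 bits.  */
bool
checked_mul (int32_t a, int32_t b, int32_t *res)
{
  /* The widening multiply cannot overflow in 64 bits.  */
  int64_t wide = (int64_t) a * (int64_t) b;

  /* Split into halves: L is the wrapped 32-bit result, H the upper
     half of the product.  */
  int32_t l = (int32_t) (uint32_t) wide;
  int32_t h = (int32_t) (wide >> 32);

  assert (res);
  *res = l;

  /* The product fits iff H is just the sign extension of L.  The
     signed shift yields 0 for non-negative L and -1 for negative L.  */
  return h != (l >> 31);
}
```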
Re: New no-undefined-overflow branch
Joseph S. Myers wrote: On Fri, 6 Mar 2009, Geert Bosch wrote: this task may be more difficult. So lowering them early during gimplification looks like a more reasonable plan IMHO. Right, that was my intention. Still, I'll need to add code to handle the new tree codes in fold(), right? If you add new trapping codes to GENERIC I'd recommend *not* making fold() handle them. Constant folding should be done for them, though. Either lower the codes in gimplification, or handle them explicitly in a few GIMPLE optimizations e.g. when constants are propagated in, but avoid general folding for them. Definitely the former. Paolo
Re: New no-undefined-overflow branch
>> If this does not work, on ARM you can also hope for something like this:
>>
>>         ADD     R0, R1, R2
>>         XORS    R0, R2, R3
>>         XORSMI  R1, R2, R3
>>         SWIMI   #trap
>
> On ARM you can just check for overflow directly...
>
>         ADDS    R0, R1, R2
>         SWIVS   #trap

Of course, I was thinking explicitly of what happens with no MD support.

Paolo
Re: __builtin_return_address for ARM
Uwe Kleine-König wrote:
> Hello,
>
> currently[1] __builtin_return_address for ARM only works with level == 0. For ftrace in the linux kernel it would be great to implement that for level > 0 (provided that framepointers or unwind information are available of course).
>
> On the linux-arm-kernel ML Mikael Pettersson[2] said that __builtin_return_address(N) where N > 0 should never have been introduced into gcc. Is that the general view for __builtin_return_address or would a patch be accepted?

My personal opinion is that Mikael Pettersson is right, but since the damage is done why not extend it to more architectures. I am not an ARM maintainer though.

Paolo
Re: Native support for vector shift
> Currently, we have to use intrinsics to support such shift. Isn't syntax of vector shift intuitive enough to be supported natively? Someone may argue it breaks the C language. But vector is a GCC extension anyway. Support for vector add/sub/etc already breaks C syntax. Any thoughts? Sorry if this issue has been raised in the past.

I see no reason why this could not be added provided that it is

1) adequately documented
2) implemented when not supported in hardware too (tree-ssa-vect-generic.c)
3) possibly implemented for both C and C++.

Regarding 2, note that this

    V4H tst(V4H a, V4H b){ return a << b; }

would have to be emulated on all x86 targets prior to SSE5. Another much desired feature would be OpenCL C-style masking and swizzling.

Paolo
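Point 2 above, emulation when the hardware has no vector shift with per-element counts, amounts to lowering the operation to one scalar shift per lane. A minimal sketch follows. It does not show what tree-ssa-vect-generic.c actually emits; the `v4h` struct and `v4h_shl` are hypothetical stand-ins for the V4H type and the `a << b` expression, and lanes are unsigned here to sidestep signed-shift questions.

```c
#include <assert.h>
#include <stdint.h>

#define V4H_LANES 4

/* Hypothetical stand-in for a vector of four 16-bit elements.  */
typedef struct
{
  uint16_t lane[V4H_LANES];
} v4h;

/* Element-wise left shift: lane I of A is shifted by lane I of B.
   Bits shifted out of a lane are dropped and never reach the
   neighbouring lane.  */
v4h
v4h_shl (v4h a, v4h b)
{
  v4h r;
  for (int i = 0; i < V4H_LANES; i++)
    {
      /* Counts of 16 or more are left out of this sketch.  */
      assert (b.lane[i] < 16);
      r.lane[i] = (uint16_t) ((uint32_t) a.lane[i] << b.lane[i]);
    }
  return r;
}
```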
Re: Native support for vector shift
It shouldn't be too hard to add the support. I suspect the person who did the initial support may have been on a machine without vector shifts. Nope, because it was originally done by Aldy who did the VMX support which had vector shifts. OTOH the support for vector lowering was weaker than now in 3.x and it was harder to add more lowering. It shouldn't be very hard now to add all kinds of shifts and also auto-splatting. Paolo
Re: libiberty testsuite builds with wrong compiler
Jack Howarth wrote: The same issue in the libiberty testsuite run can be seen with the Apple regress server log at http://gcc.gnu.org/regtest/HEAD/native-lastbuild.txt.gzip. If you search for test-demangle, you will find... I'm sure there is a bugzilla entry for that. Paolo
Re: About strict-aliasing warning
> -Wstrict-aliasing
>     This option is only active when -fstrict-aliasing is active. It warns about code which might break the strict aliasing rules that the compiler is using for optimization. The warning does not catch all cases, but does attempt to catch the more common pitfalls. It is included in -Wall. It is equivalent to -Wstrict-aliasing=3
>
> and -O2 would activate -fstrict-aliasing by default, which should also activate this option.

No, the text above means that -fstrict-aliasing is a *necessary* condition to get aliasing warnings, not a sufficient condition. Do you have suggestions for how to clarify the text?

Paolo
Re: IRA conflict graph alternative selection
Jeff Law wrote: We'd want to encode [early insn alternative selection] information in the conflict graph so that IRA would allocate registers so as to fit the constraints of the early insn alternative selection. Right? In the case where the graph is uncolorable, do we allow IRA to override the alternative selection, or do we insert copies to simplify the conflict graph or some mixture of both? Inserting compensation code, for example copies, can be seen as some kind of pre-reload as it was used on new-ra branch; the problem with pre-reload was that it was built on cp reload1.c pre-reload.c, so it was not much less complicated than reload. Paolo
Re: proposal for improved management bugzilla priorities/release criteria
However, I don't agree that P2 regressions aren't a factor. If we have a ton of crashing on wrong-code, etc., regressions that adds up to a release that won't work well for people. In which case the important ones should be P1 ... No, that misses the point. A mass of bugs, each itself not too critical, can still make a release that is of substandard quality. Think of the integral of perceived quality over the intended user-base. Yes, that was the meaning more-or-less of my 50 P2 criteria. Still, I would like to hear an opinion on what to do with regard to long standing bugs that are clearly not going to be fixed in stage3/4. This was the main point of my message. Paolo
Re: possible buffer overflow in calls.c?
Assuming you have a copyright assignment, just send a patch to gcc-patches with the explanation. This is code which will never be used for any popular target. The patch is probably small enough that it does not require assignment, given the description in his original message. Paolo
Re: Difference between vec_shl_vector_mode and ashlvector_mode3
Bingfeng Mei wrote:
> Hello,
>
> Could anyone explain to me what is the difference between vec_shl_<vector_mode> and ashl<vector_mode>3 patterns? It seems to me that both shift a vector operand 1 with scalar operand 2. I tried to understand some targets' implementation, e.g., ia64 as follows, and cannot grasp their difference. Does the whole vector shift of vec_shl mean treating a vector as a long scalar? Thanks in advance.

vec_shl_<mode> is indeed treating a vector as a long scalar, while lshr<mode>3 is for SIMD shifts. Only for shifts, the second argument can be an integer mode specifying that the shift count has to be the same for all SIMD elements.

Notice that in the vec_shr_<mode> you pasted, the shift is carried out in DImode

    [(set (match_operand:VECINT 0 "gr_register_operand" "")
          (lshiftrt:DI (match_operand:VECINT 1 "gr_register_operand" "")
                       (match_operand:DI 2 "gr_reg_or_6bit_operand" "")))]

while in the lshr<mode>3 it is carried out in the vector mode:

    [(set (match_operand:VECINT24 0 "gr_register_operand" "=r")
          (lshiftrt:VECINT24 (match_operand:VECINT24 1 "gr_register_operand" "r")
                             (match_operand:DI 2 "gr_reg_or_5bit_operand" "rn")))]

Paolo
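The distinction between the two patterns can be modelled on a 64-bit word holding four 16-bit lanes. This is an illustrative sketch only: `whole_vector_shr` and `simd_v4hi_lshr` are names invented here, not GCC or ia64 functions, and the lane layout is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Whole-vector shift, in the style of vec_shr: the register is
   treated as one long scalar, so bits cross lane boundaries.  */
uint64_t
whole_vector_shr (uint64_t v, unsigned int count)
{
  assert (count < 64);
  return v >> count;
}

/* SIMD shift, in the style of lshr<mode>3 on four 16-bit lanes: one
   scalar count is applied to every lane, and bits shifted out of a
   lane are lost instead of entering its neighbour.  */
uint64_t
simd_v4hi_lshr (uint64_t v, unsigned int count)
{
  uint64_t r = 0;
  assert (count < 16);
  for (int i = 0; i < 4; i++)
    {
      uint64_t lane = (v >> (16 * i)) & 0xffff;
      r |= (lane >> count) << (16 * i);
    }
  return r;
}
```

The two functions give the same result only when no set bit crosses a lane boundary during the shift.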
Re: proposal for improved management bugzilla priorities/release criteria
>> - The more conservative one is to use more aggressively the release milestone field. Hard-to-fix bugs would be left as P2, but bumped to the next major release at the beginning of stage 3.
>>
>>   Advantages: no need for churn in the bug database---very easy to implement
>>   Disadvantages: the milestone field is not visible in search lists (maybe this can be changed)?
>
> I think using the milestone will get us more confused only. We already have the issue that what we make a blocker (P1) for 4.4 is not a blocker for, say, 4.3.4. Unless we want to start duplicating bugs for each open branch I'd rather not touch our target milestone policy.

Right now the target milestone is useless, as it could be computed algorithmically:

    [4.2/4.3/4.4 regression] -> milestone is next 4.2 release
    [4.3/4.4 regression]     -> milestone is next 4.3 release
    [4.4 regression]         -> milestone is next 4.4 release
    else                     -> milestone not used

The situation that you mentioned (P1 for 4.4 but not for 4.3.4) would be handled by having [4.3/4.4 regression] with milestone of 4.4. This is why I think the more aggressive usage of the milestone field would be advantageous.

> I think the only reasonable release criteria is zero P1 regressions over some period. 50 P2 regressions doesn't make a release blocker, neither is 49 P2 regressions a clear sign for a non-blocked release.

I agree.

Paolo
Re: Constant folding and Constant propagation
Jean Christophe Beyler wrote:
> Ok, thanks for all this information and if you can dig that up it would be nice too. I'll start looking at that patch and PR33699 to see if I can adapt them to my needs.

Here it is.

Paolo

/* Copy propagation on RTL for GNU compiler.
   Copyright (C) 2006 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING. If not, write to the Free
Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA. */

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "rtl.h"
#include "obstack.h"
#include "basic-block.h"
#include "insn-config.h"
#include "recog.h"
#include "alloc-pool.h"
#include "timevar.h"
#include "tree-pass.h"

/* The basic idea is to keep a table of registers with the same value,
   and replace expressions encountered so that they use the equivalent
   register. We do this processing on extended basic blocks.

   Note this can turn a conditional or computed jump into a nop or
   an unconditional jump. This is left to be cleaned up by CSE, for now.

   At the start of each basic block, an assignment places a register
   in a distinct group number. During scan, when the code copies one
   register (or a related expression, see below) into another, we copy
   the quantity number. When a register is loaded in any other way, we
   allocate a new quantity number to describe the value generated by
   this operation. `reg_group[regno].num' records what quantity
   register REGNO is currently thought of as containing.

   Other expressions:

   A CLOBBER rtx in an instruction invalidates its operand for further
   reuse.

   Related expressions:

   Registers that differ only by an additive integer are called related.
   Related registers share the same quantity number. */

/* Per-group information tracking. */

struct group_table_elem
{
  rtx reg;
  HOST_WIDE_INT delta;

  /* Basic block in which it was defined. */
  basic_block bb;

  struct group_table_elem *prev, *next;
};

/* Per-register data, including register-group mapping. */

struct reg_data
{
  /* Current group. */
  struct group_table_elem *entry;

  /* Pointer into the table of group heads, indexed by register number.
     Always 0 for unknown values. */
  int num;
};

/* Length of group_head vector. */

static int max_group;

/* Next quantity number to be allocated.
   This is 1 + the largest number needed so far. */

static int next_group;

/* The table of all groups, indexed by group number. */

static struct group_table_elem **group_head;

/* The table of all pseudos, indexed by regno. */

static struct reg_data *reg_group;

/* Allocation pool. */

static alloc_pool group_elem_pool;

/* Basic block being processed. */

static basic_block current_bb;

/* We store the state of each basic block (not visited, part of current
   EBB, finished) in its AUX field. These two functions return the state. */

static inline bool
bb_visited (basic_block bb)
{
  return bb->aux != 0;
}

static inline bool
bb_active (basic_block bb)
{
  return bb->aux == (void *) 1L;
}

/* Routines to manage the data structures of this pass. */

/* Remove the register DEST from the equivalence tables. */

static struct group_table_elem *
remove_from_group (rtx dest)
{
  struct group_table_elem *ent = reg_group[REGNO (dest)].entry;
  int num = reg_group[REGNO (dest)].num;

  reg_group[REGNO (dest)].num = 0;
  reg_group[REGNO (dest)].entry = NULL;
  gcc_assert (ent);
  if (ent->prev)
    ent->prev->next = ent->next;
  else
    group_head[num] = ent->next;
  if (ent->next)
    ent->next->prev = ent->prev;

  ent->prev = ent->next = NULL;
  return ent;
}

/* Check if the entry for register REG in the equivalence tables is up to
   date. Remove it and return NULL if it is not. Otherwise, return the
   entry, or NULL if it is absent. */

static inline struct group_table_elem *
reg_group_entry (rtx reg)
{
  struct group_table_elem *ent = reg_group[REGNO (reg)].entry;
  if (ent && !bb_active (ent->bb))
    {
      remove_from_group (reg);
      return NULL;
    }
  else
    return ent;
}

/* Return the head of the equivalence class for REG (removing stale
   entries). */

static struct group_table_elem *
reg_group_head (rtx reg)
{
  int num = reg_group[REGNO (reg)].num;
  struct group_table_elem *canon_ent = group_head[num];
  while (canon_ent && !bb_active (canon_ent->bb))
    {
      remove_from_group
Re: Constant folding and Constant propagation
Steven Bosscher wrote:
> On Fri, Feb 6, 2009 at 7:32 PM, Adam Nemet <ane...@caviumnetworks.com> wrote:
>> I think you really need the Joern's optimize_related_values patch. Also see PR33699.
>
> I wouldn't recommend that patch, but yes: Something that performs that optimization ;-)

Yes, something doing that using LCM was on my list of things to do after fwprop to do what CSE currently does, but better. I never got round to implementing it, I think. I had an LCM-based replacement for canon_reg that I wanted to use as a basis, maybe I can dig it up.

Paolo
proposal for improved management bugzilla priorities/release criteria
The current system for managing bugzilla priorities has a major problem, in that it does not identify bugs that reasonably cannot be fixed before the release. The current set of priorities is in practice like this:

- P1: most wrong code bugs, and other absolutely blocking problems
- P2: problems worth a look on important platforms
- P3: uncategorized
- P4: problems worth a look on less important platforms
- P5: other

The problem with this set is that while P1 bugs will absolutely be fixed before the release (and usually backported), P2 bugs are a catch-all group for everything else that's worth looking at. It is impossible to distinguish stuff that will probably be fixed before the release (and presumably backported to all branches) from what instead requires new stage1/stage2 material.

As a result, the release criteria we have are not really a measure of the quality of the release, and especially are not really a measure of the work being done towards a release.

I propose two solutions to this problem.

- The more conservative one is to use more aggressively the release milestone field. Hard-to-fix bugs would be left as P2, but bumped to the next major release at the beginning of stage 3.

  Advantages: no need for churn in the bug database---very easy to implement
  Disadvantages: the milestone field is not visible in search lists (maybe this can be changed)?

- Alternatively, we could add a new priority P-- for uncategorized bugs, and split P2/P3 like this: P2 bugs will be fixed in stage 3/4, P3 bugs will most likely be postponed to stage 1/2.

  Advantages: quicker impression from the bug searches, especially during bug triage
  Disadvantages: need to rethink bugzilla queries

I think either of these two approaches would add serious value to judging a release's quality. Meeting the release criteria (no more than 50 P2 regressions) in the past included the release managers downgrading bugs from P2 to P4, which is in my opinion cheating. In the proposed scheme, this would be less necessary, because the release criteria could take into account a broader view, such as respectively for the two approaches:

- At most 60 P2 regressions, of which at most 15 should have release milestone 4.4.0.
- No more than 15 P2 regressions and 45 P3 regressions.

Any opinions?

Paolo
Re: GCC OpenCL ?
> I am just starting to think about adding OpenCL support into future versions of GCC, as it looks like a useful way of programming highly parallel type systems, particularly with heterogeneous processors. At this point, I am wondering what kind of interest people have in working together on OpenCL in the GCC compiler?

I might be working on parallelization (though in LLVM) for the next one or two years. If I have some free time to put into GCC, I'd love to port my work to it and to collaborate with people already working on OpenCL.

> Off hand, I think the first stage is to get OpenCL to work in a homogeneous multi-core system before diving into the heterogeneous systems.

Yes, also because for example we have no access to the GPUs' instruction set. These papers detail an experience in porting CUDA (the predecessor to OpenCL) to multicore systems:

http://www.gigascale.org/pubs/1278.html
http://www.gigascale.org/pubs/1417.html
http://impact.crhc.illinois.edu/mcuda.php

Paolo
Re: GCC OpenCL ?
> Although the OpenCL infrastructure doesn't confine itself to it, this compute-on-the-graphic-processor type of parallelism mostly concerns itself with "let's do the FFT (or DGEMM) really fast on this processor and then return to the user".

Not really, it's not about FFT/DGEMM only -- the parallel stuff can be expressed in a high-level language, and the communication cost is actually something you have to consider seriously.

> If it isn't (surely not for us meteorology types) this approach is of limited use.

I'm pretty sure you meteorology guys can benefit quite a bit from it.

Paolo
Re: New GCC Runtime Library Exception: not fit for purpose
Joern Rennecke wrote:
> Quoting Ian Lance Taylor <i...@google.com>:
>> I'm not sure what your point is here. newlib is not under the GPL in any case. It is not affected by the gcc runtime library license.
>
> The old runtime library exception allowed you to distribute binaries that both include pieces of the gcc runtime and arbitrary pieces of newlib, without requiring the distribution to be under the terms of the GPL. I.e. you could link non-GPL code against both the gcc runtime and newlib and distribute it. The new license does not allow this unless all parts included from newlib are written in a high level language AND use the gcc runtime.

If they do not use the GCC runtime, why should those parts be affected by the GCC runtime license?

If anything, the loophole in the exception is that, if you rewrite libgcc, then you can use a non-eligible compilation process and still distribute the result under a proprietary license.

Paolo
Re: sizeof in initializer expression not working as expected
Bruce Korb wrote: Hi, I was trying to figure out how come a memory allocation was short. I think I've stumbled onto the issue. evt_t is a 48 byte structure and tpd_uptr is a uintptr_t. sz initializes to 52 (decimal). The value would be correct if I were not trying to multiply the size of the pointer by 4. The result should be 64. I think all you can do is the usual preprocessed testcase submission to bugzilla. Paolo
Re: x86-64 and large code model questions/bugs
He'll get much better code by putting the program into a -fPIC .so, loading it from a small stub and then unmap the stub. large model generates really very bad code because all jumps will be indirect. Is it also true with -fpie? Paolo
Re: x86-64 and large code model questions/bugs
Andi Kleen wrote: On Wed, Jan 28, 2009 at 09:39:39AM +0100, Paolo Bonzini wrote: He'll get much better code by putting the program into a -fPIC .so, loading it from a small stub and then unmap the stub. large model generates really very bad code because all jumps will be indirect. Is it also true with -fpie? Not sure what you mean? Right, sorry. I meant can you also use -fpie and use a linker script to relocate the text section, if you want to place it high but the code is not gigantic? AFAIK -fpie code is the same quality as -fPIC. Both are much much better than large model. Exactly. Paolo
Re: Serious code generation/optimisation bug (I think)
James Dennett wrote:
> On Mon, Jan 26, 2009 at 11:52 PM, <zol...@bendor.com.au> wrote:
>> I was debugging a function and by inserting the debug statement crashed the system. Some investigation revealed that gcc 4.3.2 arm-eabi (compiled from sources) with -O2 under some circumstances assumes that if a pointer is dereferenced, it cannot be NULL, therefore explicit tests against NULL can be later eliminated.
>
> That's an optimization permitted by the language standard, but possibly unhelpful on your particular target.

Not really, he's just using the sloppiness allowed by MMU-less targets, but he doesn't care about the value passed to Debug if tst == NULL. However, -fno-delete-null-pointer-checks will do.

Paolo
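The pattern under discussion can be reduced to a few lines. The original poster's code is not shown in this thread, so the following is only an illustration with invented names (`struct item`, `read_then_check`, `check_then_read`). In the first function the dereference precedes the test, which is what lets the optimizer conclude the pointer is non-null and drop the test unless -fno-delete-null-pointer-checks is given.

```c
#include <stddef.h>

struct item
{
  int value;
};

/* The dereference comes first.  A compiler that assumes a
   dereferenced pointer is never NULL may treat the later test as
   always false and delete it.  */
int
read_then_check (struct item *p)
{
  int v = p->value;   /* from here on the compiler may infer p != NULL */
  if (p == NULL)      /* candidate for deletion at -O2 */
    return -1;
  return v;
}

/* Testing before the dereference keeps the check meaningful on every
   target and at every optimization level.  */
int
check_then_read (struct item *p)
{
  if (p == NULL)
    return -1;
  return p->value;
}
```

Calling `read_then_check (NULL)` is undefined behaviour in ISO C, so only `check_then_read` is safe to call with a null pointer.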
Re: Serious code generation/optimisation bug (I think)
However, -fno-delete-null-pointer-checks will do. Not for PTA though ;) Care to expand? Paolo
Re: Serious code generation/optimisation bug (I think)
Not for PTA though ;) Care to expand? PTA tracks points-to-NULL as pointing to nothing. This probably should be conditional on -fdelete-null-pointer-checks. Otherwise *NULL and *anything won't alias. Yes, you're right. I'll see if I can construct a testcase and a patch. BTW, I was thinking of not doing the optimization anyway on volatile pointers. What do you think? Paolo
Re: Serious code generation/optimisation bug (I think)
Richard Guenther wrote: On Tue, Jan 27, 2009 at 11:35 AM, Paolo Bonzini bonz...@gnu.org wrote: Not for PTA though ;) Care to expand? PTA tracks points-to-NULL as pointing to nothing. This probably should be conditional on -fdelete-null-pointer-checks. Otherwise *NULL and *anything won't alias. Yes, you're right. I'll see if I can construct a testcase and a patch. Thanks. It is now PR38984. Andrew's point about -fnon-call-exceptions is also worth pondering (and that's an understatement). BTW, I was thinking of not doing the optimization anyway on volatile pointers. What do you think? It should be taken care of automatically by loading it from memory before each dereference, no? In this case, I meant more specifically the NULL test optimization that the OP stumbled in. That would be a GCC extension, not taking advantage for volatile pointers of the undefinedness that the standard guarantees. Paolo
Re: A question about SRA and out-of-bounds array accesses
> int *
> x(void)
> {
>   register int *a asm("unknown_register"); /* { dg-error "invalid register" } */
>   int *v[1] = {a};
>   return v[1];
> }
>
> I think simply scalarizing for the above testcase is ok - the behavior is undefined anyway.

What about moving the error to the frontend?

Paolo
Re: Feature request concerning opcodes in the function prolog
>     movl.s %edi, %edi
>     pushl %ebp
>     movl.s %esp, %ebp

Have you thought about making .s an assembler command-line flag, so that this flag could be passed automatically by the compiler under mingw?

Paolo
Re: Feature request concerning opcodes in the function prolog
For my purposes it is not really suitable, because we have to make sure that the push %ebp and mov %esp, %ebp are there, no matter what the compiler arguments are(-fomit-frame-pointer). So just adding the mov %edi, %edi isn't enough, and while I'm at it I can add the .s to the insns anyway. (see the archives for more details) Yes, I mentioned the commandline option because you talked about 31-c0 vs. 33-c0 for xor %eax, %eax somewhere else in the thread. Paolo
Re: libmudflap and emutls question
Which version of gcc did you use? gcc 4.1 (and maybe 4.2) will report an error. But gcc 4.3 compiles OK. I tested using x86_64 native gcc from Debian unstable. __emutls_get_address is defined in libgcc even if the target has real TLS. Uff... not my day. I used 4.2 (emutls was posted in 4.2 time but committed in 4.3 only). But I didn't think of the simplest solution: use grep together with strings(1): strings ./conftest | grep __emutls_get_address. Paolo
Re: Compiler turns off warnings unexpectedly
I have here an (attached) testcase which unexpectedly turns off warnings. Compiling it using `gcc test.c -c -Wall` (or test.i) gives: test.c: In function 'pam_sm_authenticate': test.c:6: warning: implicit declaration of function 'undef' This works on the trunk but fails on the 4.3 branch. gcc 4.1 also produces the expected output (implicit declaration undef2), so it seems like a recent regression. I guess filing a bug at http://gcc.gnu.org/bugzilla about this regression would help. As would adding the reduced testcase to the testsuite for trunk to ensure we don't regress again. :-) Will do so next week if nobody beats me to it. Paolo
Re: Official GCC git repository
Rafael Espindola wrote:
>> Because the right one should have been
>>
>> $ git config --add remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
>
> That is what git clone adds, but with that git branch -r will not list the remote branches.

Uhm, it does here (I don't have a GCC repo, it's another one):

    $ git branch -r
    mirror/cpp
    mirror/exc-handling-alternate-fix
    mirror/filesystem
    mirror/ipv6
    mirror/magritte
    mirror/master
    mirror/omnibrowser
    mirror/opengl
    mirror/opengl-nurbs
    mirror/poll-for-win32
    mirror/pool-resolution
    mirror/roe
    mirror/sdl
    mirror/seaside
    mirror/stable-2.1
    mirror/stable-2.2
    mirror/stable-2.3
    mirror/stable-3.0
    origin/HEAD
    origin/master
    origin/stable-2.1
    origin/stable-2.2
    origin/stable-2.3
    origin/stable-3.0
    stephen/master
    stephen/pool-resolution
    stephen/stable-3.0

You can see that it also lists branches for different remotes (with distributed version control you need many of them, maybe one per contributor).

Have you tried (after changing the .git/config line for remote.origin.fetch) doing a git fetch origin to refresh the list of available branches for the origin remote? If it works now, you probably want to remove the files in .git/refs/remotes/*.

Paolo
Re: Official GCC git repository
Rafael Espindola wrote: git config --add remote.origin.fetch '+refs/remotes/*:refs/remotes/*' This will put the remote branch heads in refs/remotes, you might want to put them in refs/remotes/origin instead. $ git config --add remote.origin.fetch '+refs/remotes/*:refs/remotes/origin/*' One small problem I have with this. When I do git branch lto origin/lto the generated config entry says: [branch lto] remote = origin merge = refs/heads/lto and git pull will fail. Manually updating it to [branch lto] remote = origin merge = refs/remotes/lto Because the right one should have been $ git config --add remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*' ? Paolo
Re: libjava and raw_cxx
Andreas Schwab wrote: Why is the libjava directory configured with raw_cxx? Makefile.def:151:target_modules = { module= libjava; raw_cxx=true; }; The problem with this is that it keeps the libtool test for dynamic linker characteristics from working properly, due to the undefined reference to __gxx_personality_v0 which is defined in libstdc++. If we weren't using libtool, it would be better to eliminate this and instead special case the linker in libjava's Makefile. But using libtool, it is basically a catch-22 (you need C++ in configure, but then C++ goes in the libtool script, and then you cannot eliminate it from the makefile). If it bothers you (does it cause a PR?), I think it's easiest to define a cache variable somewhere so that the test is forced to pass. Anyway you know you do not need to build C++ executables (only Java) in libjava. Paolo
Re: libjava and raw_cxx
Andreas Schwab wrote: Paolo Bonzini bonz...@gnu.org writes: If it bothers you (does it cause a PR?), It causes a program to fail to run during build. ./gcj-dbtool -n classmap.db || touch classmap.db /usr/local/gcc/gcc-20081202/Build/powerpc64-suse-linux/libjava/.libs/gcj-dbtool: error while loading shared libraries: libgcj.so.10: cannot open shared object file: No such file or directory Anyway you know you do not need to build C++ executables (only Java) in libjava. See above. But that's not a C++ program, it's a Java program. Paolo
Re: libjava and raw_cxx
If it bothers you (does it cause a PR?), It causes a program to fail to run during build. ./gcj-dbtool -n classmap.db || touch classmap.db /usr/local/gcc/gcc-20081202/Build/powerpc64-suse-linux/libjava/.libs/gcj-dbtool: error while loading shared libraries: libgcj.so.10: cannot open shared object file: No such file or directory Anyway you know you do not need to build C++ executables (only Java) in libjava. See above. But that's not a C++ program, it's a Java program. Yes, this is true. But even though the test that sets shlibpath_overrides_runpath is run for every compiler, only one result is then used for all link commands, and that happens to be the result of the C++ test. That's the bug then I'd say... Ralf what do you think? Paolo
Re: question on optimizing calls to library functions
The main difference that springs to mind: SIN is built-in, MATMUL is a library function. In gcc/builtin.defs, one finds Not just that: SIN is a pure (or const, depending on -frounding-math) function, which can be subject to CSE and DCE. I don't see anything suggesting that for MATMUL in intrinsic.c. In fact, since MATMUL receives the return array by reference and writes to it, it would be very wrong to make MATMUL const or pure. Paolo
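The distinction can be shown in C with GCC's function attributes. This is a sketch under assumptions: it needs a GCC-compatible compiler, and square and matmul2 are hypothetical stand-ins, not the Fortran intrinsics themselves.

```c
#include <assert.h>
#include <stddef.h>

/* No side effects and a result that depends only on the argument, so
   the compiler may merge repeated calls (CSE) or delete unused ones
   (DCE).  */
int square (int x) __attribute__ ((const));

int
square (int x)
{
  return x * x;
}

/* Hypothetical MATMUL-like routine for 2x2 row-major matrices.  The
   result is written through OUT, so the call has a visible side effect
   and must not be declared const or pure.  */
void
matmul2 (const double a[4], const double b[4], double out[4])
{
  assert (a != NULL && b != NULL && out != NULL);
  out[0] = a[0] * b[0] + a[1] * b[2];
  out[1] = a[0] * b[1] + a[1] * b[3];
  out[2] = a[2] * b[0] + a[3] * b[2];
  out[3] = a[2] * b[1] + a[3] * b[3];
}
```

If matmul2 were marked const, a compiler would be entitled to drop a call whose "return value" is unused, and the output array would never be written.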
Re: Cygwin support
To get around this you'd have to either link a separate copy of the plugin for each executable, or access the symbols in the executable indirectly through GetProcAddress and function pointers. Hacking the compiler (or postlinker!) to emit a special constructor that does the necessary GetProcAddress invocations seems not too hard... Paolo
Re: GNU Hurd changes vs. GCC: ``regression fixes and docs only''
Thomas Schwinge wrote: Hello! We, the GNU Hurd people, would like to get GCC in a compilable/usable shape for us again, without needing to do the patching that was needed since the 4.2 release. I have already some weeks ago sent the needed patches to the gcc-patches mailing list, where they have been acked by Paolo and Matthias. Now that my GCC copyright assignment papers are on file and my sourceware account has been enabled for accessing the GCC repository, I could in theory install the patches. However, as I read on the homepage, GCC trunk is currently in ``regression fixes and docs only'' mode. Asking on OFTC's #gcc channel, after having stated that ``changes that are entirely port specific generally have some leeway; that is, if the change can only affect the Hurd target, then the Hurd maintainer may approve fixes for serious bugs even in regression-only mode'' I'm a build system maintainer, so I can approve fixes for those bugs if they only touch the build system and the fixes are clearly specific to Hurd. Global reviewers can do the same with the other fixes. Paolo
Re: bootstrap4 vs. compare?
though that is probably inadequate. Especially because Makefile.in is automatically generated. :-) It's not the default goal that matters, but if bootstrap4 is a goal at all. Or if compare3 is a goal. I have a (correct) patch which I'll apply in a day or two. Thanks, Paolo
Re: -fno-ira removal
The following ports haven't been converted yet: arc, m32c, m68hc11, mmix, pdp11, score, vax. DJ has reported problems on the list for m32c. Regarding ARC and MMIX we might expect some action from Joern and H-P respectively, but probably nobody is going to do the work for the others. Paolo
Re: Support for NT based OS on ARM.
Farlie A wrote: Hi, Would you be willing to consider supporting the PE object formats on the ARM based port of ReactOS? If you are willing to contribute code for this, that's possible indeed. Otherwise, no one will probably do the work. Paolo
Re: Apple, iPhone, and GPLv3 troubles
This means that you couldn't use *GCC* if you did something the FSF found objectionable, closing an easy work-around. This doesn't work, because it breaks out of the basic framework of copyright law. Nobody signs anything or accepts any terms in order to use gcc. The FSF wants to stop people from distributing proprietary binary plugins to gcc. The copyright on gcc does not apply to those plugins. Also, even if you could develop a license similar to the GPL but with an additional restriction to this end, this would not be the GPL anymore, because GPLv3 limits the non-permissive additional terms to the ones listed in Section 7: a) [disclaiming warranty] b) [requiring preservation of legal notices] c) [requiring that modified versions of such material be marked] d) [limiting the use for publicity purposes of names] e) [declining to grant rights for use of some trademarks] f) [the Apache License's patent indemnification clause] All other non-permissive additional terms are considered further restrictions within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. Even without considering the feasibility of adding such a bullet, it would not be a good PR move for the FSF to add a bullet g to the above list for the sake of GCC. The possibility of modifying the permissive terms of the runtime libraries (which are distributed with every binary produced by the compiler) does seem like a good way to control the *usage* of the compiler as opposed to its distribution. Paolo
Re: Apple, iPhone, and GPLv3 troubles
Off-topic, but I feel this is important, since Apple contributed to gcc, and it is licensed under GPLv3 now. The license of GCC does not matter, unless the iPhone includes a copy of GCC's binaries for a recent-enough version. In which case, of course, Apple would be violating the GPLv3 and you should tell the FSF. [offtopic parts rot13'd] V fgvyy ubcr vg pna or fbyirq Jung fubhyq or fbyirq? Npghnyyl, jung pbhyq or fbyirq? OGJ, gur TCYi2 unf abar bs gur pynhfrf gung pnhfr gur gebhoyr Nccyr vf snpvat jvgu gur TCYi3 naq gur vCubar, fb vg vf BX. Fbzr znl guvax gung gur TCYi3 vf gur bar gung vf BX... Naljnl, va pnfr lbh jrera'g njner, vg'f abg whfg gur TCYi3 gung vf nssrpgrq. Orpnhfr bs gur AQN gurl fubiry qbja lbhe guebng jura lbh qbjaybnq gur FQX, Nccyr rssrpgviryl qrpvqrq gb ybpx nal xvaq bs bcra fbhepr fbsgjner (abg whfg serr fbsgjner) bhg bs gur vCubar. Ab ovt qrny, gung zrnaf gung V jba'g jevgr cebtenzf sbe gur vCubar V qba'g bja. Paolo
Apple-employed maintainers (was Re: Apple, iPhone, and GPLv3 troubles)
Peter O'Gorman wrote: Yuhong Bao wrote: and Apple uses GCC (which is now under GPLv3) and Mac OS X on it. Unfortunately, the iPhone is incompatible with GPLv3, if you want more see the link I mentioned. Apple does not use a GPLv3 version of GCC. Ah, actually I think I now see the OP's point. Apple is scared of the GPLv3 because the iPhone might violate it, so they are not contributing to anything that falls under the GPLv3. It is indeed on-topic. There are four Darwin maintainers listed in MAINTAINERS: darwin port Dale Johannesen [EMAIL PROTECTED] darwin port Mike Stump [EMAIL PROTECTED] darwin port Eric Christopher [EMAIL PROTECTED] darwin port Stan Shebs [EMAIL PROTECTED] and three of them are not allowed to read the GCC patches mailing list. They might do something if CCed, but not necessarily so. Same for Objective-C/C++: objective-c/c++ Mike Stump [EMAIL PROTECTED] objective-c/c++ Stan Shebs [EMAIL PROTECTED] Now I wonder: 1) does it make sense to keep a maintainer category that is known to be inactive? 2) who should then get maintainership of darwin? Note that there are some patches for darwin like this one: http://article.gmane.org/gmane.comp.gcc.patches/172498 It's sad, but I think that there is need for the SC to take action on this. Paolo
Re: Apple-employed maintainers (was Re: Apple, iPhone, and GPLv3 troubles)
Well at least that explains their total inactivity in the last year. Is Dale the one still allowed to read the gcc-patches mailing list? No, that would be Stan just because he's not at Apple. It must be said also that Mike Stump accepted to review/discuss Darwin/ObjC patches that he was CCed on, but most people don't know that they need to do so. As a side note, Mike also wrote this last February: The SC knows of the issue Still, after six months it would be nice to have a clearer idea of what will happen with respect to Darwin/ObjC, especially since the previous statement (which I suppose was as clear as Mike could do) was buried under an unrelated thread. Paolo
Re: Apple, iPhone, and GPLv3 troubles
Steven Bosscher wrote: On Wed, Sep 24, 2008 at 4:06 PM, Ian Lance Taylor [EMAIL PROTECTED] wrote: Apple's dislike of GPLv3 is a problem for gcc, yes. Well, excuse me for being a-political, but I don't see this problem. The relationship between GCC and Apple has never been really good AFAIK, but that hasn't hampered either to be quite successful. I agree with you, but if you don't look at GCC as a whole -- but rather at the small intersection represented by FSF GCC on Darwin -- it *has* hampered it. Apple GCC is basically a fork nowadays, and it is often impossible to compile Leopard applications using FSF GCC (in turn because of the lack of Objective-C 2.0 support). Sometimes I wonder why Darwin is still part of FSF GCC, just like it is not supported in binutils or gdb... I guess just for the sake of GCC developers that are working on a Mac. Even outside *-*-darwin*, what caused the development of two separate Objective-C runtimes, the one in FSF GCC being a ball and chain for the removal of dead code from the compiler? Note that basically all Objective-C code in existence either does not care about the runtime, or has support for both runtimes; so it would not be a problem to deprecate libobjc if Apple contributed their own implementation. (There is now a third runtime, named Étoilé). Paolo ps: of course, there is no offense intended for poor Mike who's CCed in this thread.
Re: C/C++ FEs: Do we really need three char_type_nodes?
Diego Novillo wrote: On Fri, Sep 19, 2008 at 12:55, Jakub Jelinek [EMAIL PROTECTED] wrote: On Fri, Sep 19, 2008 at 12:36:12PM -0400, Diego Novillo wrote: When we instantiate char_type_node in tree.c:build_common_tree_nodes we very explicitly create a char_type_node that is signed or unsigned based on the value of -funsigned-char, but instead of making char_type_node point to signed_char_type_node or unsigned_char_type_node, we explicitly instantiate a different type. C++ e.g. requires that char (c) is mangled differently from unsigned char (h) and signed char (a), it is a distinct type. Thanks, that answers my question. But does it need to be streamed out differently? I mean, char_type_node could be streamed out as signed_char_type_node or unsigned_char_type_node, because the mangling has already been done. Paolo
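The three-distinct-types rule that Jakub describes for C++ holds in C as well, and C11's _Generic gives a compact way to see it. The sketch below assumes a C11 compiler; CHAR_KIND and describe_plain_char are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* _Generic requires its listed types to be mutually incompatible, so
   this macro only compiles because char, signed char and unsigned char
   are three distinct types.  */
#define CHAR_KIND(x) _Generic ((x),          \
    char: "char",                             \
    signed char: "signed char",               \
    unsigned char: "unsigned char")

/* Plain char selects its own association whether the target (or
   -funsigned-char) makes it behave as signed or as unsigned.  */
const char *
describe_plain_char (void)
{
  char c = 0;
  const char *kind = CHAR_KIND (c);

  assert (strcmp (kind, "char") == 0);
  return kind;
}
```

The signedness option changes how plain char behaves, not which type it is, which is why the front ends keep a separate node for it.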
Re: worst case register classes (Was: Re: IRA_COVER_CLASSES for m32c)
I think our mxp is more 'interesting'. [snip] I think it's more like 'insane', :-) and a miracle that a retargetable compiler can be ported to it. Paolo
Re: extra instructions lost from -O0 to -O1
Thomas A.M. Bernard wrote: Well I found another way to solve the problem by updating DCE so that it does not take out my instructions. I inserted setallocate as a native operator in the back-end which comes from a GIMPLE node and maps to the RTL pattern. Earlier in the discussion, it's been discussed that DCE was taking out the instruction when flag -O1 was engaged. To solve that, in 'tree-ssa-dce.c', I flagged this node with the function mark_stmt_necessary. And it works fine so far. The instruction is not omitted anymore by DCE :-) Do not add it as a GIMPLE node. Add it as a builtin function, so that the tree-level DCE will treat it like every other call and not remove it. IOW, do not add new kinds of nodes. Use builtins for trees, and unspecs for RTL. Paolo
Re: extra instructions lost from -O0 to -O1
Thomas A.M. Bernard wrote: I have tried unspec_volatile without success though. As follows, (define_insn "setallocate" [(setallocate (unspec_volatile:DI [(match_operand:DI 0 "general_operand" "r")] UNSPEC_ALLOCATE))] "" "allocate %0\t\t#TCB_INSTRUCTIONS" [(set_attr "type" "multi")]) This more or less should work, except that it completely subsumes the SETALLOCATE rtx code that you've added. The pattern should just be [(unspec_volatile:DI [(match_operand:DI 0 "general_operand" "r")] UNSPEC_ALLOCATE)] Paolo
Re: Passing LDFLAGS to stage2 and stage3 gcc
Rainer Emrich wrote: [EMAIL PROTECTED] wrote: Rainer Emrich [EMAIL PROTECTED] wrote: So I want to pass LDFLAGS=-Wl,-rpath,/somedir to stage3 to link gcc, cpp, etc. with the rpath information. I do this by editing LDFLAGS_FOR_TARGET in the top-level Makefile.in, and also passing LDFLAGS, BOOT_LDFLAGS, and HOST_LDFLAGS assignments as arguments to make. I'm not cross-compiling, though, so you may have to adjust that somewhat. Paul, thanks for the hint. HOST_LDFLAGS does not exist, and you should be able to pass LDFLAGS_FOR_TARGET on the command line. Anyway, for your needs all you have to do is: make BOOT_LDFLAGS=-Wl,-rpath,/somedir Paolo
Re: [PATCH] Update libtool to latest git tip
Well, libtool-2.2.6 is finally released (twice even). Actual approval depends on your answer to this question, but the patch is technically okay. Can you commit it to the src repository too? There is some regeneration to do there too. I know that GCC is now in stage 3, and that we missed the end of stage 1 by a week, but I would still like to update gcc's libtool to 2.2.6. It fixed a Darwin bug, right? Paolo
Re: [PATCH] Update libtool to latest git tip
Peter O'Gorman wrote: On Mon, Sep 08, 2008 at 08:29:37PM +0200, Paolo Bonzini wrote: Well, libtool-2.2.6 is finally released (twice even). Actual approval depends on your answer to this question, but the patch is technically okay. Can you commit it to the src repository too? There is some regeneration to do there too. I know that GCC is now in stage 3, and that we missed the end of stage 1 by a week, but I would still like to update gcc's libtool to 2.2.6. It fixed a Darwin bug, right? Yes, though I do not know if Jack actually filed a PR for it, it was about debugging libstdc++ on darwin. Post an updated patch and, next week, I'll apply it. Paolo
Re: PR37363: PR36090 and PR36182 all over again
As H-P says, the predicates on move expanders are generally ignored. emit_move_insn subroutines deliberately don't check them. It's even worse; force_reg is effectively hardcoding movXX's operand 1 to be a general_operand. (But my point was that force_reg does use LEGITIMATE_CONSTANT_P through general_operand). Not necessarily; anything that's found in a non-legitimate constant must be handled by force_reg, and force_reg also tries using force_operand if what it gets is not a general_operand. But maybe it's necessary to add an if (GET_CODE (value) == CONST) value = XEXP (value, 0); in force_operand. As you say, force_operand currently does nothing with constants. My understanding is that that really is by design (in the loosest possible sense of the word). As H-P says, it's then the move expander's responsibility to handle the thing. force_reg is weird: 1) it first tries the move expander if operand 1 is a general_operand; 2) it tries force_operand if it is not; 3) it tries the move expander if force_operand fails. It would make sense to have something like: 1) check the move expander's predicate; 2) try force_operand; 3) abort. But I agree that it is not a lightweight change, and I wouldn't propose it -- especially now. OTOH every message in this thread is highlighting something fishy. would in some cases be accurate.) I think using an unspec in rs6000 would solve some of the port-specific issues. In particular, I don't think 36090 would have happened with an unspec representation. I agree. So your plan would be to change rs6000 to an unspec, and drop the problematic hunk in simplify-rtx.c? That would be okay with me, but it's not a small change for rs6000. Paolo
Re: PR37363: PR36090 and PR36182 all over again
Only with a LEGITIMATE_CONSTANT_P catching it... Of course. So, can we agree on some or all of: 1. This (PR37363/PR36182) and PR36090 (in both ports) and whatever other port will be affected should be solved by a stricter LEGITIMATE_CONSTANT_P check, and where canonicalization is undefined (and a new definition can't get consensus agreed upon), the port has to check itself for whatever RTL expression it accepts. 2. Change the LEGITIMATE_CONSTANT_P documentation. 3. Change the default of LEGITIMATE_CONSTANT_P to a helper function, maybe trivial_constant_expression_p above. Agreed, but I don't see t_c_e_p in GCC sources (if you meant my function using the predicate, it cannot work because the predicate might in turn call LEGITIMATE_CONSTANT_P). It could be if (GET_CODE (x) != CONST) return true; x = XEXP (x, 0); return GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 1)) == CONST_INT && (GET_CODE (XEXP (x, 0)) == SYMBOL_REF || GET_CODE (XEXP (x, 0)) == LABEL_REF); (i.e. the test in cse.c) or something like that. Would you change simplify-rtx.c to test LEGITIMATE_CONSTANT_P before wrapping something with a CONST? Alternatively, I wouldn't mind seeing rs6000 use unspecs for GOT/TOC offsets as other ports do; this would allow removing the optimization in simplify_plus_minus, which would fix CRIS too (because I'm worried that other targets might be affected, not just CRIS). Of course, if that gives known pessimizations on rs6000 it would not be a good thing to do, and probably no one would volunteer to do that change anyway, so... Paolo
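The shape test proposed in the message above can be tried out on a toy expression tree. Everything in this sketch is hypothetical: struct toy_rtx and the TOY_* codes only mimic the shape of GCC's rtx, GET_CODE and XEXP, and the function is not part of GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for GCC's rtx codes.  */
enum toy_code
{
  TOY_CONST, TOY_PLUS, TOY_MINUS,
  TOY_SYMBOL_REF, TOY_LABEL_REF, TOY_CONST_INT
};

/* Toy stand-in for an rtx: a code plus up to two operands.  */
struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[2];   /* NULL when unused */
};

/* Accept anything that is not a CONST.  Inside a CONST, accept only
   (plus (symbol_ref | label_ref) (const_int)).  */
int
toy_trivial_constant_expression_p (const struct toy_rtx *x)
{
  assert (x != NULL);
  if (x->code != TOY_CONST)
    return 1;

  x = x->op[0];
  return x->code == TOY_PLUS
         && x->op[1]->code == TOY_CONST_INT
         && (x->op[0]->code == TOY_SYMBOL_REF
             || x->op[0]->code == TOY_LABEL_REF);
}
```

With this definition, (const (plus (symbol_ref) (const_int))) is accepted, while (const (minus (symbol_ref) (symbol_ref))), the form at issue in PR37363, is rejected.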
Re: PR37363: PR36090 and PR36182 all over again
Hans-Peter Nilsson wrote: Date: Fri, 5 Sep 2008 14:57:00 +0200 From: Hans-Peter Nilsson [EMAIL PROTECTED] Maybe as part of a change from target macro to target hook, with LEGITIMATE_CONSTANT_P as a default would fit, even at this stage? Sorry, I mean CONSTANT_P, not LEGITIMATE_CONSTANT_P. Or maybe a new macro or hook What about replacing the problematic uses of gen_rtx_CONST with plus_constant (x, 0)? plus_constant knows when to make a CONST rtx. There are just a handful of places where this would be needed: instead of the check after the wrong comment in cse.c, and everywhere gen_rtx_CONST is used in simplify-rtx.c. Paolo
Re: PR37363: PR36090 and PR36182 all over again
Paolo Bonzini wrote: Hans-Peter Nilsson wrote: Date: Fri, 5 Sep 2008 14:57:00 +0200 From: Hans-Peter Nilsson [EMAIL PROTECTED] Maybe as part of a change from target macro to target hook, with LEGITIMATE_CONSTANT_P as a default would fit, even at this stage? Sorry, I mean CONSTANT_P, not LEGITIMATE_CONSTANT_P. Or maybe a new macro or hook What about replacing the problematic uses of gen_rtx_CONST with plus_constant (x, 0)? plus_constant knows when to make a CONST rtx. There are just a handful of places where this would be needed: instead of the check after the wrong comment in cse.c, and everywhere gen_rtx_CONST is used in simplify-rtx.c. Here is a prototype patch, untested. Paolo

2008-09-06  Paolo Bonzini  [EMAIL PROTECTED]

    * explow.c (plus_constant): Don't exit early if c == 0, to allow
    canonicalizing CONSTs.
    * cse.c (fold_rtx): Use plus_constant instead of wrapping with CONST.
    * simplify-rtx.c (simplify_plus_minus): Likewise.

Index: cse.c
===================================================================
--- cse.c (revision 134435)
+++ cse.c (working copy)
@@ -3161,10 +3161,8 @@ fold_rtx (rtx x, rtx insn)
            FIXME: those ports should be fixed.  */
         if (new != 0 && is_const
             && GET_CODE (new) == PLUS
-            && (GET_CODE (XEXP (new, 0)) == SYMBOL_REF
-                || GET_CODE (XEXP (new, 0)) == LABEL_REF)
             && GET_CODE (XEXP (new, 1)) == CONST_INT)
-          new = gen_rtx_CONST (mode, new);
+          new = plus_constant (XEXP (new, 0), XEXP (new, 1));
+        else
+          new = plus_constant (new, 0);
       }
       break;

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c (revision 140055)
+++ simplify-rtx.c (working copy)
@@ -3625,7 +3625,7 @@ simplify_plus_minus (enum rtx_code code,
               tem = simplify_binary_operation (ncode, mode, tem_lhs,
                                                tem_rhs);
               if (tem && !CONSTANT_P (tem))
-                tem = gen_rtx_CONST (GET_MODE (tem), tem);
+                tem = plus_constant (tem, 0);
             }
           else
             tem = simplify_binary_operation (ncode, mode, lhs, rhs);
@@ -3690,7 +3690,7 @@ simplify_plus_minus (enum rtx_code code,
         && GET_CODE (ops[i].op) == GET_CODE (ops[i - 1].op))
       {
         ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
-        ops[i - 1].op = gen_rtx_CONST (mode, ops[i - 1].op);
+        ops[i - 1].op = plus_constant (ops[i - 1].op, 0);
         if (i < n_ops - 1)
           ops[i] = ops[i + 1];
         n_ops--;
@@ -5247,7 +5247,7 @@ simplify_subreg (enum machine_mode outer
       && GET_MODE_BITSIZE (innermode) >= (2 * GET_MODE_BITSIZE (outermode))
       && GET_CODE (XEXP (op, 1)) == CONST_INT
       && (INTVAL (XEXP (op, 1)) & (GET_MODE_BITSIZE (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
       int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;

Index: explow.c
===================================================================
--- explow.c (revision 134435)
+++ explow.c (working copy)
@@ -83,9 +83,6 @@ plus_constant (rtx x, HOST_WIDE_INT c)
   rtx tem;
   int all_constant = 0;
 
-  if (c == 0)
-    return x;
-
  restart:
 
   code = GET_CODE (x);
Re: PR37363: PR36090 and PR36182 all over again
I'm not sure about this bit. Couldn't [snip cse.c code] simply be replaced by:

  /* We can't simplify extension ops unless we know the original mode.  */
  if ((code == ZERO_EXTEND || code == SIGN_EXTEND)
      && mode_arg0 == VOIDmode)
    break;

  new = simplify_unary_operation (code, mode,
                                  const_arg0 ? const_arg0 : folded_arg0,
                                  mode_arg0);

? (Sorry if I'm repeating earlier discussion here.)

I think so -- I was just trying to resemble the existing code as much as possible (stage3), but it's probably better to clean up instead. What do you think about the simplify-rtx.c part instead? Paolo

2008-09-06  Paolo Bonzini  [EMAIL PROTECTED]

    * cse.c (fold_rtx): Let simplify_unary_operation handle CONSTs.
    * explow.c (plus_constant): Don't exit early if c == 0, to allow
    canonicalizing CONSTs.
    * simplify-rtx.c (simplify_plus_minus): Likewise.

Index: cse.c
===================================================================
--- cse.c (revision 134435)
+++ cse.c (working copy)
@@ -3138,33 +3138,20 @@ fold_rtx (rtx x, rtx insn)
     {
     case RTX_UNARY:
       {
-        int is_const = 0;
-
         /* We can't simplify extension ops unless we know the
            original mode.  */
         if ((code == ZERO_EXTEND || code == SIGN_EXTEND)
             && mode_arg0 == VOIDmode)
           break;
 
-        /* If we had a CONST, strip it off and put it back later if we
-           fold.  */
+        /* If we had a CONST, strip it off and let simplify_unary_operation
+           put it back if it can simplify something.  */
         if (const_arg0 != 0 && GET_CODE (const_arg0) == CONST)
-          is_const = 1, const_arg0 = XEXP (const_arg0, 0);
+          const_arg0 = XEXP (const_arg0, 0);
 
         new = simplify_unary_operation (code, mode,
                                         const_arg0 ? const_arg0 : folded_arg0,
                                         mode_arg0);
-        /* NEG of PLUS could be converted into MINUS, but that causes
-           expressions of the form
-           (CONST (MINUS (CONST_INT) (SYMBOL_REF)))
-           which many ports mistakenly treat as LEGITIMATE_CONSTANT_P.
-           FIXME: those ports should be fixed.  */
-        if (new != 0 && is_const
-            && GET_CODE (new) == PLUS
-            && (GET_CODE (XEXP (new, 0)) == SYMBOL_REF
-                || GET_CODE (XEXP (new, 0)) == LABEL_REF)
-            && GET_CODE (XEXP (new, 1)) == CONST_INT)
-          new = gen_rtx_CONST (mode, new);
       }
       break;

Index: simplify-rtx.c
===================================================================
--- simplify-rtx.c (revision 140055)
+++ simplify-rtx.c (working copy)
@@ -3625,7 +3625,7 @@ simplify_plus_minus (enum rtx_code code,
               tem = simplify_binary_operation (ncode, mode, tem_lhs,
                                                tem_rhs);
               if (tem && !CONSTANT_P (tem))
-                tem = gen_rtx_CONST (GET_MODE (tem), tem);
+                tem = plus_constant (tem, 0);
             }
           else
             tem = simplify_binary_operation (ncode, mode, lhs, rhs);
@@ -3690,7 +3690,7 @@ simplify_plus_minus (enum rtx_code code,
         && GET_CODE (ops[i].op) == GET_CODE (ops[i - 1].op))
       {
         ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
-        ops[i - 1].op = gen_rtx_CONST (mode, ops[i - 1].op);
+        ops[i - 1].op = plus_constant (ops[i - 1].op, 0);
         if (i < n_ops - 1)
           ops[i] = ops[i + 1];
         n_ops--;
@@ -5247,7 +5247,7 @@ simplify_subreg (enum machine_mode outer
       && GET_MODE_BITSIZE (innermode) >= (2 * GET_MODE_BITSIZE (outermode))
       && GET_CODE (XEXP (op, 1)) == CONST_INT
       && (INTVAL (XEXP (op, 1)) & (GET_MODE_BITSIZE (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
       int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;

Index: explow.c
===================================================================
--- explow.c (revision 134435)
+++ explow.c (working copy)
@@ -83,9 +83,6 @@ plus_constant (rtx x, HOST_WIDE_INT c)
   rtx tem;
   int all_constant = 0;
 
-  if (c == 0)
-    return x;
-
  restart:
 
   code = GET_CODE (x);
Re: PR37363: PR36090 and PR36182 all over again
if plus_constant _knows_ that something can be wrapped in a CONST, simplify_binary_operation should have given us the CONST to begin with. Also, the only cases that plus_constant can handle are CONST, SYMBOL_REF and LABEL_REF, all of which satisfy CONSTANT_P. So the new form ought to be dead on two counts.

Yes, and in the other case too:

  ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
  ops[i - 1].op = plus_constant (ops[i - 1].op, 0);

plus_constant won't understand the MINUS, and won't generate a CONST. Still, having a new target hook for this seems overkill. For example, since ports do have to deal with complicated constants when they expand moves, and since some of them already look inside CONSTs in their LEGITIMATE_CONSTANT_P, another possibility to throw in the air is something like (better names welcome...)

  rtx
  avoid_terrible_constants (rtx x)
  {
    if (!CONSTANT_P (x))
      x = gen_rtx_CONST (x);

    /* If the target's move expanders will take care of it, it must
       not be that bad.  */
    icode = optab_handler (mov_optab, GET_MODE (x))->insn_code;
    if (*insn_data[icode].operand[1].predicate (x, GET_MODE (x)))
      return x;

    return NULL;
  }

In case of cris, the predicate goes into general_operand, which does

  if (CONSTANT_P (op))
    return ((GET_MODE (op) == VOIDmode || GET_MODE (op) == mode
             || mode == VOIDmode)
            && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op))
            && LEGITIMATE_CONSTANT_P (op));

H-P can check for the problematic case inside his LEGITIMATE_CONSTANT_P (*), or add a move expander for it.

(*) but then does this mean the documentation for L_C_P is obsolete, and returning 1 is not necessarily a good thing to do for targets with sections? Maybe there is a better definition that can be the default?

Anyway, at least how to use this function is pretty obvious:

           tem_rhs = GET_CODE (rhs) == CONST ? XEXP (rhs, 0) : rhs;
           tem = simplify_binary_operation (ncode, mode, tem_lhs, tem_rhs);
-          if (tem && !CONSTANT_P (tem))
-            tem = gen_rtx_CONST (GET_MODE (tem), tem);
+          if (tem)
+            tem = avoid_terrible_constants (tem);
         }
       else
         tem = simplify_binary_operation (ncode, mode, lhs, rhs);
...
         && CONSTANT_P (ops[i].op)
         && GET_CODE (ops[i].op) == GET_CODE (ops[i - 1].op))
       {
-        ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
-        ops[i - 1].op = gen_rtx_CONST (mode, ops[i - 1].op);
-        if (i < n_ops - 1)
-          ops[i] = ops[i + 1];
-        n_ops--;
+        rtx x;
+        x = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
+        x = avoid_terrible_constants (x);
+        if (x)
+          {
+            ops[i - 1].op = x;
+            if (i < n_ops - 1)
+              ops[i] = ops[i + 1];
+            n_ops--;
+          }
       }
 
   if (n_ops > 1

I'm absolutely unsure that this is the way to go; but it has two advantages: 1) not leaking really bad constants outside simplify-rtx.c; 2) it makes clear how to fix bugs -- you restrict LEGITIMATE_CONSTANT_P/LEGITIMATE_PIC_OPERAND_P or add a move expander. Paolo
Re: PR37363: PR36090 and PR36182 all over again
In case of cris, the predicate goes into general_operand, which does if (CONSTANT_P (op)) return ((GET_MODE (op) == VOIDmode || GET_MODE (op) == mode || mode == VOIDmode) && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)) && LEGITIMATE_CONSTANT_P (op)); H-P can check for the problematic case inside his LEGITIMATE_CONSTANT_P (*), or add a move expander for it. I think you're mixing up CRIS and rs6000, the latter which generated something it had to handle but which was munged, PR36090. CRIS is mainstream in that sense. (You'd have to get buy-in from David Edelsohn on a LEGITIMATE_CONSTANT_P definition in rs6000 if PR36090 resurfaces.) This is from CRIS: (define_expand "movsi" [(set (match_operand:SI 0 "nonimmediate_operand" "") (match_operand:SI 1 "cris_general_operand_or_symbol" ""))] ... (define_special_predicate "cris_general_operand_or_symbol" (ior (match_operand 0 "general_operand") (and (match_code "const, symbol_ref, label_ref") ... Did you mean this as a short-term or long-term solution? (Mind, we already have a proposed short-term solution.) As a long term solution. Though not in that exact shape -- I wanted to have discussion on it and converge together to a real solution. (*) but then does this mean the documentation for L_C_P is obsolete, and returning 1 is not necessarily a good thing to do for targets with sections? Maybe there is a better definition that can be the default? Again, LEGITIMATE_CONSTANT_P is the wrong macro, it's for checking constants which are appropriate as immediate operands (to non-move insns), not for being at-all-legitimate. LEGITIMATE_CONSTANT_P is just what is used by general_operand. I'm proposing another use of *the predicate for mov's operand 1*, not of LEGITIMATE_CONSTANT_P. With the above questions, I was expressing my doubts on the doc for LEGITIMATE_CONSTANT_P in general. Signalling that they are not legitimate means they can still be handled by a move. That's why I used the predicate.
2) it makes clear how to fix bugs -- you restrict LEGITIMATE_CONSTANT_P/LEGITIMATE_PIC_OPERAND_P or add a move expander. Contradicting current use, where anything that's found in a non-LEGITIMATE_CONSTANT_P/LEGITIMATE_PIC_OPERAND_P must be handled by a move expander! Not necessarily; anything that's found in a non-legitimate constant must be handled by force_reg, and force_reg also tries using force_operand if what it gets is not a general_operand. But maybe it's necessary to add an if (GET_CODE (value) == CONST) value = XEXP (value, 0); in force_operand. To wit: a new bug would surface: you could here form something that wasn't LEGITIMATE_CONSTANT_P but which was handled by a move expander, and you'd force this into an insn which *isn't* a move. N.B. the insn in PR36182 wasn't a move. Shouldn't the insn fail recognition, then? (FWIW, I'll add a LEGITIMATE_CONSTANT_P to CRIS just to cover my bases. It won't solve the basic problem, because that could just cause that invalid CONST contents in PR37363 and PR36182 to end up in a move insn instead.) I don't think so, because general_operand would pass the CONST to your LEGITIMATE_CONSTANT_P, and hence cause it to be rejected. Paolo
Re: PR37363: PR36090 and PR36182 all over again (was: Re: Call for testers, ppc64-linux)
I got negative feedback on that patch (no, not regression results :) on IRC from David Edelsohn and understandably you held off your testing because of this, as for one the patch affects the rs6000 backend. What kind of negative feedback? For CRIS (as well as other targets IIUC) the cause of PR37363 is that there's code that wraps a MINUS of two symbol_ref's in a CONST without checking that the two symbol_ref's make up a valid address. After that, the CONST effectively acts as a barrier for target hooks (no need to look, we know that thing there is a valid constant expression). The three possibilities I see are: 1) removing the wrapping CONST? 2) using the patch in http://gcc.gnu.org/bugzilla/attachment.cgi?id=15620&action=view which however just papers over this problem. 3) adding a check that the MINUS is a legitimate address, and only wrap it in CONST if it is. Paolo
Re: PR37363: PR36090 and PR36182 all over again
3) adding a check that the MINUS is a legitimate address, and only wrap it in CONST if it is. s/address/constant/; it's not clear that it's used as an address at that point; it's just two expressions that gcc tries to reduce. Right. But I get the point; I'm leaning towards something like strengthening that it's a legitimate constant. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36182#c12 and other comments in that PR. But... should we really redefine LEGITIMATE_CONSTANT_P and its documentation at this stage? We can do it incrementally. For now, only redefine LEGITIMATE_CONSTANT_P on CRIS and in the documentation, and use it in simplify_plus_minus. For 4.5, we can look at other places using gen_rtx_CONST and strengthen them too. Paolo
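To make option 3 concrete, here is a minimal sketch in plain C of "check first, wrap in CONST only if the target accepts it". This is not GCC code: toy_rtx, toy_legitimate_constant_p and toy_wrap_const_if_legitimate are hypothetical names made up for illustration, and the "both symbols must be in the same section" rule is only an assumed stand-in for whatever a real target's LEGITIMATE_CONSTANT_P would test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for rtx codes; only what this sketch needs. */
enum toy_code { TOY_SYMBOL_REF, TOY_MINUS, TOY_CONST };

struct toy_rtx {
  enum toy_code code;
  const char *section;        /* TOY_SYMBOL_REF only: section of the symbol */
  struct toy_rtx *op0, *op1;  /* operands of TOY_MINUS / TOY_CONST */
};

/* Hypothetical target check (an assumption, not any real target's rule):
   a lone symbol is fine; a difference of two symbols is accepted only
   when both live in the same section.  Everything else is rejected.  */
bool toy_legitimate_constant_p(const struct toy_rtx *x)
{
  assert(x != NULL);
  if (x->code == TOY_SYMBOL_REF)
    return true;
  if (x->code == TOY_MINUS
      && x->op0->code == TOY_SYMBOL_REF
      && x->op1->code == TOY_SYMBOL_REF)
    return strcmp(x->op0->section, x->op1->section) == 0;
  return false;
}

/* Wrap EXPR in a CONST (stored in WRAPPER) only if the target check
   accepts it; otherwise hand EXPR back unwrapped, so later passes and
   target hooks still get to look at it.  */
struct toy_rtx *toy_wrap_const_if_legitimate(struct toy_rtx *expr,
                                             struct toy_rtx *wrapper)
{
  if (!toy_legitimate_constant_p(expr))
    return expr;
  wrapper->code = TOY_CONST;
  wrapper->section = NULL;
  wrapper->op0 = expr;
  wrapper->op1 = NULL;
  return wrapper;
}
```

With two symbols in the same section the MINUS comes back wrapped in a TOY_CONST; with symbols in different sections it comes back as the bare MINUS, which is the behaviour being argued for in simplify_plus_minus.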
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
David Edelsohn wrote: On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard wrote: The only thing lwsync won't order is a store followed by a load. Since the lwsync will always be paired with a store (the stwcx), we will order all accesses before it and provide a release barrier. Anton, My one other concern is developers using the builtins for applications on embedded PowerPC processors. lwsync will not order accesses to device memory space, AFAICT. Don't you need eieio+sync for that? GCC does not generate the eieio now. Paolo
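For readers who have not used the builtins under discussion, here is a small sketch of the source-level side only. It uses GCC's documented __sync_lock_test_and_set (an acquire barrier) and __sync_lock_release (a release barrier); toy_lock, toy_try_lock and toy_unlock are hypothetical names for illustration. Which instructions the PowerPC backend emits for these is exactly what the patch is about, so the sketch makes no claim there, and it says nothing about device memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal lock built on the __sync_* builtins. */
typedef struct { volatile int held; } toy_lock;

/* Try to take the lock once.  __sync_lock_test_and_set atomically
   stores 1 and returns the previous value; GCC documents it as an
   acquire barrier.  Returns true if the lock was free.  */
bool toy_try_lock(toy_lock *l)
{
  assert(l != NULL);
  return __sync_lock_test_and_set(&l->held, 1) == 0;
}

/* Release the lock.  __sync_lock_release stores 0 and is documented
   as a release barrier: earlier accesses are ordered before the
   store that frees the lock.  */
void toy_unlock(toy_lock *l)
{
  assert(l != NULL);
  __sync_lock_release(&l->held);
}
```

A caller would bracket a critical section with toy_try_lock (retrying until it returns true) and toy_unlock; the release side is where the ordering of the preceding stores matters.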