Re: Official GCC git repository
Bernardo Innocenti wrote: Harvey Harrison wrote: A few things I'd like to clean up if we move a copy over: 1) generate an authors file so the commits have better than just a login name as the committer/author. Or in the ChangeLog. Paolo
Re: Official GCC git repository
My current plan is to bug a few of our devs to try git, and a few to try hg (for a few weeks each), giving them whatever tutorials are around, and see if they find it better enough than subversion. I can try to use git, but I'm already quite experienced in it so I'm not representative. (Personally, I use hg now because being able to log/etc the entire gcc history and do offline commits makes my life a lot easier now that I travel more). Same here. I was using tla for other projects, but it was so slow that the only benefit for me was offline commits. How useful it is to have the entire history at hand, that's something you just have to try to understand. :-) Paolo
Re: why don't have 'umul' rtx code
Eric Fisher wrote: Hi, I'm not clear why we have 'udiv' but don't have 'umul' for Standard Pattern Names. Do I need to define a nameless pattern for it? Because non-widening multiplication is the same for signed and unsigned. We have: mulm3 (non-widening), mulmn3 (signed x signed, widening), umulmn3 (unsigned x unsigned), usmulmn3 (unsigned x signed), umulm3_highpart (unsigned x unsigned), smulm3_highpart (signed x signed). Paolo
Re: [trunk] Addition to subreg section of rtl.text.
I think one reason is that allowing zero_extracts of multi-word modes is (like this subreg thing) a little hard to pin down. What happens when WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN Unless I had my grep wrong, the only such machines to do this are PDP11 and ARM with special flags (-mbig-endian -mwords-little-endian) that were "for backward compatibility with older versions of GCC" in 1999 [1]. So, is this special case worth keeping? Paolo [1] http://www.ecos.sourceware.org/ecos/docs-1.2.1/ref/gnupro-ref/arm/index.html
Re: [trunk] Addition to subreg section of rtl.text.
Richard Sandiford wrote: Paolo Bonzini <[EMAIL PROTECTED]> writes: I think one reason is that allowing zero_extracts of multi-word modes is (like this subreg thing) a little hard to pin down. What happens when WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN Unless I had my grep wrong, the only such machines to do this are PDP11 and ARM with special flags (-mbig-endian -mwords-little-endian) that were "for backward compatibility with older versions of GCC" in 1999 [1]. So, is this special case worth keeping? Good question. Unless I'm missing something, PDP11 isn't yet on the deprecated list. Is that right? If so, I suppose we can't remove it before 4.5 at the earliest. It was in the 4.3 list, then Paul Koning stepped up to do some work on it but then nothing happened. [context for Paul: PDP-11 is the last target for which BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN] http://gcc.gnu.org/ml/gcc/2008-01/msg00339.html Paolo
Re: [trunk] Addition to subreg section of rtl.text.
(Yes, the documentation suggests byte_mode for MEMs, but the SH port uses zero_extracts of SImode MEMs as well, so presumably we're supposed to support other modes besides the documented ones.) I think it is just that no one cares about a MEM's mode in this case. Paolo
Re: [trunk] Addition to subreg section of rtl.text.
Bernd Schmidt wrote: Joern Rennecke wrote: And @code{(subreg:SI (reg:DF 10) 0)} would be a natural way to express that you are using the floating point register as a 32-bit integer register, with writes clobbering the entire 64 bits of the register. Yes, this is one possible definition. But there's no reason in this situation why you couldn't just use a single REG. Why use subregs at all? Because before reload, you use pseudos. And in order for (subreg:SI (reg:DF ...) ...) to be viable, it still has to be viable between hard register allocation and alter_reg. Is that even valid? Are there any known ports using this? AFAIR the middle-end doesn't create this (although it will use (subreg:SF (reg:DI))). SPE has patterns like

  [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "+r,m")
        (subreg:SI (match_operand:SPE64TF 1 "register_operand" "r,r") 4))]

for example. Paolo
Re: Different *CFLAGS in gcc/Makefile.in
## the C flags (without any gcc -I... stuff) to be included in
## compilation of MELT generated C code thru the melt-cc-script
## do not put $(INTERNAL_CFLAGS) $(COVERAGE_FLAGS) $(WARN_CFLAGS) there!
MELT_CFLAGS = $(X_CFLAGS) $(T_CFLAGS) $(CFLAGS) $(XCFLAGS)

But I'm not sure of the T_CFLAGS (it probably is related to target specific stuff only). T_CFLAGS is flags that the *target* decides to add. It's actually used only by ia64/t-hpux and gcc.c, and getting rid of it in some way would be a good thing. I think you're okay with your choice. Paolo
Re: [trunk] Addition to subreg section of rtl.text.
The second is to say explicitly that subregs of subregs are not legal. Yes, you should always use (directly or indirectly) simplify_gen_subreg. Paolo
Re: Official GCC git repository
I was only suggesting it as a nicety, if people are happy with the login name alone. What about "Real Name <[EMAIL PROTECTED]>"? The overseers have the mapping, or you can sort of guess it from the names in the ChangeLog. This has to be decided before the first push, so deciding it is kind of urgent. Paolo
Re: GSOC Student application
There are issues of Garbage Collection from libgcc or Boehm's GC Two mistakes in one line. Congratulations, J.C., for confusing a prospective GSoC contributor. So far your messages were merely useless, decreasing the signal-to-noise ratio. Now you've escalated to actively damaging activity. Paolo
Re: GSOC Student application
Joe> It's best to ignore J.C. Pizarro. He's an attention-seeking troll, Joe> who has just enough technical knowledge to derail conversation. I think that if we've reached the point where an SC member feels the need to post disclaimers about someone's posts, then that someone ought to simply be banned. I know this is extreme, and as far as I know we've never done it before. But, in my opinion, we've been more than tolerant here. There's no benefit that I can see to putting up with this kind of bad behavior. The downside of banning J.C. is that if he replies-to-all, no one else would be alerted of his message -- and whoever he replies to (Alexey in this case) may have no clue that he should not pay attention to the message. Paolo
Re: Bootstrap failure due to a typo in gcc/fwprop.c
This is due to revision 133828 and fixed by the following patch:

--- ../_gcc_clean/gcc/fwprop.c	2008-04-02 12:12:57.0 +0200
+++ gcc/fwprop.c	2008-04-02 13:44:07.0 +0200
@@ -231,7 +231,7 @@
      PR_HANDLE_MEM is set when the source of the propagation was not
      another MEM.  Then, it is safe not to treat non-read-only MEMs as
      ``opaque'' objects.  */
-  PR_HANDLE_MEM = 2,
+  PR_HANDLE_MEM = 2
 };

Committed as 133833. Paolo
Re: Bootstrap comparison failures on i586
Eric Botcazou wrote: Hi, Since yesterday I'm having seemingly random bootstrap comparisons failures on i586-suse-linux: for caller-save.o yesterday, for build/gensupport.o today at revision 133861. But a second tree at the same revision bootstrapped fine. Is anyone else seeing this? Have you tried running valgrind? Paolo
Re: RFC Test suite fix testing of no_trampolines
Andy H wrote: There are several tests in the testsuite that use trampolines and that are still run with the dejagnu switch set to no_trampolines. It's on my TODO list for the AVR target, but a recent email reminded me that it affects testing of other targets that can't or won't support trampolines. There's an old patch by Björn Haase, approved but not committed in 2005, that addressed many of these: http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01919.html The patch was even approved... Paolo
Re: US-CERT Vulnerability Note VU#162289
Rainer Emrich wrote: http://www.kb.cert.org/vuls/id/162289 Any comments? See http://www.airs.com/blog/archives/120 for a good blog post by Ian Lance Taylor about this issue. Since GCC 4.2, -Wstrict-overflow=5 can be used to find cases where optimizations rely on overflow behavior that the standard leaves undefined. Also, -ftrapv is a little broken and may have false negatives. On the other hand, -fwrapv should not cause any problems. If you find that -fwrapv hinders performance of your application, you can also try "-fwrapv -funsafe-loop-optimizations -Wunsafe-loop-optimizations". This will restrict overflow assumptions to those needed to optimize loops, and also warn whenever the compiler makes this kind of assumption. You can then audit any warnings you get to see if they have security implications for your application. Paolo
Re: Copyright assignment wiki page
Then I suggest changing our contribute page from "contact us (either via the gcc@gcc.gnu.org list or the GCC maintainer that is taking care of your contributions) to obtain the relevant forms" to "contact us (either via the gcc@gcc.gnu.org list or a GCC Steering Committee member) to obtain the relevant forms" to reflect this. It's not so hard actually. Any person who has a GNU account can get them. I just checked and, among people who are not SC members and are usually on IRC, I counted 6-7 people. Just ask them and they will forward you the administrivia form. Paolo
Re: US-CERT Vulnerability Note VU#162289
(as an aside, as most target implementations treat pointers as unsigned values, its not clear that presuming signed integer overflow semantics are a reasonable choice for pointer comparison optimization) The point is not of presuming signed integer overflow semantics (I was corrected on this by Ian Taylor). It is of presuming that pointers never move before the beginning of their object. If you have an array of 20 elements, pointers &a[0] to &a[20] are valid (accessing &a[20] is not valid), but the compiler can assume that the program does not refer to &a[-2]. Paolo
Re: US-CERT Vulnerability Note VU#162289
A theoretical argument for why somebody might write problematic code is http://www.fefe.de/openldap-mail.txt . But that's like putting the cart before the horse (and complaining that it does not work). You find a security problem, you find a solution, you find the compiler optimizes it away, you blame the compiler. You don't look for an alternative, which would be the most sensible thing: compare the length with the size, without unnecessary pointer arithmetic. Since the length is unsigned, it's enough to do this: if (len > (size_t) (max - ptr)) /* overflow */ ; Paolo
Re: IRA for GCC 4.4
(The testcase is 400k lines of preprocessed Fortran code, 16M in size, available here: http://www.pci.unizh.ch/vandevondele/tmp/all_cp2k_gfortran.f90.gz) Thanks, I'll check it. Vlad, I think you should also try to understand what trunk does with global (and without local allocation) at -O0. That will give a measure of the benefit from Peter's patches for conflict graph building. Another thing to evaluate is the impact of changing gimplify.c so that it always follows the "if (optimize)" paths. The differences are there exactly because we don't run global register allocation at -O0, and they create more pseudos. Paolo
Re: Security vulernarability or security feature?
I think Java handles it OK for floats, i.e. tests for positive infinity, negative infinity, etc. I don't think Java handles it for integer maths. Java integer math is mandated to have wrap-around semantics, so you can do something like

  if ((b^c) > 0 && (a^c) < 0 && (a^b) < 0)
    overflow

Paolo
Re: Weird result for modulus operation
Ang Way Chuang wrote: Ang Way Chuang wrote: Andrew Pinski wrote: On Tue, Apr 29, 2008 at 9:08 PM, Ang Way Chuang <[EMAIL PROTECTED]> wrote: Thanks for the speedy reply. But why does this code: int a = 17, b = 16; a = a++ % 16; result in a = 2 then? I think I need to know what a sequence point is. I'll google that. As I mentioned, the code is undefined so it could be any value. Is there any flag in gcc that can warn about code that relies on undefined behaviours? Found it. -Wsequence-point, which is enabled by -Wall. But gcc didn't fart out any warning with -Wall or the -Wsequence-point flag :( You found a bug; it does point out the problem with the second example here. Paolo
Re: Weird result for modulus operation
Thanks for the speedy reply. But why this code: int a = 17, b = 16; a = a++ % 16; Huh? Now you got me confused. Since it is undefined behaviour, gcc is free to do whatever it likes. Sure, but if you ask gcc to signal a warning, it is supposed to do so. :-) It is a bug that gcc with -Wsequence-point signals a warning for "a = a++ % 16" but not when you use abc.a. Though the answers given by the first and second examples show inconsistency in gcc's handling of the undefined behaviour. That's not a problem. GCC does not have to be consistent. But both should be warned about. I can't forward to the gmane.comp.gcc.devel newsgroup with my account. No problem, you can delete it. Paolo
Re: Division using FMAC, reciprocal estimates and Newton-Raphson - eg ia64, rs6000, SSE, ARM MaverickCrunch?
I'd like to implement something similar for MaverickCrunch, using the integer 32-bit MAC functions, but there is no reciprocal estimate function on the MaverickCrunch. I guess a lookup table could be implemented, but how many entries will need to be generated, and how accurate will it have to be to stay IEEE754 compliant (in the swdiv routine)? I think sh does something like that. It is quite a mess, as it has half a dozen ways to implement division. The idea is to use integer arithmetic to compute the right exponent, and the lookup table to estimate the mantissa. I used something like this for square root:

1) shift the entire FP number by 1 to the right (logical right shift)
2) sum 0x20000000 so that the exponent is still offset by 64
3) extract the 8 bits from 14 to 22 and look them up in a 256-entry, 32-bit table
4) sum the value (as a 32-bit integer!) with the content of the table
5) perform 2 Newton-Raphson iterations as necessary

Example, 3.9921875:

  byte representation = 0x407F8000
  shift right         = 0x203FC000
  sum                 = 0x403FC000
  extract bits        = 255
  lookup table value  = -4194312 = -0x400008
  adjusted value      = 16r3FFFBFF8, which is the square root

The table is simply making sure that if the rightmost 14 bits of the mantissa are zero, the return value is right. By summing the content of the lookup table, you can of course interpolate between the values. With a 12-bit table (i.e. 16 kilobytes instead of just one) you will only need 1 iteration. The algorithm will have to be adjusted for reciprocal (subtracting the FP number from 16r7F00 or better 16r7EFF should do the trick for the first two steps; and since you don't shift right by one you'll use bits 15-23). Here is a sample program to generate the table. It's written in Smalltalk (sorry :-P), it should not be hard to understand (but remember that indices are 1-based). To double check, the first entries of the table are 1 -32512 -64519 -96026.

| a int adj table |
table := ##(| table a val estim |
    table := Array new: 256.
    0 to: 255 do: [ :i |
        a := ByteArray new: 4.
        "Create number"
        a intAt: 1 put: (i bitShift: 15).
        a at: 1 put: 64.
        val := (a floatAt: 1) reciprocal.

        "Perform estimation"
        a intAt: 1 put: (16r7EFF - (a intAt: 1)).
        estim := a intAt: 1.

        "Compute delta with actual value and store it"
        a floatAt: 1 put: val.
        table at: i + 1 put: ((a intAt: 1) - estim) ].
    table).

"Here we do the actual calculation.  `self' is the number to be reciprocated."
a := ByteArray new: 4.
a floatAt: 1 put: self.

"Perform estimation as above"
int := 16r7EFF - (a intAt: 1).

"Extract bits 15-23 and access the table."
adj := table at: ((a intAt: 1) // 32768 \\ 256) + 1.

"Sum the delta and convert from 32-bit integer to float"
a intAt: 1 put: (int + adj).
^(a floatAt: 1)

Also, where should I be sticking such an instruction / table? Should I put it in the kernel, and trap an invalid instruction? Alternatively, should I put it in libgcc Yes, you could do this. Paolo
Re: [RFC] Adjust output for strings in tree-pretty-print.c
Notice the added final '\0' in the C case; I don't know if it's bad to have it there, but I don't see a way to not output it and still have the correct output for Fortran (whose strings are not NUL-terminated). I think the best thing to do is to have a langhook then. I'm actually not sure that you want all those \0's in the Fortran front-end since the kind can be recovered from the {lb:1 sz:4} that is appended to the string. Endianness issues may also appear. Maybe you should call iconv in the langhook to get back to UTF-8, and print that representation instead. Paolo
Re: [RFC] Adjust output for strings in tree-pretty-print.c
FX wrote: I think the best thing to do is to have a langhook then. It seems a bit weird to have a langhook for a one-character difference, but if there's a consensus on it, I'll go along. To me too, but I still maintain that it's better to print in UTF-8 (which would make the langhook more useful). The recent Unicode patches for C could possibly use the langhook too. Endianness issues may also appear. Maybe you should call iconv in the langhook to get back to UTF-8, and print that representation instead. Endianness is already taken care of, in the sense that the string is encoded in the target's endianness already. But for testing you want a standardized endianness. Otherwise some targets will need to scan "I\0\0\0" and others will need to scan "\0\0\0I". However, that makes calling iconv more difficult, because that has us going from the target's endianness to UTF-8, which will be a pain. No, you can use the UTF-32BE and UTF-32LE encodings. Paolo
Re: Question about building hash values from pointers
it's uintptr_t which should be used, if only as an intermediate cast: (unsigned long)(uintptr_t)ptr. That's not possible because, IIRC, gcc must compile on C90 systems. Right, so the only type remaining is size_t. IIRC there is a problem with this type on some targets, too. AFAIK there are 24-bit pointers ... This is the reason why I was asking to introduce a new general type for such stuff in gcc. size_t is ok I think, but just in case, there is an autoconf macro (used in libgfortran and libdecnumber) that provides int*_t. Paolo
Re: 4.3.0 and 4.3.1 don't build startfiles (crtXXX.o files)
Then, running "make all-target-libgcc" built them, but I finally settled for just "make" - it didn't error out. Yes, the advantages of Paul's suggested process are not only that the installations are reproducible and always use the complete feature set of the underlying libc (that's the big part), but also that "make" just works and you are more shielded from changes in the build system. Paolo
Re: 4.3.0 and 4.3.1 don't build startfiles (crtXXX.o files)
1) Binutils
2) Whatever bits of compiler are required to produce...
3) libc headers
4) A basic C compiler+libgcc that is sufficient to build...
5) libc
6) A full compiler+runtime, c++, fortran, etc.

If someone is willing to expand on the above and explain what exactly I need to do in step 2, in step 3, in step 4, that would be helpful. You already made step 2. Step 3 depends on your C library; I don't know the details for uclibc. Step 4 is the same as step 2, but you will need fewer --disable-* options. Paolo
Re: Inefficient loop unrolling.
Bingfeng Mei wrote: Steven, I just created a bug report. You should receive a CCed mail now. I can see these issues are solvable at the RTL level, but that requires a lot of effort. The main optimization in the loop unrolling pass, split-iv, can reduce the dependence chain but not the extra ADDs and the alias issue. What is the main reason that loop unrolling should belong at the RTL level? Is it fundamental? No, it is just the effectiveness of the code size expansion heuristics. Ivopts is already complex enough on the tree level; doing it on RTL would be insane. But other low-level loop optimizations had already been written at the RTL level and, since there were no compelling reasons to move them, they were left there. That said, this is a bug -- fwprop should have folded the ADDs, at the very least. I'll look at the PR. Paolo
Re: Is this the expected behavior?
Mohamed Shafi wrote: 2008/7/15 Ramana Radhakrishnan <[EMAIL PROTECTED]>: I agree with you, but what about when there are still caller-save registers available and there are no register restrictions for any instructions? In my case I find that GCC has used only the argument registers, stack pointer and callee-saved registers. So out of the 16 available registers only 5+1+4 registers were used, even though 6 caller-save registers were available. Check your REG_ALLOC_ORDER macro? The order is argument registers, caller-save registers and finally the callee-save registers. Are there instructions that only work on the callee-save registers? This might confuse regclass (the pass that decides the register class preferences). Paolo
Re: [Ada] multilib first patch
To build and be later able to install more than one version of the ada library we need to change (at least) this assumption in some way and keep more than one library build result around. The best way would be to bite the bullet, and move the RTS for real to libada (instead of having libada as a proxy). The fact that the compiler needs it is not a problem, you just need to make gnat depend on a host libada. However, this is not a small change. Paolo
Re: [Ada] multilib patch take two => multilib build working
I had to solve one rts source issue though: gcc/ada/system-linux-x86_64.ads and x86.ads hardcode the number of bits in a word (64 and 32 respectively). I changed them both to be completely the same and use the GNAT-defined Standard'Word_Size attribute:

- Word_Size   : constant := 64;
- Memory_Size : constant := 2 ** 64;
+ Word_Size   : constant := Standard'Word_Size;
+ Memory_Size : constant := 2 ** Word_Size;

The same change will have to be done on other 32/64-bit Ada targets. I don't know if this change has adverse effects on the GNAT build in some situations. I think this is worthwhile on its own, before the build patch goes in. The patch is not complete yet of course, but I'd appreciate feedback on whether or not I'm on the right track for 4.4 inclusion. It looks good to me, though I'll defer to Arnaud and other AdaCore people. One nit:

+GNATLIBMULTI := $(subst /,,$(MULTISUBDIR))

Please substitute / with _ instead, to avoid unlikely but possible clashes. [EMAIL PROTECTED]:~/tmp$ gnatmake -f -g -aO/home/guerby/build-mlib7/gcc/ada/rts32 -m32 p I guess fixing this requires duplicating in gnatmake and gnatbind the logic in gcc.c that uses the info produced by genmultilib. Search gcc.c for multilib_raw, multilib_matches_raw, multilib_extra, multilib_exclusions_raw, multilib_options, set_multilib_dir. Maybe it makes sense to make a separate .c module for this, so that both the driver and gnat{make,bind} can use it. I'm not sure how much churn there is in gcc/ada/Makefile.in. If there is little, it probably makes more sense to work on a branch. If there is much, it probably makes more sense to commit the partially working patch. Again, I'll defer to AdaCore people on this. Paolo
Re: [Ada] multilib patch take two => multilib build working
Arnaud Charlet wrote: I had to solve one rts source issue though: gcc/ada/system-linux-x86_64.ads and x86.ads hardcode the number of bits in a word (64 and 32 respectively). I changed them both to be completely the same and use the GNAT-defined Standard'Word_Size attribute:

- Word_Size   : constant := 64;
- Memory_Size : constant := 2 ** 64;
+ Word_Size   : constant := Standard'Word_Size;
+ Memory_Size : constant := 2 ** Word_Size;

The same change will have to be done on other 32/64-bit Ada targets. I don't know if this change has adverse effects on the GNAT build in some situations. I think this is worthwhile on its own, before the build patch goes in. Not clear to me actually. The idea currently is to make these values explicit so that when people read system.ads, they know right away what the right value is. Also, this points to a real flaw in the approach, since e.g. in the case of little/big endian multilibs, a similar issue would arise. Yes, if different multilibs should use different sets of sources, we have a problem. On the other hand, this is also something that can be solved by moving the RTS to libada. The standard approach with C multilibs is to rely on configure tests, or on #define constants, to pick the appropriate choice between multilibs, and I don't see why this shouldn't work with Ada. For example, g-soccon* files are generated automatically -- then, why not go one step further and generate a g-soccon file at configure time? It seems that a branch would be more appropriate for this kind of work, since there's a long way to go before getting in reasonable shape. Given the above, I agree. Also, it's not clear how using $(RTS) for building gnattools will work properly. Only target modules are multilibbed, so only one copy of gnattools is built.
I assume that the characteristics of the target do not affect the operation of gnattools -- different multilibs may use different filesystem paths and thus behave differently, but the *code* of gnattools should be the same in all cases. [EMAIL PROTECTED]:~/tmp$ gnatmake -f -g -aO/home/guerby/build-mlib7/gcc/ada/rts32 -m32 p There's an existing mechanism in gnatmake which is the use of the --RTS switch, so ideally it would be great to match multilib install into --RTS=xxx compatible dirs, and also have -xxx (e.g. -32 or -64) imply --RTS=xxx. Yes, you would basically take from gcc.c the code that turns "-m32" into "use multilib 32", and use it to make a --RTS option. Paolo
Re: [Ada] multilib patch take two => multilib build working
This will need some additionals tests on MULTILIB in the LIBGNAT_TARGET_PAIRS machinery (3 files for x86 vs x86_64, solaris looks like already done, powerpc seem 32 bits only right now, s390/s390x, others?) but it doesn't seem like a blocking issue with the proposed design since each MULTILIB rts build has completely separate directory and stamp (through RTSDIR) so there is no possibility of conflict through sharing. Do you agree with this assessment? Unfortunately not. The solaris bits are just "factoring" a bit the definition of LIBGNAT_TARGET_PAIRS. The actual multilib definition can be arbitrary, for example you can have big-endian/little-endian multilibs. If the design is explicitly to have constants spelled out in system-* files (instead of having them, for example, in Autoconf macros), there is not much that you can do. Paolo
Re: [Ada] multilib patch take two => multilib build working
Paolo, do you know where to look for the list of multilibs on a given platform in the GCC sources? And if we want to disable some of them for Ada? In the makefile fragments t-*, in places like config/i386/t-linux64:

MULTILIB_OPTIONS = m64/m32
MULTILIB_DIRNAMES = 64 32
MULTILIB_OSDIRNAMES = ../lib64 $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)

Problematic (from an Ada point of view) configurations are those like config/arm/t-xscale-elf:

MULTILIB_OPTIONS = mbig-endian
MULTILIB_DIRNAMES = be
MULTILIB_EXCEPTIONS =
MULTILIB_MATCHES = mbig-endian=mbe mlittle-endian=mle

and 32/64-bit configurations like x86_64-pc-linux-gnu. Paolo
Re: [Ada] multilib patch take two => multilib build working
There is a Standard'Default_Bit_Order, so it's the same as Word_Size: we just lose "source" documentation (and gain smaller diffs between target files). Yes, but Arnaud said that system-* constants are written down for a reason. I don't understand *what* the reason is, but that's just because I have no clue (among other things) about which parts of the Ada RTS are installed. For example, are these system-*.ads files installed? If so, is it possible to install different versions for each multilib? If not (question to Arnaud), is this self-documentation meant only for the people reading the source code of GNAT? IOW, is it meant as a quick-and-dirty reference as to what the characteristics of the target are? My impression is that 50% of the constants in system-*.ads files are boilerplate (same for every file), 30% are easily derivable from configure.ac tests or from other Standard'xyz constants, and only 20% are actually something that depends on how GNAT works on the target. If it were up to me, I would make the system.ads file automatically generated (except for the latter 20%, which would have to be specified in some way). That would simplify multilibbing, but this is moot unless there is a guarantee that this 20% is *totally* derivable from the target triplet, so that no conceivable flag combination can affect it. If this guarantee is not there, any attempt to multilib the Ada RTS is going to be a sore failure. Paolo
Re: [Ada] multilib patch take two => multilib build working
I think you will end up having to support generating different source trees for each multilib variant to be safe and correct. Yes, that comes out naturally if the RTS is built in libada. In fact, Arnaud said: The idea currently is to make these values explicit so that when people read system.ads, they know right away what the right value is. That's "when people read system.ads", not "when people read system-linux-x86.ads". In other words, he's not necessarily against automatically generating system.ads from other means, for example using configure tests. Which, I repeat, comes out naturally if the RTS build is confined in libada. This will work for native builds but may have problems on cross builds where you can't run a program. I know for the RTEMS g-soccon* file we have to run a program on target hardware and capture the output. Do you really need to run programs? Most of gen-soccon can be done by running Ada source code through the C pre-processor and massaging the output. In fact, the code that would be passed through cpp strongly resembles gen-soccon.c itself. If you move the source to libada and start potentially using different source combinations for different multilib variants, then it does need to be on a branch. But some of the patches so far seem like they would be OK to commit on the mainline and minimize diffs. Yes, that's true. Paolo
Re: [Ada] multilib patch take two => multilib build working
Arnaud Charlet wrote: The idea currently is to make these values explicit so that when people read system.ads, they know right away what the right value is. That's "when people read system.ads", not "when people read system-linux-x86.ads". In other words, he's not necessarily against automatically generating system.ads by other means, for example using configure tests. Which, I repeat, comes out naturally if the RTS build is confined to libada. Right, that's one possibility, although people seem to be focusing on system.ads a lot, which is actually only the tip of the iceberg and not a real issue per se. Still, it's the biggest problem so far. For example, g-bytswa-x86.adb vs. g-bytswa.adb is also a problem, because a -mcpu=i386 multilib should use the latter; however, that's arguably already a GNAT bug for the i386-pc-linux-gnu configuration, so it can be left for later. Good to hear about soccon, though. Paolo
Re: [Ada] multilib patch / -imultilib and language drivers
What triggers the passing of -imultilib to a language driver? It is passed together with -isystem, -iprefix and friends when %I is in a spec. I'm sure you can define a new spec function and use it to pass the multilib_dir variable down to the Ada driver (see default_compilers, I guess you have to read some gcc.c). Once gcc passes this info to gnat1 it will likely be easy to have gnatmake/bind/link extract it when needed since those tools call gcc. I believe you on this. :-) Paolo
Re: [Ada] multilib patch take three
Arnaud Charlet wrote: Yes, I volunteer. We're in stage1, so we have some time to sort out reported issues before release. OK. I'm still concerned that there is no simple fallback for all targets that will break, except for --disable-multilib, which is too strong since it impacts other languages. I'd be much more comfortable with e.g. adding a --disable-multilibada or some such that would basically fall back to the previous state. I volunteer to check if there is support for --enable-multilib=libstdc++-v3,libjava and if not add it. Unfortunately, --disable-multilib=ada cannot work (because --disable-xxx is the same as --enable-multilib=no). As an alternative, people that don't want a multilibbed libada can disable libada altogether. More on this in a second. Only libada/Makefile.in install will be invoked for all multilibs by the install machinery, so gcc/ada/Makefile.in install cannot properly do the rts install work anymore, hence the change. The relevant libada/Makefile.in parts are now: What about people using --disable-multilib or --disable-libada? I'd rather keep the part in ada/Makefile.in for these cases. Note that Laurent commented out install-gnatlib in ada/Make-lang.in. I agree though that it doesn't hurt to keep those targets, and I think that this hunk should not be included. Well, I still do not understand how install-gnatlib works in the new scheme; I guess I need a more detailed explanation, since I am not familiar with the multilib scheme. Could you explain in detail how make install will install the various gnatlibs in the new scheme? I guess everything will be clearer after the above: gcc/ada/Makefile.in is not changed, and both gcc/ada/Make-lang.in (once Laurent undoes that hunk) and libada/Makefile.in invoke it. The difference introduced by multilibbing is just a few lines like these:

+	$(MULTIDO) DO=all multi-do # $(MAKE)
+	$(MULTIDO) DO=install multi-do # $(MAKE)

When multilibbing is disabled, MULTIDO=: and these lines are no-ops.
When it is enabled, MULTIDO=$(MAKE) and they cause one recursive make invocation for each multilib (for example libada/32/Makefile). The multilibs differ in the default values of CFLAGS, and they know about this difference via another variable, MULTISUBDIR (which is for example /32 for libada/32/Makefile). This is what Laurent uses to conditionalize the PAIRS in gcc/ada/Makefile.in. Review of the patch follows:

-	-if [ -f gnat1$(exeext) ] ; \
-	then \
-	  $(MAKE) $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib; \
-	fi
+#	-if [ -f gnat1$(exeext) ] ; \
+#	then \
+#	  $(MAKE) $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib; \
+#	fi

-install-gnatlib:
-	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib$(LIBGNAT_TARGET)
+#install-gnatlib:
+#	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib$(LIBGNAT_TARGET)
+#
+#install-gnatlib-obj:
+#	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib-obj

-install-gnatlib-obj:
-	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib-obj
-
 ada.install-man:

Revert as asked by Arnaud.

+ifeq ($(strip $(filter-out %x86_64 linux%,$(arch) $(osys))),)
+  ifeq ($(strip $(MULTISUBDIR)),/32)
+    arch:=i686
+  endif
+endif

Just $(filter-out %x86_64, $(arch)). No need to check for linux too; the /32 multilib name is pretty common. The same should be enough for both powerpc64 and sparc64.

Index: libada/configure

Don't include regenerated files in the patch, only in the ChangeLog. :-)

+# GNU Make needs to see an explicit $(MAKE) variable in the command it
+# runs to enable its job server during parallel builds.  Hence the
+# comments below.
+
+all-multi:
+	$(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
+install-multi:
+	$(MULTIDO) $(AM_MAKEFLAGS) DO=install multi-do # $(MAKE)
+
+.PHONY: all-multi install-multi
+
+mostlyclean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=mostlyclean multi-clean # $(MAKE)
+clean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=clean multi-clean # $(MAKE)
+distclean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=distclean multi-clean # $(MAKE)
+maintainer-clean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=maintainer-clean multi-clean # $(MAKE)
+
+.PHONY: mostlyclean-multi clean-multi distclean-multi maintainer-clean-multi
+
+install-exec-am: install-multi

## No uninstall rule?

## These cleaning rules are recursive.  They should not be
## registered as dependencies of *-am rules.  For instance,
## otherwise running `make clean' would cause both
## clean-multi and mostlyclean-multi to be run, while only
## clean-multi is really expected (since clean-multi recursively
## calls clean, it already does the job of mostlyclean).
+mostlyclean: mostlyclean-multi
+clean: clean-multi
+distclean: distclean-multi
+maintainer-clean: maintainer-clean-multi

Not needed.
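The recursion Arnaud asks about can be sketched in a hypothetical Makefile. This is not the real config-ml.in/multi.m4 machinery; the directory name and -m32 flag are assumptions purely for illustration:

```make
# Sketch only: with multilibs enabled, configure sets MULTIDO=$(MAKE);
# with --disable-multilib it sets MULTIDO=: so the recursion is a no-op.
MULTIDO   = $(MAKE)
MULTIDIRS = 32                  # e.g. the -m32 multilib built under libada/32

all:
	$(MULTIDO) DO=all multi-do # $(MAKE)
install:
	$(MULTIDO) DO=install multi-do # $(MAKE)

# multi-do re-invokes make once per multilib subdirectory; each child
# sees its own MULTISUBDIR (/32 here) and multilib-specific CFLAGS.
multi-do:
	for dir in $(MULTIDIRS); do \
	  $(MAKE) -C $$dir $(DO) MULTISUBDIR=/$$dir CFLAGS="$(CFLAGS) -m32"; \
	done
```

The trailing `# $(MAKE)` comments are what the quoted fragments refer to: GNU Make only enables the jobserver for a command that visibly mentions $(MAKE).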
Re: [Ada] multilib patch take three
I volunteer to check if there is support for --enable-multilib=libstdc++-v3,libjava and if not add it. Unfortunately, --disable-multilib=ada cannot work (because --disable-xxx is the same as --enable-multilib=no). Does that mean that libgcc is implicitly multilibbed if --enable-multilib= is used ? No, I just meant "adding a parameter to --enable-multilib to specify what to multilib". As an alternative, people that don't want a multilibbed libada can simply not use libada at all. More on this in a second. Still not clear to me what you mean here. I was thinking about using --disable-libada and instead using the "make -C gcc gnatlib" target. You will get no multilibs, but I'm not up-to-date as to how you build the tools without libada nowadays. Paolo
Re: [Ada] multilib patch take three
Laurent GUERBY wrote: On Fri, 2008-07-25 at 10:55 +, Joseph S. Myers wrote: i686-linux-gnu --enable-targets=all and x86_64-linux-gnu are equivalent, differing only in whether the default is 32-bit or 64-bit. Do you select the right files for both multilibs of i686-linux-gnu, as well as for both multilibs of x86_64-linux-gnu? (Some targets have the 32-bit-default as the only or the normal target, e.g. Solaris and Darwin.) I didn't know about --enable-targets=all so I'd say this case is not handled by the current pre-patch on i686 but I will work on adding support for it. I think you just need to check for a /64 multilib and change the arch accordingly. Paolo
Re: lto gimple types and debug info
Mark Mitchell wrote: For that matter, "print sizeof(X)" should print the same value when debugging optimized code as when debugging unoptimized code, even if the compiler has optimized X away to an empty structure! I disagree. sizeof(X) in the code will return a value as small as possible in that case (so that malloc-ing an array of structures does not waste memory), and the debugger should do the same. Paolo
Re: lto gimple types and debug info
For that matter, "print sizeof(X)" should print the same value when debugging optimized code as when debugging unoptimized code, even if the compiler has optimized X away to an empty structure! I disagree. sizeof(X) in the code will return a value as small as possible in that case (so that malloc-ing an array of structures does not waste memory), and the debugger should do the same. I don't think that's a viable option. The value of sizeof(X) is a compile-time constant, specified by a combination of ISO C and platform ABI rules. In C++, sizeof(X) can even be used as a (constant) template parameter, way before we get to any optimization. Then you are right. This adds another constraint... Paolo
Support for OpenVMS hosts
I stumbled on this while looking at how x-* host files are used. There are two files in this configuration that "must be compiled with DEC C". One is vcrt0.o, which has about 5 lines of executable code. This makes me think that it would be best if someone with access to the OS would compile it, so that we can ship assembly-language source code for it. The other is pcrt0.o, which AFAICT has had a syntax error since its inception:

48001 kenner  __main (arg1, arg2, arg3, image_file_desc, arg5, arg6)
48001 kenner      void *arg1, *arg2, *arg3;
48001 kenner      void *image_file_desc;
48001 kenner      void *arg5, *arg6)

So, the question is: do we care about this target? Maybe AdaCore has patches to fix it? Paolo
Re: GCC 4.3.2 Status Report (2008-07-31)
Priority      #     Change from Last Report
--------    -----   -----------------------
P1              3        -5
P2            115        -2
P3              2        -1
--------    -----   -----------------------
Total         120        -8

PR35752, which is a P2 regression caused by libtool, is waiting for approval upstream. Should we make an exception to the usual rules and apply the fix on the branch? Paolo
Re: GCC 4.3.2 Status Report (2008-07-31)
As we are approaching the intended release date of 4.3.2 we need to address the P1 bugs or downgrade them accordingly. Two of the P1s have patches posted (more than 3 and 2 weeks ago, respectively), so they just need reviewing. For the record, these are:

http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00722.html reload.c (CCing Ulrich Weigand)
http://gcc.gnu.org/ml/gcc-patches/2008-06/msg01305.html dwarf2out.c (CCing Jason Merrill)

Paolo
Re: GCC 4.3.2 Status Report (2008-07-31)
Ralf Wildenhues wrote: Hi Paolo, * Paolo Bonzini wrote on Thu, Jul 31, 2008 at 02:53:21PM CEST: PR35752, which is a P2 regression caused by libtool, is waiting for approval upstream. Should we make an exception to the usual rules and apply the fix on the branch? If by exception to the usual rule, you mean that you would like to apply the patch to GCC before it's accepted in Libtool And only on the branch. Tomorrow it's a public holiday here, so I wouldn't apply the patch before monday anyway. Paolo
Re: configuring in-tree gmp/mpfr with "none"?
Jay wrote: Andrew, Can you explain more why? Because at some point, no released version worked on Intel Macs. And then gmp/configure runs flex. And then sometimes (always?) flex tries to run getenv("M4") || "m4". Yes, Flex uses m4. gmp/configure probably should not be setting M4 Yes, I think that setting M4=m4-not-needed should be done only for debugging purposes. Otherwise, GMP should always look for m4 in its configure script, and set it to a valid value in the makefile. Paolo
Re: GNAT build failure on cross
Arnaud Charlet wrote: Any suggestions? I would double check that you are indeed using the freshly built corresponding native compiler. Maybe your native installation didn't work as expected and you're building from an older compiler. That's the most likely explanation. Alternatively, there have been changes recently in the libada and ada Makefiles by Paolo Bonzini that might be related. I agree with Arnaud that the most likely explanation is not-recent-enough native tools. But you can try reverting to r138299 to see if my patches are at fault. Paolo
Re: configuring in-tree gmp/mpfr with "none"?
Jay wrote: Because at some point, no released version worked on Intel Macs. Long since passed and can be removed? I don't think so; http://gmp.darwinports.com/ shows that it is still a problem with 4.2.2. Besides, GMP's authors say that it is often a stress test for compilers, so using more C and less assembly can be a good thing (GCC's usage of GMP does not include manipulating really, really huge numbers). gmp/configure is where the blame really lies, but if gcc configured gmp "normally", this wouldn't occur. Or, is cpu=none not so abnormal? Just that I hadn't seen it? It's a GMP-only thing. Given that this is a problem because of Python's apparently broken handling of signals, we cannot do anything about it. Complain to the Python maintainers that they should reset the signals they ignore before exec-ing another program. Paolo
Re: Update libtool?
That said, updating in trunk is a different matter. There, the question IMHO is mostly which libtool version to update to. The git version may still have a regression or two, but 2.2.4 doesn't have the -fPIC on HP/IA patch from Steve (which would be trivial to backport of course). Alternatively GCC can wait for 2.2.6 (hopefully in the "couple of weeks at most" time frame). Updating to 2.2.6 would be okay for me. Paolo
Re: Richard Sandiford appointed RTL maintainer
On 06/28/2011 03:52 PM, Vladimir Makarov wrote: They are complicated, solving NP-problems in heuristic ways. I totally trust that people like Eric or Richard would _not_ approve changes to those heuristics without contacting you or others. On the other hand, I would totally trust them to approve expander patches, but they were said historically to fall outside the realm of RTL maintainership. :) Paolo
Re: GSOC - Student Roundup
On 07/05/2011 06:58 PM, Dimitrios Apostolou wrote: The level of my understanding of this part is still basic, I've now only scratched the surface of Dataflow Analysis. Well you're not looking at df proper, which is mostly a textbook implementation with some quirks; you're looking at RTL operand scanning, which should indeed have a big "here be dragons" sign on it. But you're doing fine. :) Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/11/2011 07:56 PM, David Malcolm wrote: Hope this is fun/helpful (and that I'm correctly interpreting the data!) You are, and it shows some bugs even. gimple_lcx is obviously destroyed by expand, and I find it unlikely that no pass ever introduces a critical edge... Paolo
Re: C++ bootstrap of GCC - still useful ?
On 07/12/2011 08:54 AM, Arnaud Charlet wrote: > > I'm not sure because I don't think we want to compile the C files of the Ada > > runtime with the C++ compiler. We want to do that only for the compiler. > > Right, we definitely don't want to use the C++ compiler for building the > Ada run-time. But apparently they already are (when building the compiler), otherwise the patch in http://gcc.gnu.org/ml/gcc/2009-06/txt4.txt would make no sense:

Index: gcc/ada/env.c
===
--- gcc/ada/env.c	(revision 148953)
+++ gcc/ada/env.c	(working copy)
@@ -29,6 +29,11 @@
  *                                                                          *
  ****************************************************************************/
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef IN_RTS
 #include "tconfig.h"
 #include "tsystem.h"
@@ -313,3 +318,7 @@
   clearenv ();
 #endif
 }
+
+#ifdef __cplusplus
+}
+#endif

Perhaps it is better to always build those files with cc, perhaps not. Since there are two versions of the Ada RTL, the one in the compiler and the one in libada, my questions are: 1) Do they share any object files when not cross-compiling? 2) If not, is using C++ for the former okay? If the answers are "no" and "yes" respectively, I still think a patch like the one I suggested, where the host files in gcc/ are uniformly compiled with C++, is preferable. You do need to force usage of a C compiler when compiling libada:

Index: Makefile.in
===
--- Makefile.in	(revision 169877)
+++ Makefile.in	(working copy)
@@ -2451,6 +2451,7 @@ gnatlib: ../stamp-gnatlib1-$(RTSDIR) ../
 	$(MAKE) -C $(RTSDIR) \
 	  CC="`echo \"$(GCC_FOR_TARGET)\" \
 	      | sed -e 's,\./xgcc,../../xgcc,' -e 's,-B\./,-B../../,'`" \
+	  ENABLE_BUILD_WITH_CXX=no \
 	  INCLUDES="$(INCLUDES_FOR_SUBDIR) -I./../.." \
 	  CFLAGS="$(GNATLIBCFLAGS_FOR_C)" \
 	  FORCE_DEBUG_ADAFLAGS="$(FORCE_DEBUG_ADAFLAGS)" \
@@ -2459,6 +2460,7 @@ gnatlib: ../stamp-gnatlib1-$(RTSDIR) ../
 	$(MAKE) -C $(RTSDIR) \
 	  CC="`echo \"$(GCC_FOR_TARGET)\" \
 	      | sed -e 's,\./xgcc,../../xgcc,' -e 's,-B\./,-B../../,'`" \
+	  ENABLE_BUILD_WITH_CXX=no \
 	  ADA_INCLUDES="" \
 	  CFLAGS="$(GNATLIBCFLAGS)" \
 	  ADAFLAGS="$(GNATLIBFLAGS)" \

And of course extern "C" needs to be added to the headers, so that public symbols used by compiled Ada source are not mangled. However, static and private symbols need not be extern "C". Paolo
Re: C++ bootstrap of GCC - still useful ?
On 07/12/2011 10:00 AM, Eric Botcazou wrote: But your patch isn't necessary to do that, the C files are already compiled with the C++ compiler as of today; the only issue is at the linking stage. The problem is that the patch links gnattools unconditionally with g++. It should depend on --enable-build-with-cxx instead. Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/12/2011 10:43 AM, Paulo J. Matos wrote: Hope this is fun/helpful (and that I'm correctly interpreting the data!) You are, and it shows some bugs even. gimple_lcx is obviously destroyed by expand, and I find it unlikely that no pass ever introduces a critical edge... But the diagram shows gimple_lcx stopping at expand but continuing its lifetime through RTL passes (so gimple_lcx according to the diagram is _not_ destroyed by expand). So, I am left wondering if the bug is in the diagram or GCC. It shows bugs in GCC's pass description, to be clear. Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/12/2011 06:07 PM, David Malcolm wrote: On this build of GCC (standard Fedora 15 gcc package of 4.6.0), the relevant part of cfgexpand.c looks like this:

struct rtl_opt_pass pass_expand =
{
 {
  RTL_PASS,
  "expand",				/* name */
  [...snip...]
  PROP_ssa | PROP_gimple_leh | PROP_cfg
    | PROP_gimple_lcx,			/* properties_required */
  PROP_rtl,				/* properties_provided */
  PROP_ssa | PROP_trees,		/* properties_destroyed */
  [...snip...]
 }

and gcc/tree-pass.h has:

#define PROP_trees \
  (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)

and that matches up with both the diagram and the entry for "expand" in the table below [1]. So it seems that the diagram is correctly accessing the "properties_destroyed" data for the "expand" pass; does PROP_gimple_lcx need to be added somewhere? (Or should the diagram be taught to special-case some things, perhaps?) Yes, PROP_gimple_lcx needs to be added to PROP_trees. I cannot approve the patch, unfortunately. Also, several passes are likely lacking PROP_no_crit_edges in their properties_destroyed. At least all those that can be followed by TODO_cleanup_cfg:

* pass_split_functions
* pass_call_cdce
* pass_build_cfg
* pass_cleanup_eh
* pass_if_conversion
* pass_ipa_inline
* pass_early_inline
* pass_fixup_cfg
* pass_cse_sincos
* pass_predcom
* pass_lim
* pass_loop_prefetch
* pass_vectorize
* pass_iv_canon
* pass_tree_unswitch
* pass_vrp
* pass_sra_early
* pass_sra
* pass_early_ipa_sra
* pass_ccp
* pass_fold_builtins
* pass_copy_prop
* pass_dce
* pass_dce_loop
* pass_cd_dce
* pass_dominator
* pass_phi_only_cprop
* pass_forwprop
* pass_tree_ifcombine
* pass_scev_cprop
* pass_parallelize_loops
* pass_ch
* pass_cselim
* pass_pre
* pass_fre
* pass_tail_recursion
* pass_tail_calls

Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/13/2011 12:54 PM, Richard Guenther wrote: > Yes, PROP_gimple_lcx needs to be added to PROP_trees. I cannot approve the > patch, unfortunately. Hm, why? complex operations are lowered after a complex lowering pass has executed. they are still lowered on RTL, so I don't see why we need to destroy them technically. Because it's PROP_*gimple*_lcx. :) Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/14/2011 11:11 AM, Richard Guenther wrote: >> Hm, why? complex operations are lowered after a complex lowering pass >> has executed. they are still lowered on RTL, so I don't see why we need >> to destroy them technically. > > Because it's PROP_*gimple*_lcx.:) Shouldn't it then be PROP_*gimple* instead of PROP_*trees*?;) Heh, you have a point! Paolo
Re: [RFC] Remove -freorder-blocks-and-partition
On 07/25/2011 06:42 AM, Xinliang David Li wrote: FYI the performance impact of this option with SPEC06 (built with google_46 compiler and measured on a core2 box). The base line number is FDO, and ref number is FDO + reorder_with_partitioning. xalancbmk improves> 3.5% perlbench improves> 1.5% dealII and bzip2 degrades about 1.4%. Note the partitioning scheme is not tuned at all -- there is not even a tunable parameter to play with. Did you check what is pushed down to the cold section in these cases? Paolo
Re: [RFC] Remove -freorder-blocks-and-partition
On 07/27/2011 06:51 AM, Xinliang David Li wrote: > If we could retain most of the speedups when the optimization works well > but avoid most of the slowdown in the benchmarks that are currently hurt, > we could improve the overall SPEC06 score. And hopefully, this would > also be beneficial to other code. Agree. There are certainly problems in the partition pass, as for bzip2 the icache misses actually go up with partitioning, which is not expected. It needs further analysis. It's probably too aggressive. Icache misses go up because a) the overall size of the executable grows; b) cold parts are probably not cold enough in the case of bzip2. Paolo
Re: RFC: PATCH: Require and use int64 for x86 options
On 07/27/2011 06:42 PM, H.J. Lu wrote:

+  if (max == 64)
+    var_mask_1[var] = "1LL"

This must be ((HOST_WIDE_INT) 1). Paolo
Re: libgcc: strange optimization
On 08/04/2011 01:10 PM, Andrew Haley wrote: >> It's the sort of thing that gets done in threaded interpreters, >> where you really need to keep a few pointers in registers and >> the interpreter itself is a very long function. gcc has always >> done a dreadful job of register allocation in such cases. > > Sure, but what I have seen is people using global register variables > for this (which means they get taken away from the register allocator). Not always though, and the x86 has so few registers that using a global register variable is very problematic. I suppose you could compile the threaded interpreter in a file of its own, but I'm not sure that has quite the same semantics as local register variables. Indeed, local register variables give almost the same benefit as globals with half the burden. The idea is that you don't care about the exact register that holds the contents but, by specifying a callee-save register, GCC will use those instead of memory across calls. This reduces the number of spills _a lot_. The problem is that people who care about this stuff very much don't always read...@gcc.gnu.org so won't be heard. But in their own world (LISP, Forth) nice features like register variables and labels as values have led to gcc being the preferred compiler for this kind of work. /me raises hands. For GNU Smalltalk, using

#if defined(__i386__)
# define __DECL_REG1 __asm("%esi")
# define __DECL_REG2 __asm("%edi")
# define __DECL_REG3 /* no more callee-save regs if PIC is in use! */
#endif
#if defined(__x86_64__)
# define __DECL_REG1 __asm("%r12")
# define __DECL_REG2 __asm("%r13")
# define __DECL_REG3 __asm("%rbx")
#endif
...
register unsigned char *ip __DECL_REG1;
register OOP *sp __DECL_REG2;
register intptr_t arg __DECL_REG3;

improves performance by up to 20% if I remember correctly. I can benchmark it if desired. It does not come for free; in some cases the register allocator does some stupid things due to the hard register declaration. But it gets much better code overall, so who cares about the micro-optimization. Of course, if the register allocator did the right thing, or if I could simply use

unsigned char *ip __attribute__((__do_not_spill_me__(20)));
OOP *sp __attribute__((__do_not_spill_me__(10)));
intptr_t arg __attribute__((__do_not_spill_me__(0)));

that would be just fine. Paolo
Re: libgcc: strange optimization
On 08/08/2011 10:06 AM, Richard Guenther wrote: Like if register unsigned char *ip; would increase spill cost of ip compared to unsigned char *ip; ? Remember we're talking about a function with 11000 pseudos and 4000 allocnos (not to mention 1500 basic blocks). You cannot really blame IRA for not doing the right thing. And actually, ip and sp are live everywhere, so there's no hope of reserving a register for them, especially since all x86 callee-save registers have special uses in string functions. If I understand the huge dumps correctly, the missing part is trying to use callee-save registers for spilling, rather than memory. However, perhaps another way to do it is a specialized region management scheme for large switch statements, treating each switch arm as a separate region? There are few registers live across the switch, and all of them are used either "a lot" or "almost never" (and always in cold blocks). BTW, here are some measurements on x86-64:

1) with regalloc hints: 450060432 bytecodes/sec; 12819996 calls/sec
2) without regalloc hints: 263002439 bytecodes/sec; 9458816 sends/sec

Probably even worse on x86-32. None of -fira-region=all, -fira-region=one, -fira-algorithm=priority made significant changes. In fact, it's pretty much a "binary" result: I'd expect register allocation results to be either on par with (1) or similar to (2); everything else is mostly noise. Paolo
Re: Just what are rtx costs?
On 08/17/2011 07:52 AM, Richard Sandiford wrote:

  cost = rtx_cost (SET_SRC (set), SET, speed);
  return cost > 0 ? cost : COSTS_N_INSNS (1);

This ignores SET_DEST (the problem I'm trying to fix). It also means that constants that are slightly more expensive than a register -- somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming cheaper than registers. This can be fixed by doing

  return cost >= COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);

One approach I'm trying is to make sure that every target that doesn't explicitly handle SET does nothing with it. (Targets that do handle SET remain unchanged.) Then, if we see a SET whose SET_SRC is a register, constant, memory or subreg, we give it cost:

  COSTS_N_INSNS (1)
  + rtx_cost (SET_DEST (x), SET, speed)
  + rtx_cost (SET_SRC (x), SET, speed)

as now. In other cases we give it a cost of:

  rtx_cost (SET_DEST (x), SET, speed)
  + rtx_cost (SET_SRC (x), SET, speed)

But that hardly seems clean either. Perhaps we should instead make the SET_SRC always include the cost of the SET, even for registers, constants and the like. Thoughts? Similarly, this becomes

  dest_cost = rtx_cost (SET_DEST (x), SET, speed);
  src_cost = MAX (rtx_cost (SET_SRC (x), SET, speed), COSTS_N_INSNS (1));
  return dest_cost + src_cost;

How does this look? Paolo
should sync builtins be full optimization barriers?
Hi all, sync builtins are described in the documentation as being full memory barriers, with the possible exception of __sync_lock_test_and_set. However, GCC is not enforcing the fact that they are also full _optimization_ barriers. The RTL produced by the builtins does not in general include a memory optimization barrier such as a set of (mem/v:BLK (scratch:P)). This can cause problems with lock-free algorithms, for example this: http://libdispatch.macosforge.org/trac/ticket/35 This can be solved either in generic code, by wrapping sync builtins (before and after) with an asm("":::"memory"), or in the individual machine descriptions by adding a memory barrier in parallel to the locked instructions or the ll/sc instructions. Is the above analysis correct? Or should users put explicit compiler barriers? Paolo
Re: should sync builtins be full optimization barriers?
On 09/09/2011 10:17 AM, Jakub Jelinek wrote: > Is the above analysis correct? Or should the users put explicit > compiler barriers? I'd say they should be optimization barriers too (and at the tree level I think they work that way, being represented as function calls), so if they don't act as memory barriers in RTL, the *.md patterns should be fixed. The only exception should be IMHO the __SYNC_MEM_RELAXED variants - if the CPU can reorder memory accesses across them at will, why shouldn't the compiler be able to do the same as well? Agreed, so we have a bug in all released versions of GCC. :( Paolo
Re: should sync builtins be full optimization barriers?
On 09/09/2011 04:22 PM, Andrew MacLeod wrote: Yeah, some of this is part of the ongoing C++0x work... the memory model parameter is going to allow certain types of code movement in optimizers based on whether it's an acquire operation, a release operation, neither, or both. It is ongoing and hopefully we will eventually have proper consistency. The older __sync builtins are eventually going to invoke the new __sync_mem routines and their new patterns, but will fall back to the old ones if new patterns aren't specified. In the case of your program, this would in fact be a valid transformation I believe... __sync_lock_test_and_set is documented to only have ACQUIRE semantics. Yes, that's true. However, there's nothing special in the compiler to handle __sync_lock_test_and_set differently (optimization-wise) from say __sync_fetch_and_add. I don't see anything in this pattern however that would enforce acquire mode and prevent the reverse operation... moving something from after to before it... so there may be a bug there anyway. Yes. And I suspect most people actually expect all the old __sync routines to be full optimization barriers all the time... maybe we should consider just doing that... That would be very nice. I would like to introduce that kind of data structure in QEMU, too. :) Paolo
Re: should sync builtins be full optimization barriers?
On Sat, Sep 10, 2011 at 03:09, Geert Bosch wrote: > For example, for atomic objects accessed only from a single processor > (but possibly multiple threads), you'd not want the compiler to reorder > memory accesses to global variables across the atomic operations, but > you wouldn't have to emit the expensive fences. I am not 100% sure, but I tend to disagree. The original bug report can be represented as node->next = NULL [relaxed]; xchg(tail, node) [seq_cst]; and the problem was that the two operations were swapped. But that's not a problem with the first access, but rather with the second. So it should be fine if the [relaxed] access does not include a barrier, because it relies on the [seq_cst] access providing it later. Paolo
Re: should sync builtins be full optimization barriers?
On 09/11/2011 04:12 PM, Andrew MacLeod wrote:

  tail->value = othervalue       // global variable write
  atomic_exchange (&var, tail)   // acquire operation

Although the optimizer moving the store of tail->value to AFTER the exchange seems very wrong on the surface, it's really emulating what another thread could possibly see. When another thread synchronizes and reads 'var', an acquire operation doesn't cause outstanding stores to be fully flushed, so the other thread has no guarantee that the store to tail->value has happened yet, even though it gets the expected value of 'var'. You're right that using lock_test_and_set as an exchange is very wrong because of the compiler barrier semantics, but I think this is entirely a red herring in this case. The same problem could happen with a fetch_and_add or even a lock_release operation. Paolo
Re: should sync builtins be full optimization barriers?
On 09/11/2011 09:00 PM, Geert Bosch wrote: So, if I understand correctly, then operations using relaxed memory order will still need fences, but indeed do not require any optimization barrier. For memory_order_seq_cst we'll need a full barrier, and for the others there is a partial barrier. If you do not need an optimization barrier, you do not need a processor barrier either, and vice versa. Optimizations are just another factor that can lead to reordered loads and stores. Paolo
Re: should sync builtins be full optimization barriers?
On 09/12/2011 01:22 AM, Andrew MacLeod wrote: You're right that using lock_test_and_set as an exchange is very wrong because of the compiler barrier semantics, but I think this is entirely a red herring in this case. The same problem could happen with a fetch_and_add or even a lock_release operation. My point is that even once we get the right barriers in place, due to its definition as acquire, this testcase could actually still fail, AND the optimization is valid... Ah, sure. unless we decide to retroactively make all the original sync routines seq_cst. I've certainly seen code using lock_test_and_set to avoid asm for xchg. That would be very much against the documentation with respect to the values of the second parameter, and that's also why clang introduced __sync_swap. However, perhaps it makes sense to make lock_test_and_set provide sequential consistency. Probably not so much for lock_release, which is quite clearly a store-release. Paolo
Re: should sync builtins be full optimization barriers?
On Mon, Sep 12, 2011 at 20:40, Geert Bosch wrote: > Assuming that statement is true, that would imply that even for relaxed > ordering there has to be an optimization barrier. Clearly fences need to be > used for any atomic accesses, including those with relaxed memory order. > > Consider 4 threads and an atomic int x:
>
>    thread 1   thread 2   thread 3   thread 4
>
>    x=1;       r1=x       x=3;       r3=x;
>    x=2;       r2=x       x=4;       r4=x;
>
> Even with relaxed memory ordering, all modifications to x have to occur in > some particular total order, called the modification order of x. > > So, if r1==2, r2==3 and r3==4, r4==1, that would be an error. However, > without fences, this can easily happen on an SMP machine, even one with > a nice memory model such as the x86. How? (Honest question). All stores are to the same location. I don't see how that can happen without processor fences, much less without optimization fences. Paolo
Re: should sync builtins be full optimization barriers?
On Tue, Sep 13, 2011 at 03:52, Geert Bosch wrote: > No, it is possible, and actually likely. Basically, the issue is write > buffers. > The coherency mechanisms come into play at a lower level in the > hierarchy (typically at the last-level cache), which is why we need fences > to start with to implement things like spin locks. You need fences on x86 to implement Peterson or Dekker spin locks, but only because they involve write-read ordering to different memory locations (I'm mentioning those spin lock algorithms because they do not require locked memory accesses). Write-write, read-read and same-location write-read ordering are guaranteed by the processor. Same for coherency, which is a looser property. However, the accesses in those spin lock algorithms are definitely _not_ relaxed; not all of them, at least. > No that's false. Even on systems with nice memory models, such as x86 > and SPARC with a TSO model, you need a fence to avoid that a write-load > of the same location is forced to make it all the way to coherent memory > and not forwarded directly from the write buffer or L1 cache. Not sure about SPARC, but this is definitely false on x86. Granted, even if you do not have to put fences, those writes are likely _not_ free. The processor needs to do more than say on PPC, so I wouldn't be surprised if conflicting memory accesses are quite a bit more expensive on x86 than PPC. Recently, a colleague of mine tried replacing optimization barriers with full barriers in one of two threads implementing a ring buffer; that thread was now 30% slower, but the other thread sped up by basically the same amount. Paolo
Re: should sync builtins be full optimization barriers?
On 09/15/2011 06:19 PM, Richard Henderson wrote:
> I wouldn't go that far. They *used* to be compiler barriers, but
> clearly something broke at some point without anyone noticing. We don't
> know how many versions are affected until we debug it. For all we know
> it broke in 4.5 and 4.4 is fine.

4.4 is not necessarily fine; it may also be that an unrelated 4.5 change exposed a latent bug. But indeed Richard Sandiford mentioned offlist that perhaps the ALIAS_SET_MEMORY_BARRIER machinery broke. Fixing the bug in 4.5/4.6/4.7 will definitely shed more light.

> There's no reference to a GCC bug report about this in the thread. Did
> the folks over at the libdispatch project never think to file one?

I asked them to attach a preprocessed testcase somewhere, but they haven't done so yet. :(

Paolo
Re: should sync builtins be full optimization barriers?
On 09/15/2011 06:26 PM, Paolo Bonzini wrote:
>> There's no reference to a GCC bug report about this in the thread. Did
>> the folks over at the libdispatch project never think to file one?
>
> I asked them to attach a preprocessed testcase somewhere, but they
> haven't done so yet. :(

They have now attached it, and the bug turns out to be a missing parenthesis in an #ifdef. This made libdispatch compile the xchg as an asm rather than a sync builtin. And of course the asm was wrong. Apparently, Apple people on the mailing list were looking at the Apple trunk, but the reporter was obviously compiling from the public trunk.

Paolo
Re: Question on cse_not_expected in explow.c:memory_address_addr_space()
On 09/28/2011 02:14 PM, Georg-Johann Lay wrote:
> This leads to unpleasant code. The machine can access all RAM locations
> by direct addressing. However, the resulting code is:
>
> foo:
> 	ldi r24,lo8(-86)	 ;  6	*movqi/2	[length = 1]
> 	ldi r30,lo8(-64)	 ;  34	*movhi/5	[length = 2]
> 	ldi r31,lo8(10)
> 	std Z+3,r24	 ;  7	*movqi/3	[length = 1]
> .L2:
> 	lds r24,2754	 ;  10	*movqi/4	[length = 2]
> 	sbrs r24,7	 ;  43	*sbrx_branchhi	[length = 2]
> 	rjmp .L2
> 	ldi r24,lo8(-69)	 ;  16	*movqi/2	[length = 1]
> 	ldi r30,lo8(-64)	 ;  33	*movhi/5	[length = 2]
> 	ldi r31,lo8(10)
> 	std Z+3,r24	 ;  17	*movqi/3	[length = 1]
> .L3:
> 	lds r24,2754	 ;  20	*movqi/4	[length = 2]
> 	sbrs r24,7	 ;  42	*sbrx_branchhi	[length = 2]
> 	rjmp .L3
> 	ret	 ;  39	return	[length = 1]
>
> Insn 34 loads 2752 (0xAC0) to r30/r31 (Z) and does an indirect access
> (*(Z+3), i.e. *2755) in insn 7. The same happens in insn 33 (load 2752)
> and access (insn 17). Is there a way to avoid this? I tried
> -f[no-]rerun-cse-after-loop but without effect, same for -Os/-O2 and
> trying to patch rtx_costs.

cse_not_expected is overridden in some places in the middle end. fwprop should take care of propagating the address. Have you tried patching address_costs? Might be as simple as this (untested):

Index: avr.c
===================================================================
--- avr.c	(revision 177688)
+++ avr.c	(working copy)
@@ -5986,8 +5986,8 @@ avr_address_cost (rtx x, bool speed ATTR
       return 18;
   if (CONSTANT_ADDRESS_P (x))
     {
-      if (optimize > 0 && io_address_operand (x, QImode))
-	return 2;
+      if (optimize > 0)
+	return io_address_operand (x, QImode) ? 2 : 3;
       return 4;
     }
   return 4;

Paolo
Re: IRA changes rules of the game
On 10/20/2011 07:46 PM, Paulo J. Matos wrote:
> However, it failed to compile libgcc with:
>
> ../../../../../../../devHost/gcc46/gcc/libgcc/../gcc/libgcc2.c:272:1:
> internal compiler error: in df_uses_record, at df-scan.c:3178
>
> This feels like a GCC bug. I will try to get a better look at it
> tomorrow.

What's the SET it is failing on?

Paolo
Re: libgcc: why emutls.c in LIB2ADDEH instead of LIB2ADD?
On 11/21/2011 01:54 AM, Richard Henderson wrote:
>> Emulating TLS has nothing to do with exception-handling, nor is
>> there something that might throw while calling one of its functions.
>>
>> Ok to fix that?
>
> Not without further study. There was a reason we wanted these in
> libgcc_eh.a. I can't recall exactly why at the moment; it should be in
> the archives...

Nope, the first version at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00903.html already had it in LIB2ADDEH*. Perhaps Jakub has some ideas too.

H-P, can you try bootstrapping your patch on cygwin and/or mingw too before applying it?

Paolo
Re: [PATCH 1/3] colorize: use isatty module
On 01/03/2012 09:48 AM, Jim Meyering wrote:
> Paolo Bonzini wrote:
>> * bootstrap.conf: Add isatty module.
>> * gnulib: Update to latest.
>> * lib/colorize.h: Remove argument from should_colorize.
>> * lib/ms/colorize.h: Likewise.
>> * lib/colorize-impl.c: Factor isatty call out of here...
>> * lib/ms/colorize-impl.c: ... and here...
>> * src/main.c: ... into here.
>
> Hi Paolo,
>
> At least with gcc-4.7.0 20120102, a warning-enabled build now fails
> like this:
>
> colorize.c: In function 'init_colorize':
> colorize.c:37:6: error: function might be candidate for attribute
> 'const' [-Werror=suggest-attribute=const]
> cc1: all warnings being treated as errors

Thanks, my GCC is indeed older. Perhaps GCC should be changed to avoid the warning on functions returning void. If a void function can be const, it pretty much has to be empty, and so it is quite likely a placeholder for something that is not const.

Paolo
Re: Renaming Stage 1 and Stage 3
Il 11/06/2012 11:18, Richard Guenther ha scritto: > > Instead of renaming Stage 3 to Stage 2 at that point we figured that > > using different terminology would reduce confusion. I am not wedded > > to Stage A and B, though this seems to be the most straightforward > > option (over colors, Alpha and Beta carrying a different meaning in > > software development,...). > > > Eh - why not give them names with an actual meaning? "Development Stage" > and "Stabilizing Stage"? I realize those are rather long names, but you > can always put short forms in tables, like Dev Stage and Stab Stage. Or just "Development" and "Feature freeze"? Paolo
LC_COLLATE (was Re: SVN Test Repo updated)
The sort algorithm has nothing to do with ls, but with your selection of LC_COLLATE. But then, BSD (at least the variant used in Mac OS X) is way behind current l10n standards. At least they do not break s/[A-Z]//, which on "well-internationalized" OSes is case-insensitive with most locales other than C.

I still haven't dug enough to understand whether the responsible party is the POSIX specification for localization, the ANSI specification for strcoll, or somebody in the glibc team. But I know that it was the most-reported sed "bug" before I explicitly flagged it as a non-bug in the manual.

I can only guess the outcry if Perl started obeying LC_COLLATE.

Paolo
Re: LC_COLLATE (was Re: SVN Test Repo updated)
>> I can only guess the outcry if Perl started obeying LC_COLLATE.
>
> What do you mean, "started"? It's been doing that for years now.

"By default, Perl ignores the current locale. The `use locale' pragma tells Perl to use the current locale for some operations": and these do not include regex character ranges. LC_COLLATE would only be used for sorting and for string comparisons.

Paolo
Project submission for GCC 4.1 - AltiVec rewrite
I had already submitted this to Mark, but since I have improved a few rough spots in the code I think it's better to make it public.

* Project Title

AltiVec rewrite.

* Project Contributors

Paolo Bonzini

* Dependencies

none

* Delivery Date

March 15th or earlier (the implementation is complete and has no regressions).

* Description

The project reimplements the AltiVec vector primitives in a saner way, without putting the burden on the preprocessor, instead processing the "overloading" in the C front end. This benefits compilation speed on AltiVec vector code, and moves the big table of AltiVec overloading pairs from an installed header file into the compiler (an 11000-line header file is reduced to 500 lines, plus 2500 in the compiler).

The changes are so far self-contained in the PowerPC back end, but I expect that a hack I am using will have to be changed upon review. Unfortunately, a previous RFC I posted on the gcc mailing list got (almost) no answers. I plan to take a look at apple-ppc-branch, which supposedly does not need this hack, or to ask for feedback when I submit the project.

The new implementation improves on the existing one in that everything but predicates will accept unparenthesized literals even in C. This line:

  vec_add (x, (vector unsigned int) {1, 2, 3, 4})

currently fails in C and works in C++, but with the new implementation it works in C as well. On the other hand, using a predicate like this

  vec_all_eq (x, (vector unsigned int) {1, 2, 3, 4})

will still not work in C (it will *not* be a regression in C++, where it is okay both without and with my changes). It would have to be written as

  vec_all_eq (x, ((vector unsigned int) {1, 2, 3, 4}))

exactly as in the current implementation.

Paolo
Re: MMX built-ins performance oddities
> - vector version is about 3% faster than above instead of 10% slower - wow!
>
> So why is gcc 4.0 producing worse code when using Intel-style
> intrinsics, and why isn't the union version using builtins as fast as
> the vector version?

I can answer why unions are slower: that's because they are spilled to memory on every assignment -- GCC 4.0 knows how to replace structs with distinct scalar variables (one per member), but not unions. GCC 3.4 knew about none of these possibilities.

About why vectors are faster, well, a lot of the vector support has been rewritten in GCC 4.0, so that may be the reason. I do not know exactly why builtins are still slower, but you may want to create a PR and add me to the CC list ([EMAIL PROTECTED]).

Paolo
Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".
>> Ah, that's triggered by -fdump-rtl-expand-detailed (it is revision
>> 2.28, for which I could not find an entry on gcc-patches).
>
> Do you know of a reason why that isn't on by default?

Because -fdump-rtl-expand-detailed includes *two* copies of the RTL: one lacks the prologue and epilogue but is interleaved with trees; the other is the standard -fdump-rtl-expand dump.

>> ISTR the name change was to avoid a switch named -fdump-rtl-rtl.
>
> To invent an option name alias and use a minor repetition in it as a
> reason for changing the old behavior is Bad.

It is not merely an option name alias. It came together with a redesign of the way RTL dumps work, to integrate their management with tree dumps and to allow (in the future) various levels of detail in the RTL dumps as well. I never had a problem with the rename because I use fname*.00.* (or the analogous completion key sequence) to invoke an editor on the RTL dump (or, in general, any other dump).

Paolo
Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".
Gabriel Dos Reis wrote:
> Paolo Bonzini <[EMAIL PROTECTED]> writes:
> | >> ISTR the name change was to avoid a switch named -fdump-rtl-rtl.
> | > To invent an option name alias and use a minor repetition in it
> | > as a reason for changing the old behavior is Bad.
> |
> | It is not merely an option name alias. It came together with a
> | redesign of the way RTL dumps work, to integrate their management with
> | tree dumps and to allow (in the future) to have various levels of
> | detail in the RTL dumps as well.
>
> maybe an option with argument?
>
>    -fdump-rtl=detailed
>    -fdump-rtl=classic    # same as -fdump-rtl

-fdump-rtl-* (where the star is a pass name) is the form of the options that enable RTL dumps, e.g. -fdump-rtl-lreg or -fdump-rtl-cse2; "expand" is just a pass name, and since the final part of the dump file name is the pass name, the file name changed from "foobar.00.rtl" to "foobar.00.expand".

Paolo
Re: a mudflap experiment on freebsd
> I think the decision to force the user to specify -lmudflap should be
> revisited.

I agree.

> The fixincludes build failed with link errors for undefined mudflap
> functions. The fixincludes Makefile.in does not use CFLAGS when
> linking. I added $(CFLAGS) to the 3 rules that contain a link command.

I think this can be committed as obvious, especially by a GWP person as you are...

> I now get a configure failure in the intl directory. configure is
> unable to run programs compiled by $CC. I get an error from the
> dynamic linker complaining that libgcc.so can't be found. The problem
> here is that the toplevel Makefile support for LD_LIBRARY_PATH is
> confused. See the SET_GCC_LIB_PATH support. It is completely broken.
> If you modify configure.ac, and make from within the GCC directory,
> the LD_LIBRARY_PATH will not even include the GCC directory.

I have a patch queued for 4.1 for this, but I want to see the PRs and try to reproduce the problem. I don't think it is a requirement to only run make from within the toplevel, even if TARGET-* variables alleviate the problem.

> ! /* We must allocate one more entry here, as we use NOTE_INSN_MAX as the
> !    default field for line number notes. */
> ! static const char *const note_insn_name[NOTE_INSN_MAX+1] = {

I think this also ought to be committed.

Paolo
Re: Benchmark of gcc 4.0
> For GCC, I used in both cases the flags -march=pentium4 -mfpmath=sse
> -O3 -fomit-frame-pointer -ffast-math
>
> As for gcc4 vs gcc3.4, degradation on x86 architecture is most probably
> because of higher register pressure created with more aggressive SSA
> optimizations in gcc4.

Try these five combinations:

  -O2 -fomit-frame-pointer -ffast-math
  -O2 -fomit-frame-pointer -ffast-math -fno-tree-pre
  -O2 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
  -O3 -fomit-frame-pointer -ffast-math -fno-tree-pre
  -O3 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse

You may also want to try -mfpmath=sse,387 in case your benchmarks use sin, cos and other transcendental functions that GCC knows about when using 387 instructions.

Paolo
Re: GCC 4.1 Projects
Daniel Jacobowitz wrote:
> On Sun, Feb 27, 2005 at 03:56:26PM -0800, Mark Mitchell wrote:
>> Daniel Jacobowitz wrote:
>>> On Sun, Feb 27, 2005 at 02:57:05PM -0800, Mark Mitchell wrote:
>>> Nathanael said it did not interfere with any of the other _projects_,
>>> not that it would be disjoint from all Stage 1 _patches_.
>>
>> Fair point. I would certainly prefer that you hold off until Stage 2,
>> as indicated by the document I posted.
>>
>>> Could you explain what benefits from waiting? None of the other large,
>>
>> The primary benefit is just reduced risk, as you suggest. The Stage 1
>> schedule looks full to me, and I'd like to see those patches go in
>> soon so that we can start shaking out the inevitable problems. I'm
>> much less worried about the long-term impact of Nathanael's patch; if
>> it breaks something, it will get fixed, and then that will be that.
>> But, that brief breakage might make things difficult for people
>> putting in things during Stage 1, or compound the problem of having an
>> unstable mainline.
>
> I think that's not a useful criterion for scheduling decisions. Let me
> be more concrete. Paolo Bonzini posted a patch to move in-srcdir builds
> to a host subdirectory. This is a substantial build infrastructure
> change, even though it will not affect a substantial number of
> developers - assuming it works correctly. I consider it no different
> "in kind" from Nathanael's changes. He can approve that; so a system
> where he can't approve his own tested patch is one in which you are
> overriding his judgement. ISTR that that is exactly what you did not
> want to do with this scheduling exercise.
>
> No offense intended to Paolo, of course! I picked a recent example.
> We're less than a week into stage 1, so I don't have much in the way of
> samples to draw on.

No offense perceived, of course.

FWIW I fully agree with you, and my next queued patch is something to clean up the SET_LIB_PATH mess in the toplevel; it does have a potential of breaking bootstrap (I'll post it as a call for testing, because it affects only ia64 in practice and I don't have one of those machines). I just came across this, so I did not post it as a project to Mark.

Paolo
Re: request for timings - makedepend
> and report (a) the numbers reported by the "time" command, (b) what
> sort of machine this is and how old, and (c) whether or not you would
> be willing to trade that much additional delay in an edit-compile-debug
> cycle for not having to write dependencies manually anymore.

Linux, P4 3.4 GHz:

  real    0m5.212s
  user    0m4.330s
  sys     0m0.320s

Mac OS, G4 1.5 GHz:

  real    0m17.100s
  user    0m12.740s
  sys     0m2.720s

Maybe you can use "$?" in some way? It would be fine for me to trade a slowdown in the first compilation (e.g. the full dependencies being built in twice the time) for a much smaller edit-compile-debug delay.

Paolo
Re: [BUG mm] "fixed" i386 memcpy inlining buggy
> The only thing that would avoid this is to either tell the compiler to
> never put esi/edi in memory (which I think is not possible across
> different versions of gcc) or to always generate a single asm section
> for all the different cases.

Use __asm__ ("%esi") and __asm__ ("%edi"). It is not guaranteed that they will always access the registers (you can still have copy propagation etcetera); but if your __asm__ statement's constraints match the register you specify, then you can be reasonably sure that good code is produced.

Paolo
Re: Input and print statements for Front End?
> I can't seem to find any info regarding an input or print statement,
> so I can read integers (my language only deals with integers) from
> stdin and return integer results to stdout.

You need to map these to printf/scanf calls.

Paolo
Re: GCC 4.0 RC1 Available
Kaveh R. Ghazi wrote:
> Nathanael removed the surrounding for-stmt but left the break inside
> the if-stmt.
> http://gcc.gnu.org/ml/gcc-patches/2003-11/msg02109.html

I think it is enough to remove it. bash does not complain if it finds a stray break, it seems. Ok to commit to mainline (and src)?

Mark, if you decide to fix it in 4.0, I think it is better that you do it yourself, also because of the time zone difference (I'll be out of home this evening, which is morning/afternoon for you).

Paolo

2005-04-12  Paolo Bonzini  <[EMAIL PROTECTED]>

	* configure: Regenerate.

config:
2005-04-12  Paolo Bonzini  <[EMAIL PROTECTED]>

	* acx.m4 (ACX_PROG_GNAT): Remove stray break.

Index: acx.m4
===================================================================
RCS file: /cvs/gcc/gcc/config/acx.m4,v
retrieving revision 1.11
diff -p -u -r1.11 acx.m4
--- acx.m4	28 Feb 2005 13:25:55 -0000	1.11
+++ acx.m4	12 Apr 2005 07:04:16 -0000
@@ -212,7 +212,6 @@ acx_cv_cc_gcc_supports_ada=no
 errors=`(${CC} -c conftest.adb) 2>&1 || echo failure`
 if test x"$errors" = x && test -f conftest.$ac_objext; then
   acx_cv_cc_gcc_supports_ada=yes
-  break
 fi
 rm -f conftest.*])
Re: My opinions on tree-level and RTL-level optimization
I think Roger simply mis-spoke because in his original message, he said what you said: the important issue is having the alias information available in RTL. Much (but not all: eg., SUBREG info) of that information is best imported down from the tree level. Well, paradoxical subregs are just a mess: optimizations on paradoxical subregs are better served at the tree level, because it is just obfuscation of e.g. QImode arithmetic. Indeed, my patch removed an optimization on paradoxical subregs, and kept an optimization on non-paradoxical subregs. Take this code: long long a, b, c, d; int x; ... c = a * b; d = (int) x * (a * b); In my view, tree-level optimization will catch (a * b) as a redundant expression. RTL-level optimization will catch that the high-part of "(int) x" is zero. Roger proposed lowering 64-bit arithmetic to 32-bit in tree-ssa! How would you do it? Take long long a, b, c; c = a + b; Would it be c = ((int)a + (int)b) + ((int) (a >> 32) + (int) (b >> 32) + ((unsigned int) a < (unsigned int) b)) << 32; Or will you introduce new tree codes and uglifying tree-ssa? Seriously... This is a very inaccurate characterization of CSE. Yes, it does those things, but eliminating common subexpressions is indeed the major task it performs. It was. Right now, the only thing that fold_rtx tries to simplify is (mult:SI (reg:SI 58) 8) to (ashiftrt:SI (reg:SI 58) 3) Only to find out it is not a valid memory_operand... I have a patch to completely disable calling fold_rtx recursively, only equiv_constant. That was meant to be part 3/n of the cleanup fold_rtx series. I was prepared to take responsibility for every pessimization resulting from these cleanups, and I expected to be sure I'd find a better way to do the same thing. A 7000-lines constant propagator... I think there's a serious conceptual issue in making the tree level too machine-dependent. The *whole point* of doing tree-level optimizations is to do machine-*independent* optimizations. 
> Trees are machine-independent and RTL is machine-dependent. If we go
> too far away from that, I think we miss the point.

No, the whole point of doing tree-level optimizations is to be aware of high-level concepts before they are lowered. No need to worry about support for QImode-size arithmetic. No need to worry if 64-bit multiplication had to be lowered.

>> Besides, the RTL optimizers are not exactly a part of GCC to be proud
>> of, if "ugliness" is a measure.
>
> Really?

The biggest and least readable files right now are combine.c, reload.c, reload1.c. cse.c is big (though not extreme) but unreadable. OTOH, stuff like simplify-rtx.c or especially fold-const.c is big but readable. Of course GCC will always need a low-level IR.

>> But, combine is instruction selection in the worst possible way;
>
> It served GCC well for decades, so I hardly think that's a fair
> statement.

Never heard about dynamic programming?

>> reload is register allocation in the worst possible way,
>
> Reload is not supposed to do register allocation. To the extent that it
> does, I agree with you. But what this has to do with the issue of tree
> vs. RTL optimization is something I don't follow. Surely you aren't
> suggesting doing register allocation at the tree level?

No, he's suggesting cleaning up stuff, so that it is easier to stop doing things in the worst possible way. He's suggesting being realistic about code that has run completely out of control. Luckily some GWP people do care about cleaning up. Richard Henderson did a lot of work on cleaning up RTL things left over from olden times (think eh, nested functions, addressof, save_expr, ...), Zack did some work on this ground in the past as well, Bernd is maybe the only guy who could pursue something such as reload-branch... I hate to make "clubs" out of a community, but it looks like only some people care about the state of the code... Steven has done most of the work for removing the define_function_unit processor descriptions.

I removed ~5000 lines of code after tree-ssa went in (including awful stuff such as protect_from_queue, which made sense maybe in 1990, and half of stmt.c). Kazu is also in the CSE-cleanup game. Maybe, like in my case, it's only because I have limited time to spend on GCC and think that cleaning up is a productive way to use it. But anyway, I think it is worth the effort.

Paolo
Re: GCC 4.0 RC1 Available
Kaveh R. Ghazi wrote:
> When this patch went into 4.0, Paolo didn't regenerate the top level
> configure, although the ChangeLog claims he did:
> http://gcc.gnu.org/ml/gcc-cvs/2005-04/msg00842.html

You're right. I was being conservative and typed the "cvs ci" filenames manually, but in this case there was no need because I worked off a fresh checkout. Sorry.

> The patch should also be applied to mainline, since the "break" problem
> exists there too. I'm not sure why it wasn't, but perhaps your "OK for
> 4.0.0" didn't specify mainline and Paolo was being conservative. I
> think we should fix it there also.

Yes, I was. But it looks like the build machinery maintainers are busy and toplevel patches go largely unnoticed.

Paolo