Re: Official GCC git repository
Bernardo Innocenti wrote: Harvey Harrison wrote: A few things I'd like to clean up if we move a copy over: 1) generate an authors file so the commits have better than just a login name as the committer/author. Or in the ChangeLog. Paolo
Re: Official GCC git repository
My current plan is to bug a few of our devs to try git, and a few to try hg (for a few weeks each), giving them whatever tutorials are around, and see if they find it better enough than subversion. I can try to use git, but I'm already quite experienced in it so I'm not representative. (Personally, I use hg now because being able to log/etc the entire gcc history and do offline commits makes my life a lot easier now that I travel more). Same here. I was using tla for other projects, but it was so slow that the only benefit for me was offline commits. How useful it is to have the entire history at hand, that's something you just have to try to understand. :-) Paolo
Re: why don't have 'umul' rtx code
Eric Fisher wrote: Hi, I'm not clear why we have 'udiv' but don't have 'umul' for Standard Pattern Names. Do I need to define a nameless pattern for it? Because non-widening multiplication is the same for signed and unsigned. We have: mulm3 (non-widening), mulmn3 (signed x signed, widening), umulmn3 (unsigned x unsigned), usmulmn3 (unsigned x signed), umulm3_highpart (unsigned x unsigned), smulm3_highpart (signed x signed). Paolo
Re: [trunk] Addition to subreg section of rtl.text.
I think one reason is that allowing zero_extracts of multi-word modes is (like this subreg thing) a little hard to pin down. What happens when WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN Unless I had my grep wrong, the only such machines to do this are PDP11 and ARM with special flags (-mbig-endian -mwords-little-endian) that were "for backward compatibility with older versions of GCC" in 1999 [1]. So, is this special case worth keeping? Paolo [1] http://www.ecos.sourceware.org/ecos/docs-1.2.1/ref/gnupro-ref/arm/index.html
Re: [trunk] Addition to subreg section of rtl.text.
Richard Sandiford wrote: Paolo Bonzini <[EMAIL PROTECTED]> writes: I think one reason is that allowing zero_extracts of multi-word modes is (like this subreg thing) a little hard to pin down. What happens when WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN Unless I had my grep wrong, the only such machines to do this are PDP11 and ARM with special flags (-mbig-endian -mwords-little-endian) that were "for backward compatibility with older versions of GCC" in 1999 [1]. So, is this special case worth keeping? Good question. Unless I'm missing something, PDP11 isn't yet on the deprecated list. Is that right? If so, I suppose we can't remove it before 4.5 at the earliest. It was in the 4.3 list, then Paul Koning stepped up to do some work on it but then nothing happened. [context for Paul: PDP-11 is the last target for which BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN] http://gcc.gnu.org/ml/gcc/2008-01/msg00339.html Paolo
Re: [trunk] Addition to subreg section of rtl.text.
(Yes, the documentation suggests byte_mode for MEMs, but the SH port uses zero_extracts of SImode MEMs as well, so presumably we're supposed to support other modes besides the documented ones.) I think it is just that no one cares about a MEM's mode in this case. Paolo
Re: [trunk] Addition to subreg section of rtl.text.
Bernd Schmidt wrote: Joern Rennecke wrote: And @code{(subreg:SI (reg:DF 10) 0)} would be a natural way to express that you are using the floating point register as a 32-bit integer register, with writes clobbering the entire 64 bits of the register. Yes, this is one possible definition. But there's no reason in this situation why you couldn't just use a single REG. Why use subregs at all? Because before reload, you use pseudos. And in order for (subreg:SI (reg:DF ...) ...) to be viable, it still has to be viable between hard register allocation and alter_reg. Is that even valid? Are there any known ports using this? AFAIR the middle-end doesn't create this (although it will use (subreg:SF (reg:DI))). SPE has patterns like

  [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "+r,m")
        (subreg:SI (match_operand:SPE64TF 1 "register_operand" "r,r") 4))]

for example. Paolo
Re: Different *CFLAGS in gcc/Makefile.in
## the C flags (without any gcc -I... stuff) to be included in
## compilation of MELT generated C code thru the melt-cc-script
## do not put $(INTERNAL_CFLAGS) $(COVERAGE_FLAGS) $(WARN_CFLAGS) there!
MELT_CFLAGS = $(X_CFLAGS) $(T_CFLAGS) $(CFLAGS) $(XCFLAGS)

But I'm not sure of the T_CFLAGS (it probably is related to target specific stuff only). T_CFLAGS is flags that the *target* decides to add. It's actually used only by ia64/t-hpux and gcc.c, and getting rid of it in some way would be a good thing. I think you're okay with your choice. Paolo
Re: [trunk] Addition to subreg section of rtl.text.
The second is to say explicitly that subregs of subregs are not legal. Yes, you should always use (directly or indirectly) simplify_gen_subreg. Paolo
Re: Official GCC git repository
I was only suggesting it as a nicety, if people are happy with the login name alone. What about "Real Name <[EMAIL PROTECTED]>"? The overseers have the mapping, or you can sort of guess it from the names in the ChangeLog. This has to be decided before the first push, so deciding it is kind of urgent. Paolo
Re: GSOC Student application
There are issues of Garbage Collection from libgcc or Boehm's GC Two mistakes in one line. Congratulations, J.C., for confusing a prospective GSoC contributor. So far your messages were merely useless, decreasing the signal-to-noise ratio. Now you've escalated to actively damaging activity. Paolo
Re: GSOC Student application
Joe> It's best to ignore J.C. Pizarro. He's an attention-seeking troll, Joe> who has just enough technical knowledge to derail conversation. I think that if we've reached the point where an SC member feels the need to post disclaimers about someone's posts, then that someone ought to simply be banned. I know this is extreme, and as far as I know we've never done it before. But, in my opinion, we've been more than tolerant here. There's no benefit that I can see to putting up with this kind of bad behavior. The downside of banning J.C. is that if he replies-to-all, no one else would be alerted of his message -- and whoever he replies to (Alexey in this case) may have no clue that he should not pay attention to the message. Paolo
Re: Bootstrap failure due to a typo in gcc/fwprop.c
This is due to revision 133828 and fixed by the following patch:

--- ../_gcc_clean/gcc/fwprop.c	2008-04-02 12:12:57.0 +0200
+++ gcc/fwprop.c	2008-04-02 13:44:07.0 +0200
@@ -231,7 +231,7 @@
      PR_HANDLE_MEM is set when the source of the propagation was not
      another MEM.  Then, it is safe not to treat non-read-only MEMs as
      ``opaque'' objects.  */
-  PR_HANDLE_MEM = 2,
+  PR_HANDLE_MEM = 2
 };

Committed as 133833. Paolo
Re: Bootstrap comparison failures on i586
Eric Botcazou wrote: Hi, Since yesterday I'm having seemingly random bootstrap comparisons failures on i586-suse-linux: for caller-save.o yesterday, for build/gensupport.o today at revision 133861. But a second tree at the same revision bootstrapped fine. Is anyone else seeing this? Have you tried running valgrind? Paolo
Re: RFC Test suite fix testing of no_trampolines
Andy H wrote: There are several tests in the testsuite that use trampolines and that are still run with the dejagnu switch set to no_trampolines. It's on my TODO list for the AVR target, but a recent email reminded me that it affects testing of other targets that can't or won't support trampolines. There's an old patch by Björn Haase, approved but not committed in 2005, that addressed many of these: http://gcc.gnu.org/ml/gcc-patches/2005-05/msg01919.html The patch was even approved... Paolo
Re: US-CERT Vulnerability Note VU#162289
Rainer Emrich wrote: http://www.kb.cert.org/vuls/id/162289 Any comments? See http://www.airs.com/blog/archives/120 for a good blog post by Ian Lance Taylor about this issue. Since GCC 4.2, -Wstrict-overflow=5 can be used to find cases where optimizations rely on overflow behavior that the standard leaves undefined. Also, -ftrapv is a little broken and may have false negatives. On the other hand, -fwrapv should not cause any problems. If you find that -fwrapv hinders performance of your application, you can also try "-fwrapv -funsafe-loop-optimizations -Wunsafe-loop-optimizations". This will restrict overflow assumptions to those needed to optimize loops, and also warn whenever the compiler makes this kind of assumption. You can then audit any warnings you get to see if they have security implications for your application. Paolo
Re: Copyright assignment wiki page
Then I suggest changing our contribute page from "contact us (either via the gcc@gcc.gnu.org list or the GCC maintainer that is taking care of your contributions) to obtain the relevant forms" to "contact us (either via the gcc@gcc.gnu.org list or a GCC Steering Committee member) to obtain the relevant forms" to reflect this. It's not so hard actually. Any person who has a GNU account can get them. I just checked and, among people who are not SC members and are usually on IRC, I counted 6-7 people. Just ask them and they will forward you the administrivia form. Paolo
Re: US-CERT Vulnerability Note VU#162289
(as an aside, as most target implementations treat pointers as unsigned values, its not clear that presuming signed integer overflow semantics are a reasonable choice for pointer comparison optimization) The point is not of presuming signed integer overflow semantics (I was corrected on this by Ian Taylor). It is of presuming that pointers never move before the beginning of their object. If you have an array of 20 elements, pointers &a[0] to &a[20] are valid (accessing &a[20] is not valid), but the compiler can assume that the program does not refer to &a[-2]. Paolo
Re: US-CERT Vulnerability Note VU#162289
A theoretical argument for why somebody might write problematic code is http://www.fefe.de/openldap-mail.txt . But that's like putting the cart before the horse (and complaining that it does not work). You find a security problem, you find a solution, you find the compiler optimizes it away, you blame the compiler. You don't look for an alternative, which would be the most sensible thing: compare the length with the size, without unnecessary pointer arithmetic. Since the length is unsigned, it's enough to do this: if (len > (size_t) (max - ptr)) /* overflow */ ; Paolo
Re: IRA for GCC 4.4
(The testcase is 400k lines of preprocessed Fortran code, 16M in size, available here: http://www.pci.unizh.ch/vandevondele/tmp/all_cp2k_gfortran.f90.gz) Thanks, I'll check it. Vlad, I think you should also try to understand what trunk does with global (and without local allocation) at -O0. That will give a measure of the benefit from Peter's patches for conflict graph building. Another thing to evaluate is the impact of changing gimplify.c so that it always follows the "if (optimize)" paths. The differences are there exactly because we don't run global register allocation at -O0, and they create more pseudos. Paolo
Re: Security vulernarability or security feature?
I think Java handles it OK for floats, i.e. tests for positive infinity, negative infinity, etc. I don't think Java handles it for integer maths. Java integer math is mandated to have wrap-around semantics, so you can do something like

  if ((b^c) > 0 && (a^c) < 0 && (a^b) < 0)
    overflow

Paolo
Re: Weird result for modulus operation
Ang Way Chuang wrote: Ang Way Chuang wrote: Andrew Pinski wrote: On Tue, Apr 29, 2008 at 9:08 PM, Ang Way Chuang <[EMAIL PROTECTED]> wrote: Thanks for the speedy reply. But why does this code: int a = 17, b = 16; a = a++ % 16; result in a = 2 then? I think I need to know what a sequence point is. I'll google that. As I mentioned, the code is undefined so it could be any value. Is there any flag in gcc that can warn about code that relies on undefined behaviours? Found it. -Wsequence-point, which is enabled by -Wall. But gcc didn't fart out any warning with -Wall or the -Wsequence-point flag :( You found a bug; it does point out the problem with the second example here. Paolo
Re: Weird result for modulus operation
Thanks for the speedy reply. But why this code: int a = 17, b = 16; a = a++ % 16; Huh? Now you got me confused. Since it is undefined behaviour, gcc is free to do whatever it likes. Sure, but if you ask gcc to signal a warning, it is supposed to do so. :-) It is a bug that gcc with -Wsequence-point signals a warning for "a = a++ % 16" but not when you use abc.a. Though the answers given by the first and second examples show inconsistency in gcc's handling of the undefined behaviour. That's not a problem. GCC does not have to be consistent. But both should be warned about. I can't forward to the gmane.comp.gcc.devel newsgroup with my account. No problem, you can delete it. Paolo
Re: Division using FMAC, reciprocal estimates and Newton-Raphson - eg ia64, rs6000, SSE, ARM MaverickCrunch?
I'd like to implement something similar for MaverickCrunch, using the integer 32-bit MAC functions, but there is no reciprocal estimate function on the MaverickCrunch. I guess a lookup table could be implemented, but how many entries will need to be generated, and how accurate will it have to be to stay IEEE754 compliant (in the swdiv routine)? I think sh does something like that. It is quite a mess, as it has half a dozen ways to implement division. The idea is to use integer arithmetic to compute the right exponent, and the lookup table to estimate the mantissa. I used something like this for square root:

1) shift the entire FP number by 1 to the right (logical right shift)
2) sum 0x20000000 so that the exponent is still offset by 64
3) extract the 8 bits from 14 to 22 and look them up in a 256-entry, 32-bit table
4) sum the value (as a 32-bit integer!) with the content of the table
5) perform 2 Newton-Raphson iterations as necessary

Example, 3.9921875:

  byte representation = 0x407F8000
  shift right         = 0x203FC000
  sum                 = 0x403FC000
  extract bits        = 255
  lookup table value  = -4194312 = -0x400008
  adjusted value      = 16r3FFFBFF8, which is the square root

The table is simply making sure that if the rightmost 14 bits of the mantissa are zero, the return value is right. By summing the content of the lookup table, you can of course interpolate between the values. With a 12-bit table (i.e. 16 kilobytes instead of just one) you will only need 1 iteration. The algorithm will have to be adjusted for reciprocal (subtracting the FP number from 16r7F00 or better 16r7EFF should do the trick for the first two steps; and since you don't shift right by one you'll use bits 15-23). Here is a sample program to generate the table. It's written in Smalltalk (sorry :-P), it should not be hard to understand (but remember that indices are 1-based). To double check, the first entries of the table are 1 -32512 -64519 -96026.

| a int adj table |
table := ##(| table a val estim |
    table := Array new: 256.
    0 to: 255 do: [ :i |
        a := ByteArray new: 4.
        "Create number"
        a intAt: 1 put: (i bitShift: 15).
        a at: 1 put: 64.
        val := (a floatAt: 1) reciprocal.

        "Perform estimation"
        a intAt: 1 put: (16r7EFF - (a intAt: 1)).
        estim := a intAt: 1.

        "Compute delta with actual value and store it"
        a floatAt: 1 put: val.
        table at: i + 1 put: ((a intAt: 1) - estim) ].
    table).

"Here we do the actual calculation.  `self' is the number to be reciprocated."
a := ByteArray new: 4.
a floatAt: 1 put: self.

"Perform estimation as above"
int := 16r7EFF - (a intAt: 1).

"Extract bits 15-23 and access the table."
adj := table at: ((a intAt: 1) // 32768 \\ 256) + 1.

"Sum the delta and convert from 32-bit integer to float"
a intAt: 1 put: (int + adj).
^(a floatAt: 1)

Also, where should I be sticking such an instruction / table? Should I put it in the kernel, and trap an invalid instruction? Alternatively, should I put it in libgcc Yes, you could do this. Paolo
Re: [RFC] Adjust output for strings in tree-pretty-print.c
Notice the added final '\0' in the C case; I don't know if it's bad to have it there, but I don't see a way to not output it and still have the correct output for Fortran (whose strings are not NUL-terminated). I think the best thing to do is to have a langhook then. I'm actually not sure that you want all those \0's in the Fortran front-end since the kind can be recovered from the {lb:1 sz:4} that is appended to the string. Endianness issues may also appear. Maybe you should call iconv in the langhook to get back to UTF-8, and print that representation instead. Paolo
Re: [RFC] Adjust output for strings in tree-pretty-print.c
FX wrote: I think the best thing to do is to have a langhook then. It seems a bit weird to have a langhook for a one-character difference, but if there's a consensus on it, I'll go along. To me too, but I still maintain that it's better to print in UTF-8 (which would make the langhook more useful). The recent Unicode patches for C could possibly use the langhook too. Endianness issues may also appear. Maybe you should call iconv in the langhook to get back to UTF-8, and print that representation instead. Endianness is already taken care of, in the sense that the string is encoded in the target's endianness already. But for testing you want a standardized endianness. Otherwise some targets will need to scan "I\0\0\0" and others will need to scan "\0\0\0I". However, that makes calling iconv more difficult, because that has us going from the target's endianness to UTF-8, which will be a pain. No, you can use the UTF-32BE and UTF-32LE encodings. Paolo
Re: Question about building hash values from pointers
it's uintptr_t which should be used, if only as an intermediate cast: (unsigned long)(uintptr_t)ptr. That's not possible because, IIRC, gcc must compile on C90 systems. Right, so the only type remaining is size_t. IIRC there is a problem with this type on some targets, too. AFAIK there are 24-bit pointers ... This is the reason why I was asking to introduce a new general type for such stuff in gcc. size_t is ok I think, but just in case, there is an autoconf macro (used in libgfortran and libdecnumber) that provides int*_t. Paolo
Re: 4.3.0 and 4.3.1 don't build startfiles (crtXXX.o files)
Then, running "make all-target-libgcc" built them, but I finally settled for just "make" - it didn't error out. Yes, the advantages of Paul's suggested process are not only that the installations are reproducible and always use the complete feature set of the underlying libc (that's the big part), but also that "make" just works and you are more shielded from changes in the build system. Paolo
Re: 4.3.0 and 4.3.1 don't build startfiles (crtXXX.o files)
1) Binutils
2) Whatever bits of compiler are required to produce...
3) libc headers
4) A basic C compiler+libgcc that is sufficient to build...
5) libc
6) A full compiler+runtime, c++, fortran, etc.

If someone is willing to expand on the above and explain what exactly I need to do in step 2, in step 3, in step 4, that would be helpful. You already made step 2. Step 3 depends on your C library; I don't know the details for uclibc. Step 4 is the same as step 2, but you will need fewer --disable-* options. Paolo
Re: Inefficient loop unrolling.
Bingfeng Mei wrote: Steven, I just created a bug report. You should receive a CCed mail now. I can see these issues are solvable at the RTL level, but that requires a lot of effort. The main optimization in the loop unrolling pass, split-iv, can reduce the dependence chain but not the extra ADDs and the alias issue. What is the main reason that loop unrolling should belong at the RTL level? Is it fundamental? No, it is just the effectiveness of the code size expansion heuristics. Ivopts is already complex enough on the tree level; doing it on RTL would be insane. But other low-level loop optimizations had already been written at the RTL level and, since there were no compelling reasons to move them, they were left there. That said, this is a bug -- fwprop should have folded the ADDs, at the very least. I'll look at the PR. Paolo
Re: Is this the expected behavior?
Mohamed Shafi wrote: 2008/7/15 Ramana Radhakrishnan <[EMAIL PROTECTED]>: I agree with you, but what about when there are still caller-save registers available and there are no register restrictions for any instructions? In my case I find that GCC has used only the argument registers, stack pointer and callee-saved registers. So out of the 16 available registers only 5+1+4 registers were used, even though 6 caller-save registers were available. Check your REG_ALLOC_ORDER macro? The order is argument registers, caller-save registers and finally the callee-save registers. Are there instructions that only work on the callee-save registers? This might confuse regclass (the pass that decides the register class preferences). Paolo
Re: [Ada] multilib first patch
To build and be later able to install more than one version of the ada library we need to change (at least) this assumption in some way and keep more than one library build result around. The best way would be to bite the bullet, and move the RTS for real to libada (instead of having libada as a proxy). The fact that the compiler needs it is not a problem, you just need to make gnat depend on a host libada. However, this is not a small change. Paolo
Re: [Ada] multilib patch take two => multilib build working
I had to solve one rts source issue though: gcc/ada/system-linux-x86_64.ads and x86.ads hardcode the number of bits in a word (64 and 32 respectively). I changed them both to be completely the same and use the GNAT-defined Standard'Word_Size attribute:

- Word_Size   : constant := 64;
- Memory_Size : constant := 2 ** 64;
+ Word_Size   : constant := Standard'Word_Size;
+ Memory_Size : constant := 2 ** Word_Size;

The same change will have to be done on other 32/64-bit Ada targets. I don't know if this change has adverse effects on the GNAT build in some situations. I think this is worthwhile on its own, before the build patch goes in. The patch is not complete yet of course, but I'd appreciate feedback on whether or not I'm on the right track for 4.4 inclusion. It looks good to me, though I'll defer to Arnaud and other AdaCore people. One nit:

+GNATLIBMULTI := $(subst /,,$(MULTISUBDIR))

Please substitute / with _ instead, to avoid unlikely but possible clashes. [EMAIL PROTECTED]:~/tmp$ gnatmake -f -g -aO/home/guerby/build-mlib7/gcc/ada/rts32 -m32 p I guess fixing this requires duplicating in gnatmake and gnatbind the logic in gcc.c that uses the info produced by genmultilib. Search gcc.c for multilib_raw, multilib_matches_raw, multilib_extra, multilib_exclusions_raw, multilib_options, set_multilib_dir. Maybe it makes sense to make a separate .c module for this, so that both the driver and gnat{make,bind} can use it. I'm not sure how much churn there is in gcc/ada/Makefile.in. If there is little, it probably makes more sense to work on a branch. If there is much, it probably makes more sense to commit the partially working patch. Again, I'll defer to AdaCore people on this. Paolo
Re: [Ada] multilib patch take two => multilib build working
Arnaud Charlet wrote: I had to solve one rts source issue though: gcc/ada/system-linux-x86_64.ads and x86.ads hardcode the number of bits in a word (64 and 32 respectively). I changed them both to be completely the same and use the GNAT-defined Standard'Word_Size attribute:

- Word_Size   : constant := 64;
- Memory_Size : constant := 2 ** 64;
+ Word_Size   : constant := Standard'Word_Size;
+ Memory_Size : constant := 2 ** Word_Size;

The same change will have to be done on other 32/64-bit Ada targets. I don't know if this change has adverse effects on the GNAT build in some situations. I think this is worthwhile on its own, before the build patch goes in. Not clear to me actually. The idea currently is to make these values explicit so that when people read system.ads, they know right away what the right value is. Also, this points to a real flaw in the approach, since e.g. in the case of little/big endian multilibs, a similar issue would arise. Yes, if different multilibs should use different sets of sources, we have a problem. On the other hand, this is also something that can be solved by moving the RTS to libada. The standard approach with C multilibs is to rely on configure tests, or on #define constants, to pick the appropriate choice between multilibs, and I don't see why this shouldn't work with Ada. For example, g-soccon* files are generated automatically -- then, why not go one step further and generate a g-soccon file at configure time? It seems that a branch would be more appropriate for this kind of work, since there's a long way to go before getting in reasonable shape. Given the above, I agree. Also, it's not clear how using $(RTS) for building gnattools will work properly. Only target modules are multilibbed, so only one copy of gnattools is built.
I assume that the characteristics of the target do not affect the operation of gnattools -- different multilibs may use different filesystem paths and thus behave differently, but the *code* of gnattools should be the same in all cases. [EMAIL PROTECTED]:~/tmp$ gnatmake -f -g -aO/home/guerby/build-mlib7/gcc/ada/rts32 -m32 p There's an existing mechanism in gnatmake which is the use of the --RTS switch, so ideally it would be great to match multilib install into --RTS=xxx compatible dirs, and also have -xxx (e.g. -32 or -64) imply --RTS=xxx. Yes, you would basically take from gcc.c the code that turns "-m32" into "use multilib 32", and use it to make a --RTS option. Paolo
Re: [Ada] multilib patch take two => multilib build working
This will need some additionals tests on MULTILIB in the LIBGNAT_TARGET_PAIRS machinery (3 files for x86 vs x86_64, solaris looks like already done, powerpc seem 32 bits only right now, s390/s390x, others?) but it doesn't seem like a blocking issue with the proposed design since each MULTILIB rts build has completely separate directory and stamp (through RTSDIR) so there is no possibility of conflict through sharing. Do you agree with this assessment? Unfortunately not. The solaris bits are just "factoring" a bit the definition of LIBGNAT_TARGET_PAIRS. The actual multilib definition can be arbitrary, for example you can have big-endian/little-endian multilibs. If the design is explicitly to have constants spelled out in system-* files (instead of having them, for example, in Autoconf macros), there is not much that you can do. Paolo
Re: [Ada] multilib patch take two => multilib build working
Paolo, do you know where to look for the list of multilibs on a given platform in the GCC sources? And if we want to disable some of them for Ada? In the makefile fragments t-*, in places like config/i386/t-linux64:

MULTILIB_OPTIONS = m64/m32
MULTILIB_DIRNAMES = 64 32
MULTILIB_OSDIRNAMES = ../lib64 $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)

Problematic (from an Ada point of view) configurations are those like config/arm/t-xscale-elf:

MULTILIB_OPTIONS = mbig-endian
MULTILIB_DIRNAMES = be
MULTILIB_EXCEPTIONS =
MULTILIB_MATCHES = mbig-endian=mbe mlittle-endian=mle

and 32/64-bit configurations like x86_64-pc-linux-gnu. Paolo
Re: [Ada] multilib patch take two => multilib build working
There is a Standard'Default_Bit_Order, so it's the same as Word_Size: we just lose "source" documentation (and gain smaller diffs between target files). Yes, but Arnaud said that system-* constants are written down for a reason. I don't understand *what* the reason is, but that's just because I have no clue (among other things) about which parts of the Ada RTS are installed. For example, are these system-*.ads files installed? If so, is it possible to install different versions for each multilib? If not (question to Arnaud), is this self-documentation meant only for the people reading the source code of GNAT? IOW, is it meant as a quick-and-dirty reference as to what the characteristics of the target are? My impression is that 50% of the constants in system-*.ads files are boilerplate (same for every file), 30% are easily derivable from configure.ac tests or from other Standard'xyz constants, and only 20% are actually something that depends on how GNAT works on the target. If it were up to me, I would make the system.ads file automatically generated (except for the latter 20%, which would have to be specified in some way). That would simplify multilibbing, but this is moot unless there is a guarantee that this 20% is *totally* derivable from the target triplet, so that no conceivable flag combination can affect it. If this guarantee is not there, any attempt to multilib the Ada RTS is going to be a sore failure. Paolo
Re: [Ada] multilib patch take two => multilib build working
I think you will end up having to support generating different source trees for each multilib variant to be safe and correct. Yes, that comes out naturally if the RTS is built in libada. In fact, Arnaud said: The idea currently is to make these values explicit so that when people read system.ads, they know right away what the right value is. That's "when people read system.ads", not "when people read system-linux-x86.ads". In other words, he's not necessarily against automatically generating system.ads from other means, for example using configure tests. Which, I repeat, comes out naturally if the RTS build is confined in libada. This will work for native builds but may have problems on cross builds where you can't run a program. I know for the RTEMS g-soccon* file we have to run a program on target hardware and capture the output. Do you really need to run programs? Most of gen-soccon can be done by running Ada source code through the C pre-processor and massaging the output. In fact, the code that would be passed through cpp strongly resembles gen-soccon.c itself. If you move the source to libada and start potentially using different source combinations for different multilib variants, then it does need to be on a branch. But some of the patches so far seem like they would be OK to commit on the mainline and minimize diffs. Yes, that's true. Paolo
Re: [Ada] multilib patch take two => multilib build working
Arnaud Charlet wrote: The idea currently is to make these values explicit so that when people read system.ads, they know right away what the right value is. That's "when people read system.ads", not "when people read system-linux-x86.ads". In other words, he's not necessarily against automatically generating system.ads by other means, for example using configure tests. Which, I repeat, comes out naturally if the RTS build is confined to libada. Right, that's one possibility, although people seem to be focusing on system.ads a lot, which is actually only the tip of the iceberg and not a real issue per se. Still, it's the biggest problem so far. For example, g-bytswa-x86.adb vs. g-bytswa.adb is also a problem, because a -mcpu=i386 multilib should use the latter; however, that's arguably already a GNAT bug for the i386-pc-linux-gnu configuration, so it can be left for later. Good to hear about soccon, though. Paolo
Re: [Ada] multilib patch / -imultilib and language drivers
What triggers the passing of -imultilib to a language driver? It is passed together with -isystem, -iprefix and friends when %I is in a spec. I'm sure you can define a new spec function and use it to pass the multilib_dir variable down to the Ada driver (see default_compilers, I guess you have to read some gcc.c). Once gcc passes this info to gnat1 it will likely be easy to have gnatmake/bind/link extract it when needed since those tools call gcc. I believe you on this. :-) Paolo
Re: [Ada] multilib patch take three
Arnaud Charlet wrote: Yes, I volunteer. We're in stage1, so we have some time to sort out reported issues before release. OK. I'm still concerned that there is no simple fallback for all targets that will break, except for --disable-multilib, which is too strong since it impacts other languages. I'd be much more comfortable with e.g. adding a --disable-multilibada or some such that would basically fall back to the previous state. I volunteer to check if there is support for --enable-multilib=libstdc++-v3,libjava and if not add it. Unfortunately, --disable-multilib=ada cannot work (because --disable-xxx is the same as --enable-multilib=no). As an alternative, people that don't want a multilibbed libada can disable libada altogether. More on this in a second. Only libada/Makefile.in install will be invoked for all multilibs by the install machinery, so gcc/ada/Makefile.in install cannot properly do the rts install work anymore, hence the change. The relevant libada/Makefile.in parts are now: What about people using --disable-multilib or --disable-libada? I'd rather keep the part in ada/Makefile.in for these cases. Note that Laurent commented out install-gnatlib in ada/Make-lang.in. I agree though that it doesn't hurt to keep those targets, and I think that this hunk should not be included. Well, I still do not understand how install-gnatlib works in the new scheme; I guess I need a more detailed explanation, since I am not familiar with the multilib scheme. Could you explain in detail how make install will install the various gnatlibs in the new scheme? I guess everything will be clearer after the above: gcc/ada/Makefile.in is not changed, and both gcc/ada/Make-lang.in (once Laurent undoes that hunk) and libada/Makefile.in invoke it. The difference introduced by multilibbing is just a few lines like these:

+	$(MULTIDO) DO=all multi-do # $(MAKE)
+	$(MULTIDO) DO=install multi-do # $(MAKE)

When multilibbing is disabled, MULTIDO=: and these lines are no-ops.
When it is enabled, MULTIDO=$(MAKE) and they cause one recursive make invocation for each multilib (for example libada/32/Makefile). The multilibs differ in the default values of CFLAGS, and they know about this difference via another variable, MULTISUBDIR (which is for example /32 for libada/32/Makefile). This is what Laurent uses to conditionalize the PAIRS in gcc/ada/Makefile.in. Review of the patch follows:

-	-if [ -f gnat1$(exeext) ] ; \
-	then \
-	  $(MAKE) $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib; \
-	fi
+#	-if [ -f gnat1$(exeext) ] ; \
+#	then \
+#	  $(MAKE) $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib; \
+#	fi

-install-gnatlib:
-	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib$(LIBGNAT_TARGET)
+#install-gnatlib:
+#	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib$(LIBGNAT_TARGET)
+#
+#install-gnatlib-obj:
+#	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib-obj

-install-gnatlib-obj:
-	$(MAKE) -C ada $(FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib-obj
-
 ada.install-man:

Revert as asked by Arnaud.

+ifeq ($(strip $(filter-out %x86_64 linux%,$(arch) $(osys))),)
+  ifeq ($(strip $(MULTISUBDIR)),/32)
+    arch:=i686
+  endif
+endif

Just $(filter-out %x86_64, $(arch)). No need to check for linux too; the /32 multilib name is pretty common. The same should be enough for both powerpc64 and sparc64.

Index: libada/configure

Don't include regenerated files in the patch, only in the ChangeLog. :-)

+# GNU Make needs to see an explicit $(MAKE) variable in the command it
+# runs to enable its job server during parallel builds.  Hence the
+# comments below.
+
+all-multi:
+	$(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
+install-multi:
+	$(MULTIDO) $(AM_MAKEFLAGS) DO=install multi-do # $(MAKE)
+
+.PHONY: all-multi install-multi
+
+mostlyclean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=mostlyclean multi-clean # $(MAKE)
+clean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=clean multi-clean # $(MAKE)
+distclean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=distclean multi-clean # $(MAKE)
+maintainer-clean-multi:
+	$(MULTICLEAN) $(AM_MAKEFLAGS) DO=maintainer-clean multi-clean # $(MAKE)
+
+.PHONY: mostlyclean-multi clean-multi distclean-multi maintainer-clean-multi
+
+install-exec-am: install-multi

## No uninstall rule?

## These cleaning rules are recursive.  They should not be
## registered as dependencies of *-am rules.  For instance,
## otherwise running `make clean' would cause both
## clean-multi and mostlyclean-multi to be run, while only
## clean-multi is really expected (since clean-multi recursively
## calls clean, it already does the job of mostlyclean).
+mostlyclean: mostlyclean-multi
+clean: clean-multi
+distclean: distclean-multi
+maintainer-clean: maintainer-clean-multi

Not needed.
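The recursion Arnaud asks about can be sketched in a hypothetical Makefile. This is not the real config-ml.in/multi.m4 machinery; the directory name and -m32 flag are assumptions purely for illustration:

```make
# Sketch only: with multilibs enabled, configure sets MULTIDO=$(MAKE);
# with --disable-multilib it sets MULTIDO=: so the recursion is a no-op.
MULTIDO   = $(MAKE)
MULTIDIRS = 32                  # e.g. the -m32 multilib built under libada/32

all:
	$(MULTIDO) DO=all multi-do # $(MAKE)
install:
	$(MULTIDO) DO=install multi-do # $(MAKE)

# multi-do re-invokes make once per multilib subdirectory; each child
# sees its own MULTISUBDIR (/32 here) and multilib-specific CFLAGS.
multi-do:
	for dir in $(MULTIDIRS); do \
	  $(MAKE) -C $$dir $(DO) MULTISUBDIR=/$$dir CFLAGS="$(CFLAGS) -m32"; \
	done
```

The trailing `# $(MAKE)` comments are what the quoted fragments refer to: GNU Make only enables the jobserver for a command that visibly mentions $(MAKE).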
Re: [Ada] multilib patch take three
I volunteer to check if there is support for --enable-multilib=libstdc++-v3,libjava and if not add it. Unfortunately, --disable-multilib=ada cannot work (because --disable-xxx is the same as --enable-multilib=no). Does that mean that libgcc is implicitly multilibbed if --enable-multilib= is used ? No, I just meant "adding a parameter to --enable-multilib to specify what to multilib". As an alternative, people that don't want a multilibbed libada can simply not use libada at all. More on this in a second. Still not clear to me what you mean here. I was thinking about using --disable-libada and instead using the "make -C gcc gnatlib" target. You will get no multilibs, but I'm not up-to-date as to how you build the tools without libada nowadays. Paolo
Re: [Ada] multilib patch take three
Laurent GUERBY wrote: On Fri, 2008-07-25 at 10:55 +, Joseph S. Myers wrote: i686-linux-gnu --enable-targets=all and x86_64-linux-gnu are equivalent, differing only in whether the default is 32-bit or 64-bit. Do you select the right files for both multilibs of i686-linux-gnu, as well as for both multilibs of x86_64-linux-gnu? (Some targets have the 32-bit-default as the only or the normal target, e.g. Solaris and Darwin.) I didn't know about --enable-targets=all so I'd say this case is not handled by the current pre-patch on i686 but I will work on adding support for it. I think you just need to check for a /64 multilib and change the arch accordingly. Paolo
Re: lto gimple types and debug info
Mark Mitchell wrote: For that matter, "print sizeof(X)" should print the same value when debugging optimized code as when debugging unoptimized code, even if the compiler has optimized X away to an empty structure! I disagree. sizeof(X) in the code will return a value as small as possible in that case (so that malloc-ing an array of structures does not waste memory), and the debugger should do the same. Paolo
Re: lto gimple types and debug info
For that matter, "print sizeof(X)" should print the same value when debugging optimized code as when debugging unoptimized code, even if the compiler has optimized X away to an empty structure! I disagree. sizeof(X) in the code will return a value as small as possible in that case (so that malloc-ing an array of structures does not waste memory), and the debugger should do the same. I don't think that's a viable option. The value of sizeof(X) is a compile-time constant, specified by a combination of ISO C and platform ABI rules. In C++, sizeof(X) can even be used as a (constant) template parameter, way before we get to any optimization. Then you are right. This adds another constraint... Paolo
Support for OpenVMS hosts
I stumbled on this while looking at how x-* host files are used. There are two files in this configuration that "must be compiled with DEC C". One is vcrt0.o, which has about 5 lines of executable code. This makes me think that it would be best if someone with access to the OS would compile it, so that we can ship assembly-language source code for it. The other is pcrt0.o, which AFAICT has had a syntax error since its inception:

48001 kenner  __main (arg1, arg2, arg3, image_file_desc, arg5, arg6)
48001 kenner      void *arg1, *arg2, *arg3;
48001 kenner      void *image_file_desc;
48001 kenner      void *arg5, *arg6)

So, the question is: do we care about this target? Maybe AdaCore has patches to fix it? Paolo
Re: GCC 4.3.2 Status Report (2008-07-31)
Priority      #     Change from Last Report
--------    -----   -----------------------
P1              3        -5
P2            115        -2
P3              2        -1
--------    -----   -----------------------
Total         120        -8

PR35752, which is a P2 regression caused by libtool, is waiting for approval upstream. Should we make an exception to the usual rules and apply the fix on the branch? Paolo
Re: GCC 4.3.2 Status Report (2008-07-31)
As we are approaching the intended release date of 4.3.2 we need to address the P1 bugs or downgrade them accordingly. Two of the P1s have patches posted (more than 3 and 2 weeks ago, respectively), so they just need reviewing. For the record, these are:

http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00722.html reload.c (CCing Ulrich Weigand)
http://gcc.gnu.org/ml/gcc-patches/2008-06/msg01305.html dwarf2out.c (CCing Jason Merrill)

Paolo
Re: GCC 4.3.2 Status Report (2008-07-31)
Ralf Wildenhues wrote: Hi Paolo, * Paolo Bonzini wrote on Thu, Jul 31, 2008 at 02:53:21PM CEST: PR35752, which is a P2 regression caused by libtool, is waiting for approval upstream. Should we make an exception to the usual rules and apply the fix on the branch? If by exception to the usual rule, you mean that you would like to apply the patch to GCC before it's accepted in Libtool And only on the branch. Tomorrow it's a public holiday here, so I wouldn't apply the patch before monday anyway. Paolo
Re: configuring in-tree gmp/mpfr with "none"?
Jay wrote: Andrew, Can you explain more why? Because at some point, no released version worked on Intel Macs. And then gmp/configure runs flex. And then sometimes (always?) flex tries to run getenv("M4") || "m4". Yes, Flex uses m4. gmp/configure probably should not be setting M4 Yes, I think that setting M4=m4-not-needed should be done only for debugging purposes. Otherwise, GMP should always look for m4 in its configure script, and set it to a valid value in the makefile. Paolo
Re: GNAT build failure on cross
Arnaud Charlet wrote: Any suggestions? I would double check that you are indeed using the freshly built corresponding native compiler. Maybe your native installation didn't work as expected and you're building from an older compiler. That's the most likely explanation. Alternatively, there have been changes recently in the libada and ada Makefiles by Paolo Bonzini that might be related. I agree with Arnaud that the most likely explanation is not-recent-enough native tools. But you can try reverting to r138299 to see if my patches are at fault. Paolo
Re: configuring in-tree gmp/mpfr with "none"?
Jay wrote: Because at some point, no released version worked on Intel Macs. Long since passed and can be removed? I don't think so; http://gmp.darwinports.com/ shows that it is still a problem with 4.2.2. Besides, GMP's authors say that it is often a stress test for compilers, so using more C and less assembly can be a good thing (GCC's usage of GMP does not include manipulating really, really huge numbers). gmp/configure is where the blame really lies, but if gcc configured gmp "normally", this wouldn't occur. Or, is cpu=none not so abnormal? Just that I hadn't seen it? It's a GMP-only thing. Given that this is a problem because of Python's apparently broken handling of signals, we cannot do anything about it. Complain to the Python maintainers that they should reset the signals they ignore before exec-ing another program. Paolo
Re: Update libtool?
That said, updating in trunk is a different matter. There, the question IMHO is mostly which libtool version to update to. The git version may still have a regression or two, but 2.2.4 doesn't have the -fPIC on HP/IA patch from Steve (which would be trivial to backport of course). Alternatively GCC can wait for 2.2.6 (hopefully in the "couple of weeks at most" time frame). Updating to 2.2.6 would be okay for me. Paolo
Re: Richard Sandiford appointed RTL maintainer
On 06/28/2011 03:52 PM, Vladimir Makarov wrote: They are complicated, solving NP-problems in heuristic ways. I totally trust that people like Eric or Richard would _not_ approve changes to those heuristics without contacting you or others. On the other hand, I would totally trust them to approve expander patches, but they were said historically to fall outside the realm of RTL maintainership. :) Paolo
Re: GSOC - Student Roundup
On 07/05/2011 06:58 PM, Dimitrios Apostolou wrote: The level of my understanding of this part is still basic, I've now only scratched the surface of Dataflow Analysis. Well you're not looking at df proper, which is mostly a textbook implementation with some quirks; you're looking at RTL operand scanning, which should indeed have a big "here be dragons" sign on it. But you're doing fine. :) Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/11/2011 07:56 PM, David Malcolm wrote: Hope this is fun/helpful (and that I'm correctly interpreting the data!) You are, and it shows some bugs even. gimple_lcx is obviously destroyed by expand, and I find it unlikely that no pass ever introduces a critical edge... Paolo
Re: C++ bootstrap of GCC - still useful ?
On 07/12/2011 08:54 AM, Arnaud Charlet wrote: > > I'm not sure because I don't think we want to compile the C files of the Ada > > runtime with the C++ compiler. We want to do that only for the compiler. > > Right, we definitely don't want to use the C++ compiler for building the > Ada run-time. But apparently they already are (when building the compiler), otherwise the patch in http://gcc.gnu.org/ml/gcc/2009-06/txt4.txt would make no sense:

Index: gcc/ada/env.c
===
--- gcc/ada/env.c	(revision 148953)
+++ gcc/ada/env.c	(working copy)
@@ -29,6 +29,11 @@
  *                                                                          *
  ****************************************************************************/
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef IN_RTS
 #include "tconfig.h"
 #include "tsystem.h"
@@ -313,3 +318,7 @@
   clearenv ();
 #endif
 }
+
+#ifdef __cplusplus
+}
+#endif

Perhaps it is better to always build those files with cc, perhaps not. Since there are two versions of the Ada RTL, the one in the compiler and the one in libada, my questions are: 1) Do they share any object files when not cross-compiling? 2) If not, is using C++ for the former okay? If the answers are "no" and "yes" respectively, I still think a patch like the one I suggested, where the host files in gcc/ are uniformly compiled with C++, is preferable. You do need to force usage of a C compiler when compiling libada:

Index: Makefile.in
===
--- Makefile.in	(revision 169877)
+++ Makefile.in	(working copy)
@@ -2451,6 +2451,7 @@ gnatlib: ../stamp-gnatlib1-$(RTSDIR) ../
 	$(MAKE) -C $(RTSDIR) \
 	  CC="`echo \"$(GCC_FOR_TARGET)\" \
 	      | sed -e 's,\./xgcc,../../xgcc,' -e 's,-B\./,-B../../,'`" \
+	  ENABLE_BUILD_WITH_CXX=no \
 	  INCLUDES="$(INCLUDES_FOR_SUBDIR) -I./../.." \
 	  CFLAGS="$(GNATLIBCFLAGS_FOR_C)" \
 	  FORCE_DEBUG_ADAFLAGS="$(FORCE_DEBUG_ADAFLAGS)" \
@@ -2459,6 +2460,7 @@ gnatlib: ../stamp-gnatlib1-$(RTSDIR) ../
 	$(MAKE) -C $(RTSDIR) \
 	  CC="`echo \"$(GCC_FOR_TARGET)\" \
 	      | sed -e 's,\./xgcc,../../xgcc,' -e 's,-B\./,-B../../,'`" \
+	  ENABLE_BUILD_WITH_CXX=no \
 	  ADA_INCLUDES="" \
 	  CFLAGS="$(GNATLIBCFLAGS)" \
 	  ADAFLAGS="$(GNATLIBFLAGS)" \

And of course extern "C" needs to be added to the headers, so that public symbols used by compiled Ada source are not mangled. However, static and private symbols need not be extern "C". Paolo
Re: C++ bootstrap of GCC - still useful ?
On 07/12/2011 10:00 AM, Eric Botcazou wrote: But your patch isn't necessary to do that, the C files are already compiled with the C++ compiler as of today; the only issue is at the linking stage. The problem is that the patch links gnattools unconditionally with g++. It should depend on --enable-build-with-cxx instead. Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/12/2011 10:43 AM, Paulo J. Matos wrote: Hope this is fun/helpful (and that I'm correctly interpreting the data!) You are, and it shows some bugs even. gimple_lcx is obviously destroyed by expand, and I find it unlikely that no pass ever introduces a critical edge... But the diagram shows gimple_lcx stopping at expand but continuing its lifetime through RTL passes (so gimple_lcx according to the diagram is _not_ destroyed by expand). So, I am left wondering if the bug is in the diagram or GCC. It shows bugs in GCC's pass description, to be clear. Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/12/2011 06:07 PM, David Malcolm wrote: On this build of GCC (standard Fedora 15 gcc package of 4.6.0), the relevant part of cfgexpand.c looks like this:

struct rtl_opt_pass pass_expand =
{
 {
  RTL_PASS,
  "expand",				/* name */
  [...snip...]
  PROP_ssa | PROP_gimple_leh | PROP_cfg
    | PROP_gimple_lcx,			/* properties_required */
  PROP_rtl,				/* properties_provided */
  PROP_ssa | PROP_trees,		/* properties_destroyed */
  [...snip...]
 }

and gcc/tree-pass.h has:

#define PROP_trees \
  (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)

and that matches up with both the diagram and the entry for "expand" in the table below [1]. So it seems that the diagram is correctly accessing the "properties_destroyed" data for the "expand" pass; does PROP_gimple_lcx need to be added somewhere? (Or should the diagram be taught to special-case some things, perhaps?) Yes, PROP_gimple_lcx needs to be added to PROP_trees. I cannot approve the patch, unfortunately. Also, several passes are likely lacking PROP_no_crit_edges in their properties_destroyed. At least all those that can be followed by TODO_cleanup_cfg:

* pass_split_functions
* pass_call_cdce
* pass_build_cfg
* pass_cleanup_eh
* pass_if_conversion
* pass_ipa_inline
* pass_early_inline
* pass_fixup_cfg
* pass_cse_sincos
* pass_predcom
* pass_lim
* pass_loop_prefetch
* pass_vectorize
* pass_iv_canon
* pass_tree_unswitch
* pass_vrp
* pass_sra_early
* pass_sra
* pass_early_ipa_sra
* pass_ccp
* pass_fold_builtins
* pass_copy_prop
* pass_dce
* pass_dce_loop
* pass_cd_dce
* pass_dominator
* pass_phi_only_cprop
* pass_forwprop
* pass_tree_ifcombine
* pass_scev_cprop
* pass_parallelize_loops
* pass_ch
* pass_cselim
* pass_pre
* pass_fre
* pass_tail_recursion
* pass_tail_calls

Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/13/2011 12:54 PM, Richard Guenther wrote: > Yes, PROP_gimple_lcx needs to be added to PROP_trees. I cannot approve the > patch, unfortunately. Hm, why? complex operations are lowered after a complex lowering pass has executed. they are still lowered on RTL, so I don't see why we need to destroy them technically. Because it's PROP_*gimple*_lcx. :) Paolo
Re: A visualization of GCC's passes, as a subway map
On 07/14/2011 11:11 AM, Richard Guenther wrote: >> Hm, why? complex operations are lowered after a complex lowering pass >> has executed. they are still lowered on RTL, so I don't see why we need >> to destroy them technically. > > Because it's PROP_*gimple*_lcx.:) Shouldn't it then be PROP_*gimple* instead of PROP_*trees*?;) Heh, you have a point! Paolo
Re: [RFC] Remove -freorder-blocks-and-partition
On 07/25/2011 06:42 AM, Xinliang David Li wrote: FYI the performance impact of this option with SPEC06 (built with google_46 compiler and measured on a core2 box). The base line number is FDO, and ref number is FDO + reorder_with_partitioning. xalancbmk improves> 3.5% perlbench improves> 1.5% dealII and bzip2 degrades about 1.4%. Note the partitioning scheme is not tuned at all -- there is not even a tunable parameter to play with. Did you check what is pushed down to the cold section in these cases? Paolo
Re: [RFC] Remove -freorder-blocks-and-partition
On 07/27/2011 06:51 AM, Xinliang David Li wrote: > If we could retain most of the speedups when the optimization works well > but avoid most of the slowdown in the benchmarks that are currently hurt, > we could improve the overall SPEC06 score. And hopefully, this would > also be beneficial to other code. Agree. There are certainly problems in the partition pass, as for bzip2 the icache misses actually go up with partitioning, which is not expected. It needs further analysis. It's probably too aggressive. Icache misses go up because a) the overall size of the executable grows; b) cold parts are probably not cold enough in the case of bzip2. Paolo
Re: RFC: PATCH: Require and use int64 for x86 options
On 07/27/2011 06:42 PM, H.J. Lu wrote:

+  if (max == 64)
+    var_mask_1[var] = "1LL"

This must be ((HOST_WIDE_INT) 1). Paolo
Re: libgcc: strange optimization
On 08/04/2011 01:10 PM, Andrew Haley wrote: >> It's the sort of thing that gets done in threaded interpreters, >> where you really need to keep a few pointers in registers and >> the interpreter itself is a very long function. gcc has always >> done a dreadful job of register allocation in such cases. > > Sure, but what I have seen is people using global register variables > for this (which means they get taken away from the register allocator). Not always though, and the x86 has so few registers that using a global register variable is very problematic. I suppose you could compile the threaded interpreter in a file of its own, but I'm not sure that has quite the same semantics as local register variables. Indeed, local register variables give almost the same benefit as globals with half the burden. The idea is that you don't care about the exact register that holds the contents but, by specifying a callee-save register, GCC will use those instead of memory across calls. This reduces the number of spills _a lot_. The problem is that people who care about this stuff very much don't always read...@gcc.gnu.org so won't be heard. But in their own world (LISP, Forth) nice features like register variables and labels as values have led to gcc being the preferred compiler for this kind of work. /me raises hands. For GNU Smalltalk, using

#if defined(__i386__)
# define __DECL_REG1 __asm("%esi")
# define __DECL_REG2 __asm("%edi")
# define __DECL_REG3 /* no more callee-save regs if PIC is in use! */
#endif
#if defined(__x86_64__)
# define __DECL_REG1 __asm("%r12")
# define __DECL_REG2 __asm("%r13")
# define __DECL_REG3 __asm("%rbx")
#endif
...
register unsigned char *ip __DECL_REG1;
register OOP *sp __DECL_REG2;
register intptr_t arg __DECL_REG3;

improves performance by up to 20% if I remember correctly. I can benchmark it if desired. It does not come for free; in some cases the register allocator does some stupid things due to the hard register declaration. But it gets much better code overall, so who cares about the micro-optimization. Of course, if the register allocator did the right thing, or if I could simply use

unsigned char *ip __attribute__((__do_not_spill_me__(20)));
OOP *sp __attribute__((__do_not_spill_me__(10)));
intptr_t arg __attribute__((__do_not_spill_me__(0)));

that would be just fine. Paolo
Re: libgcc: strange optimization
On 08/08/2011 10:06 AM, Richard Guenther wrote: Like if register unsigned char *ip; would increase spill cost of ip compared to unsigned char *ip; ? Remember we're talking about a function with 11000 pseudos and 4000 allocnos (not to mention 1500 basic blocks). You cannot really blame IRA for not doing the right thing. And actually, ip and sp are live everywhere, so there's no hope of reserving a register for them, especially since all x86 callee-save registers have special uses in string functions. If I understand the huge dumps correctly, the missing part is trying to use callee-save registers for spilling, rather than memory. However, perhaps another way to do it is a specialized region management scheme for large switch statements, treating each switch arm as a separate region? There are few registers live across the switch, and all of them are used either "a lot" or "almost never" (and always in cold blocks). BTW, here are some measurements on x86-64:

1) with regalloc hints: 450060432 bytecodes/sec; 12819996 calls/sec
2) without regalloc hints: 263002439 bytecodes/sec; 9458816 sends/sec

Probably even worse on x86-32. None of -fira-region=all, -fira-region=one, -fira-algorithm=priority made significant changes. In fact, it's pretty much a "binary" result: I'd expect register allocation results to be either on par with (1) or similar to (2); everything else is mostly noise. Paolo
Re: Just what are rtx costs?
On 08/17/2011 07:52 AM, Richard Sandiford wrote:

  cost = rtx_cost (SET_SRC (set), SET, speed);
  return cost > 0 ? cost : COSTS_N_INSNS (1);

This ignores SET_DEST (the problem I'm trying to fix). It also means that constants that are slightly more expensive than a register -- somewhere in the range [0, COSTS_N_INSNS (1)] -- end up seeming cheaper than registers. This can be fixed by doing

  return cost >= COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);

One approach I'm trying is to make sure that every target that doesn't explicitly handle SET does nothing with it. (Targets that do handle SET remain unchanged.) Then, if we see a SET whose SET_SRC is a register, constant, memory or subreg, we give it cost:

  COSTS_N_INSNS (1)
  + rtx_cost (SET_DEST (x), SET, speed)
  + rtx_cost (SET_SRC (x), SET, speed)

as now. In other cases we give it a cost of:

  rtx_cost (SET_DEST (x), SET, speed)
  + rtx_cost (SET_SRC (x), SET, speed)

But that hardly seems clean either. Perhaps we should instead make the SET_SRC always include the cost of the SET, even for registers, constants and the like. Thoughts? Similarly, this becomes

  dest_cost = rtx_cost (SET_DEST (x), SET, speed);
  src_cost = MAX (rtx_cost (SET_SRC (x), SET, speed), COSTS_N_INSNS (1));
  return dest_cost + src_cost;

How does this look? Paolo
should sync builtins be full optimization barriers?
Hi all, sync builtins are described in the documentation as being full memory barriers, with the possible exception of __sync_lock_test_and_set. However, GCC is not enforcing the fact that they are also full _optimization_ barriers. The RTL produced by the builtins does not in general include a memory optimization barrier such as a set of (mem/v:BLK (scratch:P)). This can cause problems with lock-free algorithms, for example this: http://libdispatch.macosforge.org/trac/ticket/35 This can be solved either in generic code, by wrapping sync builtins (before and after) with an asm("":::"memory"), or in the individual machine descriptions by adding a memory barrier in parallel to the locked instructions or the ll/sc instructions. Is the above analysis correct? Or should users put explicit compiler barriers? Paolo
Re: should sync builtins be full optimization barriers?
On 09/09/2011 10:17 AM, Jakub Jelinek wrote: > Is the above analysis correct? Or should the users put explicit > compiler barriers? I'd say they should be optimization barriers too (and at the tree level I think they work that way, being represented as function calls), so if they don't act as memory barriers in RTL, the *.md patterns should be fixed. The only exception should be IMHO the __SYNC_MEM_RELAXED variants - if the CPU can reorder memory accesses across them at will, why shouldn't the compiler be able to do the same as well? Agreed, so we have a bug in all released versions of GCC. :( Paolo
Re: should sync builtins be full optimization barriers?
On 09/09/2011 04:22 PM, Andrew MacLeod wrote: Yeah, some of this is part of the ongoing C++0x work... the memory model parameter is going to allow certain types of code movement in optimizers based on whether it's an acquire operation, a release operation, neither, or both. It is ongoing and hopefully we will eventually have proper consistency. The older __sync builtins are eventually going to invoke the new __sync_mem routines and their new patterns, but will fall back to the old ones if new patterns aren't specified. In the case of your program, this would in fact be a valid transformation I believe... __sync_lock_test_and_set is documented to only have ACQUIRE semantics. Yes, that's true. However, there's nothing special in the compiler to handle __sync_lock_test_and_set differently (optimization-wise) from say __sync_fetch_and_add. I don't see anything in this pattern however that would enforce acquire mode and prevent the reverse operation... moving something from after to before it... so there may be a bug there anyway. Yes. And I suspect most people actually expect all the old __sync routines to be full optimization barriers all the time... maybe we should consider just doing that... That would be very nice. I would like to introduce that kind of data structure in QEMU, too. :) Paolo
Re: should sync builtins be full optimization barriers?
On Sat, Sep 10, 2011 at 03:09, Geert Bosch wrote: > For example, for atomic objects accessed only from a single processor > (but possibly multiple threads), you'd not want the compiler to reorder > memory accesses to global variables across the atomic operations, but > you wouldn't have to emit the expensive fences. I am not 100% sure, but I tend to disagree. The original bug report can be represented as node->next = NULL [relaxed]; xchg(tail, node) [seq_cst]; and the problem was that the two operations were swapped. But that's not a problem with the first access, but rather with the second. So it should be fine if the [relaxed] access does not include a barrier, because it relies on the [seq_cst] access providing it later. Paolo
Re: should sync builtins be full optimization barriers?
On 09/11/2011 04:12 PM, Andrew MacLeod wrote:

  tail->value = othervalue       // global variable write
  atomic_exchange (&var, tail)   // acquire operation

Although the optimizer moving the store of tail->value to AFTER the exchange seems very wrong on the surface, it's really emulating what another thread could possibly see. When another thread synchronizes and reads 'var', an acquire operation doesn't cause outstanding stores to be fully flushed, so the other thread has no guarantee that the store to tail->value has happened yet, even though it gets the expected value of 'var'. You're right that using lock_test_and_set as an exchange is very wrong because of the compiler barrier semantics, but I think this is entirely a red herring in this case. The same problem could happen with a fetch_and_add or even a lock_release operation. Paolo
Re: should sync builtins be full optimization barriers?
On 09/11/2011 09:00 PM, Geert Bosch wrote: So, if I understand correctly, then operations using relaxed memory order will still need fences, but indeed do not require any optimization barrier. For memory_order_seq_cst we'll need a full barrier, and for the others there is a partial barrier. If you do not need an optimization barrier, you do not need a processor barrier either, and vice versa. Optimizations are just another factor that can lead to reordered loads and stores. Paolo
Re: should sync builtins be full optimization barriers?
On 09/12/2011 01:22 AM, Andrew MacLeod wrote: You're right that using lock_test_and_set as an exchange is very wrong because of the compiler barrier semantics, but I think this is entirely a red herring in this case. The same problem could happen with a fetch_and_add or even a lock_release operation. My point is that even once we get the right barriers in place, due to its definition as acquire, this testcase could actually still fail, AND the optimization is valid... Ah, sure. unless we decide to retroactively make all the original sync routines seq_cst. I've certainly seen code using lock_test_and_set to avoid asm for xchg. That would be very much against the documentation with respect to the values of the second parameter, and that's also why clang introduced __sync_swap. However, perhaps it makes sense to make lock_test_and_set provide sequential consistency. Probably not so much for lock_release, which is quite clearly a store-release. Paolo
Re: should sync builtins be full optimization barriers?
On Mon, Sep 12, 2011 at 20:40, Geert Bosch wrote: > Assuming that statement is true, that would imply that even for relaxed > ordering there has to be an optimization barrier. Clearly fences need to be > used for any atomic accesses, including those with relaxed memory order. > > Consider 4 threads and an atomic int x:
>
>    thread 1   thread 2   thread 3   thread 4
>
>    x=1;       r1=x       x=3;       r3=x;
>    x=2;       r2=x       x=4;       r4=x;
>
> Even with relaxed memory ordering, all modifications to x have to occur in > some particular total order, called the modification order of x. > > So, if r1==2, r2==3 and r3==4, r4==1, that would be an error. However, > without fences, this can easily happen on an SMP machine, even one with > a nice memory model such as the x86. How? (Honest question). All stores are to the same location. I don't see how that can happen without processor fences, much less without optimization fences. Paolo
Re: should sync builtins be full optimization barriers?
On Tue, Sep 13, 2011 at 03:52, Geert Bosch wrote: > No, it is possible, and actually likely. Basically, the issue is write > buffers. > The coherency mechanisms come into play at a lower level in the > hierarchy (typically at the last-level cache), which is why we need fences > to start with to implement things like spin locks. You need fences on x86 to implement Peterson or Dekker spin locks, but only because they involve write-read ordering to different memory locations (I'm mentioning those spin lock algorithms because they do not require locked memory accesses). Write-write, read-read and same-location write-read ordering are guaranteed by the processor. Same for coherency, which is a looser property. However, the accesses in those spin lock algorithms are definitely _not_ relaxed; not all of them, at least. > No that's false. Even on systems with nice memory models, such as x86 > and SPARC with a TSO model, you need a fence to avoid that a write-load > of the same location is forced to make it all the way to coherent memory > and not forwarded directly from the write buffer or L1 cache. Not sure about SPARC, but this is definitely false on x86. Granted, even if you do not have to put fences, those writes are likely _not_ free. The processor needs to do more than say on PPC, so I wouldn't be surprised if conflicting memory accesses are quite a bit more expensive on x86 than PPC. Recently, a colleague of mine tried replacing optimization barriers with full barriers in one of two threads implementing a ring buffer; that thread was now 30% slower, but the other thread sped up by basically the same amount. Paolo
Re: should sync builtins be full optimization barriers?
On 09/15/2011 06:19 PM, Richard Henderson wrote:
> I wouldn't go that far. They *used* to be compiler barriers, but
> clearly something broke at some point without anyone noticing. We don't
> know how many versions are affected until we debug it. For all we know
> it broke in 4.5 and 4.4 is fine.

4.4 is not necessarily fine; it may also be that an unrelated 4.5 change exposed a latent bug. But indeed Richard Sandiford mentioned offlist that perhaps the ALIAS_SET_MEMORY_BARRIER machinery broke. Fixing the bug in 4.5/4.6/4.7 will definitely shed more light.

> There's no reference to a GCC bug report about this in the thread. Did
> the folks over at the libdispatch project never think to file one?

I asked them to attach a preprocessed testcase somewhere, but they haven't done so yet. :(

Paolo
Re: should sync builtins be full optimization barriers?
On 09/15/2011 06:26 PM, Paolo Bonzini wrote:
>> There's no reference to a GCC bug report about this in the thread. Did
>> the folks over at the libdispatch project never think to file one?
>
> I asked them to attach a preprocessed testcase somewhere, but they
> haven't done so yet. :(

They have now attached it, and the bug turns out to be a missing parenthesis in an #ifdef. This made libdispatch compile the xchg as an asm rather than a sync builtin. And of course the asm was wrong. Apparently, Apple people on the mailing list were looking at the Apple trunk, but the reporter was obviously compiling from the public trunk.

Paolo
Re: Question on cse_not_expected in explow.c:memory_address_addr_space()
On 09/28/2011 02:14 PM, Georg-Johann Lay wrote:
> This leads to unpleasant code. The machine can access all RAM locations
> by direct addressing. However, the resulting code is:
>
> foo:
> 	ldi r24,lo8(-86)	 ;  6	*movqi/2	[length = 1]
> 	ldi r30,lo8(-64)	 ;  34	*movhi/5	[length = 2]
> 	ldi r31,lo8(10)
> 	std Z+3,r24	 ;  7	*movqi/3	[length = 1]
> .L2:
> 	lds r24,2754	 ;  10	*movqi/4	[length = 2]
> 	sbrs r24,7	 ;  43	*sbrx_branchhi	[length = 2]
> 	rjmp .L2
> 	ldi r24,lo8(-69)	 ;  16	*movqi/2	[length = 1]
> 	ldi r30,lo8(-64)	 ;  33	*movhi/5	[length = 2]
> 	ldi r31,lo8(10)
> 	std Z+3,r24	 ;  17	*movqi/3	[length = 1]
> .L3:
> 	lds r24,2754	 ;  20	*movqi/4	[length = 2]
> 	sbrs r24,7	 ;  42	*sbrx_branchhi	[length = 2]
> 	rjmp .L3
> 	ret	 ;  39	return	[length = 1]
>
> Insn 34 loads 2752 (0xAC0) to r30/r31 (Z) and does an indirect access
> (*(Z+3), i.e. *2755) in insn 7. The same happens in insn 33 (load 2752)
> and access (insn 17). Is there a way to avoid this? I tried
> -f[no-]rerun-cse-after-loop but without effect, same for -Os/-O2 and
> trying to patch rtx_costs.

cse_not_expected is overridden in some places in the middle end. fwprop should take care of propagating the address. Have you tried patching address_costs? Might be as simple as this (untested):

Index: avr.c
===================================================================
--- avr.c	(revision 177688)
+++ avr.c	(working copy)
@@ -5986,8 +5986,8 @@ avr_address_cost (rtx x, bool speed ATTR
       return 18;
   if (CONSTANT_ADDRESS_P (x))
     {
-      if (optimize > 0 && io_address_operand (x, QImode))
-	return 2;
+      if (optimize > 0)
+	return io_address_operand (x, QImode) ? 2 : 3;
       return 4;
     }
   return 4;

Paolo
Re: IRA changes rules of the game
On 10/20/2011 07:46 PM, Paulo J. Matos wrote:
> However, it failed to compile libgcc with:
>
> ../../../../../../../devHost/gcc46/gcc/libgcc/../gcc/libgcc2.c:272:1:
> internal compiler error: in df_uses_record, at df-scan.c:3178
>
> This feels like a GCC bug. I will try to get a better look at it
> tomorrow.

What's the SET it is failing on?

Paolo
Re: libgcc: why emutls.c in LIB2ADDEH instead of LIB2ADD?
On 11/21/2011 01:54 AM, Richard Henderson wrote:
>> Emulating TLS has nothing to do with exception-handling, nor is
>> there something that might throw while calling one of its functions.
>>
>> Ok to fix that?
>
> Not without further study. There was a reason we wanted these in
> libgcc_eh.a. I can't recall exactly why at the moment; it should be in
> the archives...

Nope, the first version at http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00903.html already had it in LIB2ADDEH*. Perhaps Jakub has some ideas too.

H-P, can you try bootstrapping your patch on cygwin and/or mingw too before applying it?

Paolo
Re: [PATCH 1/3] colorize: use isatty module
On 01/03/2012 09:48 AM, Jim Meyering wrote:
> Paolo Bonzini wrote:
>> * bootstrap.conf: Add isatty module.
>> * gnulib: Update to latest.
>> * lib/colorize.h: Remove argument from should_colorize.
>> * lib/ms/colorize.h: Likewise.
>> * lib/colorize-impl.c: Factor isatty call out of here...
>> * lib/ms/colorize-impl.c: ... and here...
>> * src/main.c: ... into here.
>
> Hi Paolo,
>
> At least with gcc-4.7.0 20120102, a warning-enabled build now fails
> like this:
>
> colorize.c: In function 'init_colorize':
> colorize.c:37:6: error: function might be candidate for attribute
> 'const' [-Werror=suggest-attribute=const]
> cc1: all warnings being treated as errors

Thanks, my GCC is indeed older. Perhaps GCC should be changed to avoid the warning on functions returning void. If a void function can be const, it pretty much has to be empty, and so it is quite likely a placeholder for something that is not const.

Paolo
Re: Renaming Stage 1 and Stage 3
Il 11/06/2012 11:18, Richard Guenther ha scritto: > > Instead of renaming Stage 3 to Stage 2 at that point we figured that > > using different terminology would reduce confusion. I am not wedded > > to Stage A and B, though this seems to be the most straightforward > > option (over colors, Alpha and Beta carrying a different meaning in > > software development,...). > > > Eh - why not give them names with an actual meaning? "Development Stage" > and "Stabilizing Stage"? I realize those are rather long names, but you > can always put short forms in tables, like Dev Stage and Stab Stage. Or just "Development" and "Feature freeze"? Paolo
LC_COLLATE (was Re: SVN Test Repo updated)
The sort algorithm has nothing to do with ls, but with your selection of LC_COLLATE. But then, BSD (at least the variant used in Mac OS X) is way behind current l10n standards. At least they do not break s/[A-Z]//, which on "well-internationalized" OSes is case-insensitive with most locales other than C.

I still haven't dug enough to understand whether the responsible party is the POSIX specification for localization, the ANSI specification for strcoll, or somebody in the glibc team. But I know that it was the most-reported sed "bug" before I explicitly flagged it as a non-bug in the manual.

I can only guess the outcry if Perl started obeying LC_COLLATE.

Paolo
Re: LC_COLLATE (was Re: SVN Test Repo updated)
>> I can only guess the outcry if Perl started obeying LC_COLLATE.
>
> What do you mean, "started"? It's been doing that for years now.

"By default, Perl ignores the current locale. The `use locale' pragma tells Perl to use the current locale for some operations": and these do not include regex character ranges. LC_COLLATE would only be used for sorting and for string comparisons.

Paolo
Project submission for GCC 4.1 - AltiVec rewrite
I had already submitted this to Mark, but since I have improved a few rough spots in the code I think it's better to make it public.

* Project Title

AltiVec rewrite.

* Project Contributors

Paolo Bonzini

* Dependencies

none

* Delivery Date

March 15th or earlier (the implementation is complete and has no regressions).

* Description

The project reimplements the AltiVec vector primitives in a saner way, without putting the burden on the preprocessor, instead processing the "overloading" in the C front end. This benefits compilation speed on AltiVec vector code, and moves the big table of AltiVec overloading pairs from an installed header file into the compiler (an 11000-line header file is reduced to 500 lines, plus 2500 in the compiler).

The changes are so far self-contained in the PowerPC back end, but I expect that a hack I am using will have to be changed upon review. Unfortunately, a previous RFC I posted on the gcc mailing list got (almost) no answers. I plan to take a look at apple-ppc-branch, which supposedly does not need this hack, or to ask for feedback when I submit the project.

The new implementation improves on the existing one in that everything but predicates will accept unparenthesized literals even in C. This line:

  vec_add (x, (vector unsigned int) {1, 2, 3, 4})

currently fails in C and works in C++, but with the new implementation it works in C as well. On the other hand, using a predicate like this

  vec_all_eq (x, (vector unsigned int) {1, 2, 3, 4})

will still not work in C (it will *not* be a regression in C++, where it is okay both without and with my changes). It would have to be written as

  vec_all_eq (x, ((vector unsigned int) {1, 2, 3, 4}))

exactly as in the current implementation.

Paolo
Re: MMX built-ins performance oddities
> - vector version is about 3% faster than above instead of 10% slower - wow!
>
> So why is gcc 4.0 producing worse code when using Intel-style
> intrinsics, and why isn't the union version using builtins as fast as
> the vector version?

I can answer why unions are slower: that's because they are spilled to memory on every assignment -- GCC 4.0 knows how to replace structs with distinct scalar variables (one per member), but not unions. GCC 3.4 knew about none of these possibilities.

About why vectors are faster, well, a lot of the vector support has been rewritten in GCC 4.0, so that may be the reason. I do not know exactly why builtins are still slower, but you may want to create a PR and add me to the CC list ([EMAIL PROTECTED]).

Paolo
Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".
>> Ah, that's triggered by -fdump-rtl-expand-detailed (it is revision
>> 2.28, for which I could not find an entry on gcc-patches).
>
> Do you know of a reason why that isn't on by default?

Because -fdump-rtl-expand-detailed includes *two* copies of the RTL: one lacks the prologue and epilogue but is interleaved with trees; the other is the standard -fdump-rtl-expand dump.

>> ISTR the name change was to avoid a switch named -fdump-rtl-rtl.
>
> To invent an option name alias and use a minor repetition in it as a
> reason for changing the old behavior is Bad.

It is not merely an option name alias. It came together with a redesign of the way RTL dumps work, to integrate their management with tree dumps and to allow (in the future) various levels of detail in the RTL dumps as well. I never had a problem with the rename because I use fname*.00.* (or the analogous completion key sequence) to invoke an editor on the RTL dump (or, in general, any other dump).

Paolo
Re: [RFA:] change back name of initial rtl dump suffix to ".rtl".
Gabriel Dos Reis wrote:
> Paolo Bonzini <[EMAIL PROTECTED]> writes:
> | >> ISTR the name change was to avoid a switch named -fdump-rtl-rtl.
> | > To invent an option name alias and use a minor repetition in it
> | > as a reason for changing the old behavior is Bad.
> |
> | It is not merely an option name alias. It came together with a
> | redesign of the way RTL dumps work, to integrate their management with
> | tree dumps and to allow (in the future) to have various levels of
> | detail in the RTL dumps as well.
>
> maybe an option with argument?
>
>    -fdump-rtl=detailed
>    -fdump-rtl=classic    # same as -fdump-rtl

-fdump-rtl-* (where the star is a pass name) is the form of the options that enable RTL dumps, e.g. -fdump-rtl-lreg or -fdump-rtl-cse2; "expand" is just a pass name, and since the final part of the dump file name is the pass name, the file name changed from "foobar.00.rtl" to "foobar.00.expand".

Paolo
Re: a mudflap experiment on freebsd
> I think the decision to force the user to specify -lmudflap should be
> revisited.

I agree.

> The fixincludes build failed with link errors for undefined mudflap
> functions. The fixincludes Makefile.in does not use CFLAGS when
> linking. I added $(CFLAGS) to the 3 rules that contain a link command.

I think this can be committed as obvious, especially by a GWP person as you are...

> I now get a configure failure in the intl directory. configure is
> unable to run programs compiled by $CC. I get an error from the
> dynamic linker complaining that libgcc.so can't be found. The problem
> here is that the toplevel Makefile support for LD_LIBRARY_PATH is
> confused. See the SET_GCC_LIB_PATH support. It is completely broken.
> If you modify configure.ac, and make from within the GCC directory,
> the LD_LIBRARY_PATH will not even include the GCC directory.

I have a patch queued for 4.1 for this, but I want to see the PRs and try to reproduce the problem. I don't think it is a requirement to only run make from within the toplevel, even if TARGET-* variables alleviate the problem.

> ! /* We must allocate one more entry here, as we use NOTE_INSN_MAX as the
> !    default field for line number notes. */
> ! static const char *const note_insn_name[NOTE_INSN_MAX+1] = {

I think this also ought to be committed.

Paolo
Re: Benchmark of gcc 4.0
> For GCC, I used in both cases the flags -march=pentium4 -mfpmath=sse
> -O3 -fomit-frame-pointer -ffast-math
>
> As for gcc4 vs gcc3.4, degradation on x86 architecture is most probably
> because of higher register pressure created with more aggressive SSA
> optimizations in gcc4.

Try these five combinations:

  -O2 -fomit-frame-pointer -ffast-math
  -O2 -fomit-frame-pointer -ffast-math -fno-tree-pre
  -O2 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
  -O3 -fomit-frame-pointer -ffast-math -fno-tree-pre
  -O3 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse

You may also want to try -mfpmath=sse,387 in case your benchmarks use sin, cos and other transcendental functions that GCC knows about when using 387 instructions.

Paolo
Re: GCC 4.1 Projects
Daniel Jacobowitz wrote:
> On Sun, Feb 27, 2005 at 03:56:26PM -0800, Mark Mitchell wrote:
>> Daniel Jacobowitz wrote:
>>> On Sun, Feb 27, 2005 at 02:57:05PM -0800, Mark Mitchell wrote:
>>> Nathanael said it did not interfere with any of the other _projects_,
>>> not that it would be disjoint from all Stage 1 _patches_.
>>
>> Fair point. I would certainly prefer that you hold off until Stage 2,
>> as indicated by the document I posted.
>>
>>> Could you explain what benefits from waiting? None of the other large,
>>
>> The primary benefit is just reduced risk, as you suggest. The Stage 1
>> schedule looks full to me, and I'd like to see those patches go in
>> soon so that we can start shaking out the inevitable problems. I'm
>> much less worried about the long-term impact of Nathanael's patch; if
>> it breaks something, it will get fixed, and then that will be that.
>> But, that brief breakage might make things difficult for people
>> putting in things during Stage 1, or compound the problem of having an
>> unstable mainline.
>
> I think that's not a useful criterion for scheduling decisions. Let me
> be more concrete. Paolo Bonzini posted a patch to move in-srcdir builds
> to a host subdirectory. This is a substantial build infrastructure
> change, even though it will not affect a substantial number of
> developers - assuming it works correctly. I consider it no different
> "in kind" from Nathanael's changes. He can approve that; so a system
> where he can't approve his own tested patch is one in which you are
> overriding his judgement. ISTR that that is exactly what you did not
> want to do with this scheduling exercise.
>
> No offense intended to Paolo, of course! I picked a recent example.
> We're less than a week into stage 1, so I don't have much in the way of
> samples to draw on.

No offense perceived, of course.

FWIW I fully agree with you, and my next queued patch is something to clean up the SET_LIB_PATH mess in the toplevel; it does have a potential of breaking bootstrap (I'll post it as a call for testing, because it affects only ia64 in practice and I don't have one of those machines). I just came across this, so I did not post it as a project to Mark.

Paolo
Re: request for timings - makedepend
> and report (a) the numbers reported by the "time" command, (b) what
> sort of machine this is and how old, and (c) whether or not you would
> be willing to trade that much additional delay in an edit-compile-debug
> cycle for not having to write dependencies manually anymore.

Linux, P4 3.4 GHz:

  real    0m5.212s
  user    0m4.330s
  sys     0m0.320s

Mac OS, G4 1.5 GHz:

  real    0m17.100s
  user    0m12.740s
  sys     0m2.720s

Maybe you can use "$?" in some way? It would be fine for me to trade a slowdown in the first compilation (e.g. the full dependencies being built in twice the time) for a much smaller edit-compile-debug delay.

Paolo
Re: [BUG mm] "fixed" i386 memcpy inlining buggy
> The only thing that would avoid this is to either tell the compiler to
> never put esi/edi in memory (which I think is not possible across
> different versions of gcc) or to always generate a single asm section
> for all the different cases.

Use __asm__ ("%esi") and __asm__ ("%edi"). It is not guaranteed that they will always access the registers (you can still have copy propagation etcetera); but if your __asm__ statement's constraints match the register you specify, then you can be reasonably sure that good code is produced.

Paolo
Re: Input and print statements for Front End?
> I can't seem to find any info regarding an input or print statement,
> so I can read integers (my language only deals with integers) from
> stdin and return integer results to stdout.

You need to map these to printf/scanf calls.

Paolo
Re: GCC 4.0 RC1 Available
Kaveh R. Ghazi wrote:
> Nathanael removed the surrounding for-stmt but left the break inside
> the if-stmt.
> http://gcc.gnu.org/ml/gcc-patches/2003-11/msg02109.html

I think it is enough to remove it. bash does not complain if it finds a stray break, it seems. Ok to commit to mainline (and src)?

Mark, if you decide to fix it in 4.0, I think it is better that you do it yourself, also because of the time zone difference (I'll be out of home this evening, which is morning/afternoon for you).

Paolo

2005-04-12  Paolo Bonzini  <[EMAIL PROTECTED]>

	* configure: Regenerate.

config:
2005-04-12  Paolo Bonzini  <[EMAIL PROTECTED]>

	* acx.m4 (ACX_PROG_GNAT): Remove stray break.

Index: acx.m4
===================================================================
RCS file: /cvs/gcc/gcc/config/acx.m4,v
retrieving revision 1.11
diff -p -u -r1.11 acx.m4
--- acx.m4	28 Feb 2005 13:25:55 -0000	1.11
+++ acx.m4	12 Apr 2005 07:04:16 -0000
@@ -212,7 +212,6 @@ acx_cv_cc_gcc_supports_ada=no
 errors=`(${CC} -c conftest.adb) 2>&1 || echo failure`
 if test x"$errors" = x && test -f conftest.$ac_objext; then
   acx_cv_cc_gcc_supports_ada=yes
-  break
 fi
 rm -f conftest.*])
Re: My opinions on tree-level and RTL-level optimization
I think Roger simply mis-spoke because in his original message, he said what you said: the important issue is having the alias information available in RTL. Much (but not all: eg., SUBREG info) of that information is best imported down from the tree level. Well, paradoxical subregs are just a mess: optimizations on paradoxical subregs are better served at the tree level, because it is just obfuscation of e.g. QImode arithmetic. Indeed, my patch removed an optimization on paradoxical subregs, and kept an optimization on non-paradoxical subregs. Take this code: long long a, b, c, d; int x; ... c = a * b; d = (int) x * (a * b); In my view, tree-level optimization will catch (a * b) as a redundant expression. RTL-level optimization will catch that the high-part of "(int) x" is zero. Roger proposed lowering 64-bit arithmetic to 32-bit in tree-ssa! How would you do it? Take long long a, b, c; c = a + b; Would it be c = ((int)a + (int)b) + ((int) (a >> 32) + (int) (b >> 32) + ((unsigned int) a < (unsigned int) b)) << 32; Or will you introduce new tree codes and uglifying tree-ssa? Seriously... This is a very inaccurate characterization of CSE. Yes, it does those things, but eliminating common subexpressions is indeed the major task it performs. It was. Right now, the only thing that fold_rtx tries to simplify is (mult:SI (reg:SI 58) 8) to (ashiftrt:SI (reg:SI 58) 3) Only to find out it is not a valid memory_operand... I have a patch to completely disable calling fold_rtx recursively, only equiv_constant. That was meant to be part 3/n of the cleanup fold_rtx series. I was prepared to take responsibility for every pessimization resulting from these cleanups, and I expected to be sure I'd find a better way to do the same thing. A 7000-lines constant propagator... I think there's a serious conceptual issue in making the tree level too machine-dependent. The *whole point* of doing tree-level optimizations is to do machine-*independent* optimizations. 
> Trees are machine-independent and RTL is machine-dependent. If we go
> too far away from that, I think we miss the point.

No, the whole point of doing tree-level optimizations is to be aware of high-level concepts before they are lowered. No need to worry about support for QImode-size arithmetic. No need to worry if 64-bit multiplication had to be lowered.

>> Besides, the RTL optimizers are not exactly a part of GCC to be proud
>> of, if "ugliness" is a measure.
>
> Really?

The biggest and least readable files right now are combine.c, reload.c, reload1.c. cse.c is big (though not extreme) but unreadable. OTOH, stuff like simplify-rtx.c or especially fold-const.c is big but readable. Of course GCC will always need a low-level IR.

>> But, combine is instruction selection in the worst possible way;
>
> It served GCC well for decades, so I hardly think that's a fair
> statement.

Never heard about dynamic programming?

>> reload is register allocation in the worst possible way,
>
> Reload is not supposed to do register allocation. To the extent that it
> does, I agree with you. But what this has to do with the issue of tree
> vs. RTL optimization is something I don't follow. Surely you aren't
> suggesting doing register allocation at the tree level?

No, he's suggesting cleaning up stuff, so that it is easier to stop doing things in the worst possible way. He's suggesting being realistic about code that has run completely out of control. Luckily some GWP people do care about cleaning up. Richard Henderson did a lot of work on cleaning up RTL things left over from olden times (think eh, nested functions, addressof, save_expr, ...), Zack did some work on this ground in the past as well, Bernd is maybe the only guy who could pursue something such as reload-branch... I hate to make "clubs" out of a community, but it looks like only some people care about the state of the code... Steven has done most of the work for removing the define_function_unit processor descriptions.

I removed ~5000 lines of code after tree-ssa went in (including awful stuff such as protect_from_queue, which made sense maybe in 1990, and half of stmt.c). Kazu is also in the CSE-cleanup game. Maybe, like in my case, it's only because I have limited time to spend on GCC and think that cleaning up is a productive way to use it. But anyway, I think it is worth the effort.

Paolo
Re: GCC 4.0 RC1 Available
Kaveh R. Ghazi wrote:
> When this patch went into 4.0, Paolo didn't regenerate the top level
> configure, although the ChangeLog claims he did:
> http://gcc.gnu.org/ml/gcc-cvs/2005-04/msg00842.html

You're right. I was being conservative and typed the "cvs ci" filenames manually, but in this case there was no need because I worked off a fresh checkout. Sorry.

> The patch should also be applied to mainline, since the "break" problem
> exists there too. I'm not sure why it wasn't, but perhaps your "OK for
> 4.0.0" didn't specify mainline and Paolo was being conservative. I
> think we should fix it there also.

Yes, I was. But it looks like the build machinery maintainers are busy and toplevel patches go largely unnoticed.

Paolo