Re: Ideas for Google Summer of Code

2009-03-30 Thread Paolo Bonzini
 So we can do Intel, ATI and NVIDIA GPU backends.  NVIDIA already has
 an implementation of OpenCL working.
 http://www.nvidia.com/object/cuda_opencl.html.  Would there be any
 sharing involved with them??

If you mean between backends, 1) do not underestimate the time needed to
write a new GCC backend; 2) probably nothing can be shared (see for
example the zero-sharing between PPU and SPU).  I think that your 2) and
3) projects are way more viable.

 I am working on my proposal now and I will post it to this list before
 final submission (I've got to hurry, they are due April 3rd).  I will
 mainly focus on this pdf:
 http://www.khronos.org/developers/library/overview/opencl_overview.pdf,

Note that this *is* different from the backends you mentioned above, and
as I said I think it is more viable.

Make sure that what you propose is implementable without an OpenCL C
compiler (I think it is), or discuss to what extent the library will be
functional.  For example, do you need another runtime library
implementing the intrinsics used by kernels?

Thanks,

Paolo


[cond-optab] update after first round of testing (results for all targets)

2009-03-26 Thread Paolo Bonzini
I've finished the first round of testing on all targets and will be
sending patches soon.

Overall, I think the results are quite satisfying.

For the current bunch of files, I get the same code on the following
targets:

  m32c crx mmix xstormy16 fr30 v850 m32r iq2000 picochip mcore spu ia64
  m68hc11 alpha frv e500* arm

  * I'm treating e500 as a different target than powerpc

I get the same code except for unordered comparisons, which are
improved, on the following targets:

  mips sparc

I get the same code except for small scheduling changes with some option
combinations on the following targets:

  m68k i386 rs6000

I get the same code with small improvements in instruction selection or
delay slot scheduling on the following targets:

  vax avr cris h8300

I get slightly better code because of better optimization (especially
if-conversion) on the following targets:

  arc xtensa mn10300 score bfin

I get large improvements on the following target:

  pdp11

I get overall a slight decrease in code quality, which is however offset
by patches to expand that I've already posted, on the following targets:

  pa s390


I have not yet converted sh.  I'll do so today.  Next step is simulator
testing for targets that, well, have a simulator.

I'll be posting the target conversion patches soon.  Indications about
the options that I tested will be found there.

The intention is to merge early in stage1, either as a series of commits,
as a single commit, or anything in between.

As usual, if people want me to switch to a public branch, just tell me.

Paolo


Re: GCC 4.4.0 Status Report (2009-03-13)

2009-03-22 Thread Paolo Bonzini
On Sun, Mar 22, 2009 at 15:41, Richard Kenner
ken...@vlsi1.ultra.nyu.edu wrote:
 I must admit that this interpretation is quite new to me.
 It certainly wasn't when EGCS reunited with gcc.

 I disagree.  reuniting with GCC means reuniting with the FSF.

... but not raising a white flag.

Paolo


Re: GCC 4.4.0 Status Report (2009-03-13)

2009-03-22 Thread Paolo Bonzini
 Then you had the wrong understanding.  The FSF has ALWAYS had the right to
 overrule technical decisions on ANY of their projects.  The point is that
 this is a right they very rarely exercise.

Of course; it's just that I (and others) don't see why they should do it
in this case.  Delaying a *branch* is different from, say, using a
proprietary version control or bug tracking system.

Paolo


Re: GCC 4.4.0 Status Report (2009-03-13)

2009-03-22 Thread Paolo Bonzini
 Btw, I cannot find anything related to this discussion (about whether
 and what power the FSF has to force their maintainers to do anything)
 in the official FSF documentation (http://www.gnu.org/prep/maintain/).

Well, as the copyright owner and the appointer of maintainers it is
pretty obvious that the FSF *can* do whatever they want.  Obviously,
since they are intelligent people they will usually just ask you to
follow the GNU project guidelines (including the strong suggestions
about using C).

Paolo


Re: Proposed gfortran development branch

2009-03-20 Thread Paolo Bonzini

 Note that merging the branch will be painful (as in, please dissect
 the branch into the individual patches again to make bisecting the
 trunk SVN possible).  Also the SC vetoed these kind of 'integration'
 branches in the past (to not encourage starting an effective stage1
 on a branch).

gfortran is managed in a sufficiently different way from the rest of GCC
(e.g. they started having reviewers long before the rest of GCC did, and
they appoint their own maintainers practically autonomously) that I
don't think there is a reason to care.

What you see in practice is Novell and Google doing stage1 work, and the
volunteer gfortraners stuck.

Besides, the rule against integration branches, while being extremely
well founded, is bound to become obsolete.  GCC developers could start
using distributed version control and publishing their work on git.or.cz
or github -- and after they pull from each other,  all of the
distributed repositories will be integration branches.  Should the SC
prohibit developing GCC with Mercurial or git?

Paolo


Re: GCC 4.4.0 Status Report (2009-03-13)

2009-03-20 Thread Paolo Bonzini

 I don't understand this. Why does the SC have little power in this
 matter?  Surely you could decide to ship GCC 4.4 with the old license,
 as the official GCC maintainer?  But you *choose* not to use this
 power (perhaps for good reasons, but I'm unconvinced).

 The  GCC maintainers work on behalf of the FSF and in some matters defer
 to the FSF.  It's that simple.

Yes, but it's not written anywhere that release and especially branching
policies are among these matters.

Personally, what I'd like to see is a clear justification of why a
license change motivated by plugins needs to be on a branch that will
never have plugins.  There have already been a plugin branch or two for a
while, obviously with the old license, and the FSF said nothing.  The
only reason I can see would be to keep merges *to* a plugin branch from
including new 4.5 features.

Paolo


Re: Automatic Parallelization Graphite - future plans

2009-03-19 Thread Paolo Bonzini

 The most visible ongoing effort is the conversion from target macros
 to target hooks (which is incomplete). The goal was to allow hot
 swapping of backends.  This is still the most obvious, most complete,
 and least unappealing (from a technical POV) approach IMHO. But Kaveh
 showed at one point that the compile time penalty of even just the
 partial conversion done so far is a few percentage points (somewhere
 between 3% and 5%, I don't recall the details). And also it's not nice
 and easy work so nobody is working on it actively AFAIK.

It occurred to me at some point that using an indirect function call is
useless.  It would be much better to have, instead of the current
targetm.foo syntax, something like TARGET(foo); this would expand to
target_foo and be further remapped to the target hooks via aliases.
Just by swapping targ* files you could choose whether to use function
pointers if the target does not support aliases (or in the future if
multiple backends are desired), or regular functions in the other case.
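
A minimal sketch of the TARGET(foo) idea, for illustration only.  It uses
token pasting rather than the linker aliases mentioned above, and the hook
name address_cost, the MULTIPLE_BACKENDS switch and the helper
cost_of_two_addresses are all hypothetical:

```c
#include <assert.h>

/* One hypothetical hook; "address_cost" is only a stand-in name. */
static int target_address_cost(int address_kind)
{
    assert(address_kind >= 0);
    return address_kind == 0 ? 1 : 4;
}

#ifdef MULTIPLE_BACKENDS
/* Indirect flavor: hooks live in a table of function pointers. */
struct target_hooks { int (*address_cost)(int); };
static const struct target_hooks targetm = { target_address_cost };
#define TARGET(hook) (targetm.hook)
#else
/* Direct flavor: TARGET(foo) is simply the function target_foo. */
#define TARGET(hook) target_##hook
#endif

/* Target-independent code is written once against TARGET(...). */
int cost_of_two_addresses(int a, int b)
{
    return TARGET(address_cost)(a) + TARGET(address_cost)(b);
}
```

Callers never see which flavor is in use; only the definition of TARGET
changes, so the indirect call is paid for only when it is actually needed.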

Another problem is the mess of GO_IF_LEGITIMATE_ADDRESS and
REG_OK_FOR_{BASE,INDEX}_P.  These should be expressed as RTL constructs
and constraints in my opinion.  I had started a little work on that but
never got very far.

Paolo


[cond-optab] update

2009-03-19 Thread Paolo Bonzini
I have now gone through all backends except sh and made the required changes.
So far all I tested is that gcc compiles with one target per port. :-)
Plus, i386-linux bootstraps and regtests okay.

Right now I aim at 100% identical assembly, though I may have to relax
that.  Besides obvious register allocation differences (which did not
happen for i386 on simple testcases, so it is possible to avoid them),
I'm not sure I can achieve that on cc0 targets because of the tst
patterns.  Combine can probably be taught to try them, if it is not
already doing so.

Each port took no more than 30-45 minutes to convert.  It is very
mechanical: you basically duplicate the cmp patterns into cbranch and
cstore patterns and eliminate all occurrences of the *_compare_op
variables from the emitters.  Then you go through mov*cc and add*cc
patterns, and replace *_compare_op there too (with elements of the
comparison passed in operand 1).  Then you zap all code you do not need.

The positive surprises: PA was already very clean.  bfin was very
different from the others but easy.

The only ports for which I substantially rewrote some of the code in a
non-mechanical way are m32r and sparc, and mcore somewhat.  The only
ports that grew are cris, h8300 and i386.

Overall over 5000 lines were deleted.

Here is the diffstat:

config/picochip 1 file changed, 1 insertion(+), 112 deletions(-)
config/fr30 3 files changed, 7 insertions(+), 185 deletions(-)
config/score 8 files changed, 18 insertions(+), 152 deletions(-)
config/crx 3 files changed, 22 insertions(+), 97 deletions(-)
config/cris 1 file changed, 29 insertions(+), 17 deletions(-)
config/bfin 4 files changed, 31 insertions(+), 234 deletions(-)
config/m68hc11 3 files changed, 33 insertions(+), 242 deletions(-)
config/stormy16 4 files changed, 34 insertions(+), 99 deletions(-)
config/iq2000 4 files changed, 39 insertions(+), 272 deletions(-)
config/arc 3 files changed, 41 insertions(+), 318 deletions(-)
config/v850 1 file changed, 42 insertions(+), 199 deletions(-)
config/m32c 4 files changed, 50 insertions(+), 150 deletions(-)
config/pa 4 files changed, 53 insertions(+), 520 deletions(-)
config/mn10300 1 file changed, 54 insertions(+), 110 deletions(-)
config/frv 3 files changed, 55 insertions(+), 197 deletions(-)
config/xtensa 3 files changed, 58 insertions(+), 75 deletions(-)
config/mmix 4 files changed, 63 insertions(+), 261 deletions(-)
config/mcore 3 files changed, 67 insertions(+), 307 deletions(-)
config/pdp11 3 files changed, 68 insertions(+), 447 deletions(-)
config/vax 4 files changed, 81 insertions(+), 56 deletions(-)
config/avr 1 file changed, 82 insertions(+), 155 deletions(-)
config/h8300 3 files changed, 87 insertions(+), 73 deletions(-)
config/mips 5 files changed, 93 insertions(+), 148 deletions(-)
config/spu 3 files changed, 95 insertions(+), 181 deletions(-)
config/s390 4 files changed, 96 insertions(+), 126 deletions(-)
config/arm 4 files changed, 97 insertions(+), 350 deletions(-)
config/alpha 4 files changed, 103 insertions(+), 277 deletions(-)
config/rs6000 5 files changed, 126 insertions(+), 346 deletions(-)
config/m32r 4 files changed, 155 insertions(+), 405 deletions(-)
config/m68k 4 files changed, 156 insertions(+), 273 deletions(-)
config/ia64 4 files changed, 190 insertions(+), 243 deletions(-)
config/i386 4 files changed, 286 insertions(+), 207 deletions(-)
config/sparc 4 files changed, 327 insertions(+), 861 deletions(-)

Overall:

 128 files changed, 2990 insertions(+), 8261 deletions(-)

Paolo



Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-16 Thread Paolo Bonzini
Martin Guy wrote:
 On 3/14/09, Paolo Bonzini bonz...@gnu.org wrote:
 Hans-Peter Nilsson wrote:
   The answer to the question is no, but I'd guess the more
   useful answer is yes, for different definitions of truncate.

 Ok, after my patches you will be able to teach GCC about this definition
  of truncate.
 
 I expect it's a bit too extreme an example, but I've just found (to my
 horror) that the MaverickCrunch FPU truncates all its shift counts to
 6-bit signed (-32(right) to +31(left)), including on 64-bit integers,
 which is not very helpful to compile for.
 ...unless it happens to come easy to handle "shift count is truncated
 to less than size of word" in your new framework

Uhm, well, no. :-)

This could already be handled by faking a 63 bit truncation and using a
splitter to expand those into something like this (I only know integer
ARM assembly, so I'm making this up):

   AND R1, R0, #31
   MOV R2, R2, SHIFT R1
   ANDS R1, R0, #32
   MOVNE R2, R2, SHIFT #31
   MOVNE R2, R2, SHIFT #1

or

   ANDS R1, R0, #32
   MOVNE R2, R2, SHIFT #-32
   SUB R1, R1, R0  ; R1 = (x >= 32 ? 32 - x : -x)
   MOV R2, R2, SHIFT R1

(which requires a scratch register, so it cannot be done postreload...
this might be a problem)
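
The first sequence above can be checked with a small C model.  This is only
a sketch: hw_shift is a hypothetical model of a shifter whose count is
truncated to 6-bit signed, and it assumes that negative counts shift right
logically, which the thread does not state:

```c
#include <assert.h>
#include <stdint.h>

/* Model of a shifter whose count is truncated to 6-bit signed:
   0..31 shift left, -32..-1 shift right (assumed logical). */
static uint64_t hw_shift(uint64_t value, int count)
{
    int c = count & 63;          /* keep the low 6 bits */
    if (c >= 32)
        c -= 64;                 /* sign-extend to -32..31 */
    return c >= 0 ? value << c : value >> -c;
}

/* Left shift by 0..63 using only counts the model treats as left shifts:
   shift by (count & 31), then by 31 and by 1 if bit 5 of count is set. */
uint64_t shift_left_0_to_63(uint64_t value, unsigned count)
{
    assert(count < 64);
    value = hw_shift(value, (int)(count & 31));
    if (count & 32) {
        value = hw_shift(value, 31);
        value = hw_shift(value, 1);
    }
    return value;
}

/* Number of counts in 0..63 for which the sequence disagrees with a
   plain 64-bit left shift. */
int count_mismatches(uint64_t value)
{
    int bad = 0;
    for (unsigned n = 0; n < 64; n++)
        if (shift_left_0_to_63(value, n) != value << n)
            bad++;
    return bad;
}
```

Passing a count of 33 straight to hw_shift shifts right instead of left,
which is why the count has to be split first.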

But my new stuff won't change anything.

Paolo


Re: GCC 4.4.0 Status Report (2009-03-13)

2009-03-16 Thread Paolo Bonzini
NightStrike wrote:
 On Fri, Mar 13, 2009 at 1:58 PM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 Given the SC request we need to stay in Stage 4 rather than trying to work
 around it.
 
 What if GCC went back to stage 3 until the issue is resolved, thus
 opening the door for a number of stage3-type patches that don't affect
 1) licensing and 2) plugin frameworks, but are merely bug fixes which
 would have long been shaken out by now.

No, not at all.  The only benefit we're getting from this is that GCC 4.4
should already be quite stable in 4.4.0; let's not destroy that too.

Paolo


Re: Dose gcc provide any function to build def-use chain in RTL form

2009-03-16 Thread Paolo Bonzini
villa gogh wrote:
 hi
 now i'm trying to construct def-use chain after the PASS_LEAF_REGS.
 for the ssa form structure has been destroyed during the former
 passes.
 I have found that gcc provides a way to build the def-use chain in the
 PASS_REGRENAME, but it only contains the defs and uses all in one
 basic block.

No, don't look at those.  Instead look at fwprop.c which uses use-def
chains -- DU chains are the same but they are computed with

  df_chain_add_problem (DF_DU_CHAIN);

instead of

  df_chain_add_problem (DF_UD_CHAIN);

before df_analyze.

fwprop accesses use-def chains by using DF_REF_CHAIN (use); def-use
chains are the same but the DF_REF_CHAIN macro is used with a def
argument instead.

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-14 Thread Paolo Bonzini
Hans-Peter Nilsson wrote:
 Date: Fri, 13 Mar 2009 12:34:49 +0100
 From: Paolo Bonzini bonz...@gnu.org
 
 I would like to know whether for avr,bfin,cris,frv,h8300,pdp11,rs6000
 (which define SHIFT_COUNT_TRUNCATED as 0) and for mcore,sh,vax (which
 do not define it at all) it is right that shift counts are never
 truncated.
 
 The answer to the question is no, but I'd guess the more
 useful answer is yes, for different definitions of truncate.

Ok, after my patches you will be able to teach GCC about this definition
of truncate.

Paolo


help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini
These are all the !SHIFT_COUNT_TRUNCATED targets.

For 4.5 I would like to improve our RTL canonicalization so that no
out-of-range shifts are ever in the RTL representation.

This in turn means that the description given by SHIFT_COUNT_TRUNCATED
must be exact.  Right now !SHIFT_COUNT_TRUNCATED means "I don't know";
I want it to mean "it is never truncated".

I would like to know whether for avr,bfin,cris,frv,h8300,pdp11,rs6000
(which define SHIFT_COUNT_TRUNCATED as 0) and for mcore,sh,vax (which
do not define it at all) it is right that shift counts are never
truncated.

In addition, for arm and m68k I'd like to know whether bitfield
instructions truncate the bit position the same as shifts (8 bits for
arm, 6 bits for m68k).

This information is particularly important for targets that do not
have a simulator in src.

Thanks in advance!

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini
Ian Lance Taylor wrote:
 Paolo Bonzini bonz...@gnu.org writes:
 
 This in turn means that the description given by SHIFT_COUNT_TRUNCATED
 must be exact.  Right now !SHIFT_COUNT_TRUNCATED means "I don't know";
 I want it to mean "it is never truncated".
 
 You need to do more work to make that happen, as SHIFT_COUNT_TRUNCATED
 applies to both the shift instructions and the bitfield instructions.
 On some processors one or the other is truncated; SHIFT_COUNT_TRUNCATED
 may currently only be set to 1 if both are truncated.  (E.g., I believe
 that m68k truncates shifts but not bitfield instructions.)

Yes, I've also split TARGET_SHIFT_TRUNCATION_MASK and
TARGET_EXTRACT_TRUNCATION_MASK, but for the latter a conservative
default can be used since it's used only in one optimization in combine.

[trimmed CC list]

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini

 The Blackfin does not truncate shift counts.  The documentation
 specifies that e.g. for Dx <<= Dy instructions, shift counts greater
 than 31 produce a result of zero.  Other shift instructions use a sign
 extended part of the shift count to shift either left or right.  "I
 don't know" is probably the best answer we can give the compiler.

In my plan, the truncation of shifts is used to canonicalize RTL created
with out of range shift counts.  This is useful because such out of
range RTL can appear because of unrolling or inlining.  Then the answer
should be based on this: would a typical C programmer expect a left
and a right shift from this:

int f(int a)
{
  return 0x4000 << a;
}

int x, y;
int main()
{
  x = f(1);
  y = f(-1);
}

If the C program above can be reasonably considered undefined with
Blackfin, saying shifts are not truncated is okay.  This is because
the variable left/right shifts can still be described as rtl like

  (set A (if_then_else (lt B (const_int 0))
   (lshiftrt A (minus (const_int 0) B))
   (ashift A B)))

so that the actual arguments of ASHIFT/LSHIFTRT are non-negative.
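
In C terms, the if_then_else above corresponds roughly to the following
sketch.  The function name is hypothetical, and unsigned arithmetic is used
so that the right shift is a logical one, matching lshiftrt:

```c
#include <assert.h>
#include <limits.h>

/* Shift left for a non-negative count, right for a negative one.
   Each real shift only ever sees a count in 0..width-1. */
unsigned shift_by_signed_count(unsigned a, int b)
{
    const int width = (int)(sizeof a * CHAR_BIT);
    assert(b > -width && b < width);
    return b < 0 ? a >> -b : a << b;
}
```

With this shape a count of 1 doubles the value and a count of -1 halves it,
without any out-of-range shift ever being written down.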

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini
 /* Immediate shift counts are truncated by the output routines (or was it
the assembler?).  Shift counts in a register are truncated by SH.  Note
that the native compiler puts too large (> 32) immediate shift counts
into a register and shifts by the register, letting the SH decide what
to do instead of doing that itself.  */
 /* ??? The library routines in lib1funcs.asm truncate the shift count.
However, the SH3 has hardware shifts that do not truncate exactly as gcc
expects - the sign bit is significant - so it appears that we need to
leave this zero for correct SH3 code.  */

So you have that in the RTL stream we should canonicalize a << 32 to
a, but a << (b & 31) is not the same as a << b?

Also, in what way is the sign bit significant?  Does it determine whether the
value is left- or right-shifted?

Finally, is SH2A the same as SH3?

Thanks!

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini

 Hm.  In fold-const.c we try to make sure to produce the same result
 as the target would for constant-folding shifts.  Thus, Paolo, I think
 what fold-const.c does is what we should assume for
 !SHIFT_COUNT_TRUNCATED.  No?

Unfortunately it is not so simple.  fold-const.c is actually wrong, as
witnessed by this program

  static inline int f (int s) { return 2 << s; }
  int main () { printf ("%d\n", f(33)); }

which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu.

This might mean that it is easier than I thought (i.e. that all
the subtleties of the targets could be ignored), but I want to play it
safe and actually take the opportunity to fix the above problem (my
current patch does fix it).
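
The 4-versus-0 difference can be modeled without undefined behavior.  This
is a sketch with hypothetical function names: one function shifts the way a
machine that masks the count does (x86 masks 32-bit shift counts to 5 bits),
the other the way a folder that does not truncate would:

```c
#include <assert.h>
#include <stdint.h>

/* Shift as done by hardware that truncates the count with a mask. */
uint32_t shift_left_masked(uint32_t value, unsigned count, unsigned mask)
{
    assert(mask <= 31);          /* keeps the C shift itself in range */
    return value << (count & mask);
}

/* Shift as computed when the count is not truncated: every bit is
   shifted out once the count reaches the width. */
uint32_t shift_left_untruncated(uint32_t value, unsigned count)
{
    return count < 32 ? value << count : 0;
}
```

For a value of 2 and a count of 33, the masked form gives 4 (33 & 31 is 1)
and the untruncated form gives 0, which are the two results reported for
-O0 and -O2.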

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini

 Hm.  In fold-const.c we try to make sure to produce the same result
 as the target would for constant-folding shifts.  Thus, Paolo, I think
 what fold-const.c does is what we should assume for
 !SHIFT_COUNT_TRUNCATED.  No?
 Unfortunately it is not so simple.  fold-const.c is actually wrong, as
 witnessed by this program

  static inline int f (int s) { return 2 << s; }
  int main () { printf ("%d\n", f(33)); }

 which prints 4 at -O0 and 0 at -O2 on i686-pc-linux-gnu.
 
 But this is because i?86 doesn't define SHIFT_COUNT_TRUNCATED, no?

Yes, so fold-const.c is *not* modeling the target in this case.

But on the other hand, this means we can get by with documenting the
effect of a conservative truncation mask: no wrong code bugs, just
differences between optimization levels for undefined programs.  I'll
check that the optimizations done based on the truncation mask are all
conservative or can be made so.

So, I'd still need the information for arm and m68k, because that
information is about the bitfield instructions.  For rs6000 it would be
nice to see what they do for 64-bits (for 32-bit I know that PowerPCs
truncate to 6 bits, not 5).  But for the other architectures, we can be
conservative.

Paolo


Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax

2009-03-13 Thread Paolo Bonzini

 Note, one thing I encountered when doing the SSE5 work at AMD, is
 SHIFT_COUNT_TRUNCATED really needs a mode argument (and ideally should be
 moved into the gcc_target structure).

In fact I'm reusing the TARGET_SHIFT_TRUNCATION_MASK element that is
already there and accepts a mode.

Paolo


Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.

2009-03-12 Thread Paolo Bonzini
1) As multiple people said, it *was* a regression bug fix.  It actually
fixed two regressions.  (That it fixed the second was discovered only
after I committed it).  I'm sorry that it caused problems for you (even
though it's actually lucky for GCC), but I can't help saying that it
might have been the other way and it might have improved your weather
forecasting app by 10-20% or more, as it did on one or two SPEC benchmarks.

2) I apologize for the bad quality of the patch.  But I tested it on
bootstrap, SPEC2000, and of course the testcase, and it had no problems.
This testing takes more than 24 hours on my machine and it is more than
is usually requested--and I did it for correctness, not for performance
testing.

In addition, all but one of the fixes that H.J. made (and for which I
have to thank him) were unrecognizable insns due to a misunderstanding
of how peephole2 worked; I thought it recognized the produced
instructions, instead apparently it's the only optimization in GCC where
this does not happen.  I have a patch to fix this in GCC 4.5.

3) I look forward to seeing the result of the tests H.J. asked you to
do, so that at least we can find which peephole2 is responsible.  If you
want to revert it now, go ahead.  I don't see any problems with that and
I can approve the reversal of my patch.  I'll try again for 4.5 and
propose the patch for 4.5.1.

Alternatively, let's use these 48 hours constructively to finish the
above test, look at the code and try and understand the cause of the
failure.  I'll do my part by looking at the code *now*.

4) I would have appreciated being CCed on the message.

That said,

 I run a weather forecasting system 4 times daily to test it out.

We can only thank you for this.  Please keep the same attitude towards
the people that try to improve the compiler.

Paolo


Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.

2009-03-12 Thread Paolo Bonzini

 In addition, all but one of the fixes that H.J. made (and for which I
 have to thank him) were unrecognizable insns due to a misunderstanding
 of how peephole2 worked

I stand corrected; *all* of the fixes.  The patch hadn't had a
correctness problem until your message, only ice-on-valids.  This does
not make the patch better, but I like to set things straight.

Paolo


Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.

2009-03-12 Thread Paolo Bonzini
Toon Moene wrote:
 H.J. Lu wrote:
 
 If you can provide a testcase, I can take a look. If it isn't easy to find
 a testcase, please disable the second pattern:

 (define_peephole2
   [(set (match_operand 0 "register_operand" "")
         (match_operand 1 "register_operand" ""))
    (set (match_dup 0)
         (match_operator 3 "commutative_operator"
           [(match_dup 0)
            (match_operand 2 "memory_operand" "")]))]
   "operands[0] != operands[1]
    && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
        || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
   [(set (match_dup 0) (match_dup 2))
    (set (match_dup 0)
         (match_op_dup 3 [(match_dup 0) (match_dup 1)]))]
   "")

 to see if it makes a difference.
 
 Thanks.  Test case is hard, but this is easy to try.  Expect an answer
 from me tomorrow (e.g. 12 UTC).

In case it does *not* make a difference, please try this patch:

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 144464)
+++ config/i386/i386.md (working copy)
@@ -20788,12 +20788,12 @@
 ;; refers to the destination of the load!
 
 (define_peephole2
-  [(set (match_operand:SI 0 "register_operand" "")
-        (match_operand:SI 1 "register_operand" ""))
+  [(set (match_operand:P 0 "register_operand" "")
+        (match_operand:P 1 "register_operand" ""))
    (parallel [(set (match_dup 0)
-                   (match_operator:SI 3 "commutative_operator"
+                   (match_operator:P 3 "commutative_operator"
                      [(match_dup 0)
-                      (match_operand:SI 2 "memory_operand" "")]))
+                      (match_operand:P 2 "memory_operand" "")]))
               (clobber (reg:CC FLAGS_REG))])]
   "operands[0] != operands[1]
    && GENERAL_REGNO_P (REGNO (operands[0]))

Thanks,

Paolo


Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.

2009-03-12 Thread Paolo Bonzini

 Attached you'll find the (preprocessed) source of the routine that
 printed the Infinity's (of course, I cannot be completely certain that
 it actually resulted in the wrong code, but at least it might be studied
 to see if it helps to find the culprit).

No, this function is sane (the peephole *is* called a lot by this
function, but all is in due order).  I looked at the dumps and assembly
for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected.

Interestingly enough, you *should* expect a speedup when this is resolved...

The next guess then is that the RHXU and RHYV arrays are wrong.  From
these, ZHXY is computed, and ZHXY is multiplied into each of the
outputs.  Can you send the routine that computes those, or is it too big?

Paolo

(*) it would have helped to know the compilation flags and target, of
course.


Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.

2009-03-12 Thread Paolo Bonzini
Toon Moene wrote:
 Paolo Bonzini wrote:
 
 Attached you'll find the (preprocessed) source of the routine that
 printed the Infinity's (of course, I cannot be completely certain that
 it actually resulted in the wrong code, but at least it might be studied
 to see if it helps to find the culprit).

 No, this function is sane (the peephole *is* called a lot by this
 function, but all is in due order).  I looked at the dumps and assembly
 for -O2, -O3 and -O3 -fno-schedule-insns (*), and all is as expected.
 
 Yeah, it was probably too much to hope for.

No, you were right, and that's great.  -ffast-math makes a difference,
because it enables more vectorization.

It goes like this:

(insn 494 493 495 44 statin.f:703 (set (reg:SF 371)
(vec_select:SF (reg:V4SF 367)
(parallel [
(const_int 0 [0x0])
]))) 1408 {*vec_extractv4sf_0} (expr_list:REG_DEAD
(reg:V4SF 367)
(nil)))

registers 371 and 367 are coalesced into xmm0.  Then the vec_select is
split to just

(set (reg:SF 21 [orig: 371]) (reg:SF 21 [orig: 367]))

and these are indeed !=, but they have the same hard register number so
the peephole should not apply in this case.  Here is a minimized testcase:

subroutine statin(x,y,pstratr,pconvecr,zhxy,zhxhy,ztmp)
integer :: x,y
real pstratr(x,y),pconvecr(x,y),zhxy(x,y)
real ztmp(4)
do j = 1,y
  do i = 1,x-2
   zttotrainr = zttotrainr + (pstratr(i,j) + pconvecr(i,j))*zhxy(i,j)
   ztstratr   = ztstratr   + pstratr(i,j)
   ztconvecr  = ztconvecr  + pconvecr(i,j)
   ztsenf = ztsenf + zhxy(i,j)
   ztlatf = ztlatf + zhxy(i,j)
   ztcldtop   = ztcldtop   + zhxy(i,j)
  enddo
enddo
ztmp(1)=zttotrainr
ztmp(2)=ztstratr
ztmp(3)=ztconvecr
ztmp(4)=ztsenf*ztlatf*ztcldtop
end

The following patch should fix it, you're welcome to run it through
HIRLAM.  I'm bootstrapping it in the meanwhile.

Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md (revision 144464)
+++ gcc/config/i386/i386.md (working copy)
@@ -20795,7 +20795,7 @@
                      [(match_dup 0)
                       (match_operand:SI 2 "memory_operand" "")]))
               (clobber (reg:CC FLAGS_REG))])]
-  "operands[0] != operands[1]
+  "!rtx_equal_p (operands[0], operands[1])
    && GENERAL_REGNO_P (REGNO (operands[0]))
    && GENERAL_REGNO_P (REGNO (operands[1]))"
   [(set (match_dup 0) (match_dup 4))
@@ -20811,7 +20811,7 @@
                    (match_operator 3 "commutative_operator"
                      [(match_dup 0)
                       (match_operand 2 "memory_operand" "")]))]
-  "operands[0] != operands[1]
+  "!rtx_equal_p (operands[0], operands[1])
    && ((MMX_REG_P (operands[0]) && MMX_REG_P (operands[1]))
        || (SSE_REG_P (operands[0]) && SSE_REG_P (operands[1])))"
   [(set (match_dup 0) (match_dup 2))

Paolo


Re: Revision 144098 (d.d. Wed Feb 11 08:56:41 2009 UTC (4 weeks ago)) is not a regression bug fix.

2009-03-12 Thread Paolo Bonzini

 Will REGNO (operands[0]) == REGNO (operands[1]) work here?

Yes.  I wanted to be conservative in case one day subregs or who knows
what are allowed.  I'll defer to maintainers or other people (Steven?),
either way is fine by me.
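
The difference between the three checks can be shown with a toy model.
This is only a sketch: struct toy_reg is a hypothetical stand-in, not GCC's
rtx, and toy_equal is only a rough analogue of rtx_equal_p for registers:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a register expression; not GCC's real representation. */
struct toy_reg {
    int mode;    /* e.g. 0 = SF, 1 = V4SF in this sketch */
    int regno;   /* hard register number */
};

/* What operands[0] != operands[1] tests: object identity only. */
bool same_object(const struct toy_reg *a, const struct toy_reg *b)
{
    return a == b;
}

/* What comparing REGNO tests: the register number, ignoring the mode. */
bool same_hard_reg(const struct toy_reg *a, const struct toy_reg *b)
{
    return a->regno == b->regno;
}

/* Structural equality: same mode and same register number. */
bool toy_equal(const struct toy_reg *a, const struct toy_reg *b)
{
    return a->mode == b->mode && a->regno == b->regno;
}
```

Two separately allocated objects that both describe register 21 in SF mode
are different pointers, so the identity test calls them distinct even
though they name the same hard register; the other two tests do not.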

Paolo


cond-optab patch series

2009-03-11 Thread Paolo Bonzini
Hi, I'll be posting soon a series of patches labeled [cond-optab].
The aim of the series is to have all ports use cbranch+cstore+cmov
optabs instead of cmp/bcc/scc/movcc.  As a starter, the first patches
I'll post will be cleaning up and centralizing the generation of cmp,
scc and bcc opcodes.

The reasons are as follows:

1) more maintainability, less code duplication.  The preliminary
series I'll send removes two lines for each line it adds.

2) more flexibility in RTL generation of jumps.  As a result...

3) less md code, more machine-independent code, and the ability to make all
the branch selection code written for i386 work on ARM and SPARC too.

Unless there is demand, I don't plan to put this on a branch.  Reviews
and bootstraps are welcome though.

Paolo


Re: No address_cost calls when inlining ?

2009-03-10 Thread Paolo Bonzini

 I want the version of foo because the store with an address as
 destination is costly on my architecture, which is why I defined
 TARGET_ADDRESS_COST and added a cost when I get this scenario.
 However, in the compilation of this code, it seems that, when the
 function is inlined, the address_cost function does not seem to be
 called anymore. Any ideas why ?

This is (a variant of) PR33699.

Paolo



Re: [patch][4.5] Make regmove cfglayout-safe

2009-03-10 Thread Paolo Bonzini
Paolo Bonzini wrote:
 I also wondered about this.  I think the original idea is that splits
 can call into dojump.c.
 
 A more likely possibility is -fnon-call-exceptions.

Of course this is the main cause.  But splitting one jump to multiple
jumps is supported and actually even documented.  It will happen for
example in this testcase:

int f(float x) { if (x != x) return 5; else abort (); }


on i386 which produces

fucomip %st(0), %st
jp  .L8
je  .L6

It is possible to change this to an expander in the i386 md of course.
I don't think any other backend is relying on it, but I will make a more
thorough check if I end up submitting something like the attached patch.

Paolo
2009-03-10  Paolo Bonzini  bonz...@gnu.org

* lower-subreg.c (decompose_multiword_subregs): Extract code...
* cfgbuild.c (rtl_split_blocks_for_eh): ... here.
* basic-block.h (rtl_split_blocks_for_eh): Declare it.

* recog.c (split_insn): Return bool.  Check that the splitter
produces no barriers and no labels.
(split_all_insns): Use the result.  Call rtl_split_blocks_for_eh
instead of find_many_sub_basic_blocks.
* reload1.c (fixup_abnormal_edges): Use it.
* passes.c (init_optimization_passes): Move cfglayout mode
further down.

Index: gcc/passes.c
===
--- gcc/passes.c(branch combine-cfglayout)
+++ gcc/passes.c(working copy)
@@ -757,8 +757,8 @@ init_optimization_passes (void)
   NEXT_PASS (pass_if_after_combine);
   NEXT_PASS (pass_partition_blocks);
   NEXT_PASS (pass_regmove);
-  NEXT_PASS (pass_outof_cfg_layout_mode);
   NEXT_PASS (pass_split_all_insns);
+  NEXT_PASS (pass_outof_cfg_layout_mode);
   NEXT_PASS (pass_lower_subreg2);
   NEXT_PASS (pass_df_initialize_no_opt);
   NEXT_PASS (pass_stack_ptr_mod);
Index: gcc/recog.c
===
--- gcc/recog.c (branch combine-cfglayout)
+++ gcc/recog.c (working copy)
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "hard-reg-set.h"
+#include "except.h"
 #include "recog.h"
 #include "regs.h"
 #include "addresses.h"
@@ -71,7 +72,6 @@ get_attr_enabled (rtx insn ATTRIBUTE_UNU
 
 static void validate_replace_rtx_1 (rtx *, rtx, rtx, rtx, bool);
 static void validate_replace_src_1 (rtx *, void *);
-static rtx split_insn (rtx);
 
 /* Nonzero means allow operands to be volatile.
This should be 0 if you are generating rtl, such as if you are calling
@@ -2671,19 +2671,23 @@ reg_fits_class_p (rtx operand, enum reg_
 }
 
 /* Split single instruction.  Helper function for split_all_insns and
-   split_all_insns_noflow.  Return last insn in the sequence if successful,
-   or NULL if unsuccessful.  */
+   split_all_insns_noflow.  Return whether new control flow insns
+   were added.  */
 
-static rtx
+static bool
 split_insn (rtx insn)
 {
   /* Split insns here to get max fine-grain parallelism.  */
   rtx first = PREV_INSN (insn);
   rtx last = try_split (PATTERN (insn), insn, 1);
   rtx insn_set, last_set, note;
+  bool new_cfi = false;
+  bool was_cfi;
 
   if (last == insn)
-return NULL_RTX;
+return false;
+
+  was_cfi = control_flow_insn_p (insn);
 
   /* If the original instruction was a single set that was known to be
  equivalent to a constant, see if we can say the same about the last
@@ -2706,22 +2710,25 @@ split_insn (rtx insn)
   /* try_split returns the NOTE that INSN became.  */
   SET_INSN_DELETED (insn);
 
-  /* ??? Coddle to md files that generate subregs in post-reload
- splitters instead of computing the proper hard register.  */
-  if (reload_completed && first != last)
+  while (first != last)
 {
   first = NEXT_INSN (first);
-  for (;;)
+  gcc_assert (!BARRIER_P (first) && !LABEL_P (first));
+
+  /* ??? Coddle to md files that generate subregs in post-reload
+ splitters instead of computing the proper hard register.  */
+  if (reload_completed && INSN_P (first))
+   cleanup_subreg_operands (first);
+  if ((first != last || !was_cfi)
+   && control_flow_insn_p (first))
{
- if (INSN_P (first))
-   cleanup_subreg_operands (first);
- if (first == last)
-   break;
- first = NEXT_INSN (first);
+ gcc_assert (flag_non_call_exceptions
+  && can_throw_internal (first));
+ new_cfi = true;
}
 }
 
-  return last;
+  return new_cfi;
 }
 
 /* Split all insns in the function.  If UPD_LIFE, update life info after.  */
@@ -2730,12 +2737,10 @@ void
 split_all_insns (void)
 {
   sbitmap blocks;
-  bool changed;
   basic_block bb;
 
   blocks = sbitmap_alloc (last_basic_block);
   sbitmap_zero (blocks);
-  changed = false;
 
   FOR_EACH_BB_REVERSE (bb)
 {
@@ -2753,41 +2758,17 @@ split_all_insns (void

Re: -mfpmath=sse,387 is experimental ?

2009-03-09 Thread Paolo Bonzini
Timothy Madden wrote:
 Hello
 
 Is -mfpmath=both for i386 and x86-64 still experimental in gcc 4.3, as
 the online manual page says?

Yes.  It might (*might*) be better in GCC 4.4 thanks to the new register
allocator, but it's unlikely that the manual page will be changed before
the release.

Paolo


Re: GCC-only software

2009-03-09 Thread Paolo Bonzini

 Well, the problem is that I don't know where to find the unofficial
 documentation, so it is hard to figure out the questions to be asked.

Well, the unofficial documentation is the source code. :-)

Paolo


Re: Setting -frounding-math by default

2009-03-09 Thread Paolo Bonzini
Sylvain Pion wrote:
 Andrew Thomas Pinski wrote:
 The fact is that Roger's patch introduced a regression (this word
 should be clear enough here), in that some users now have their old
 code broken, and they are forced to add the -frounding-math option
 (after having lost some time finding about this non trivial issue).
 This is a long term hindrance.

 Actually before roger's patch the default is the same. Just there was
 no way to turn it off.
 
 Actually, there are 2 things controlled by -frounding-math :
 1) constant propagation of FP operations
 2) generic transformations like (-a)*b -> -(a*b)

I think 2) is taken care of by -fassociative-math, or it should at least.

Paolo


Re: Setting -frounding-math by default

2009-03-09 Thread Paolo Bonzini
 I think 2) is taken care of by -fassociative-math, or it should at least.

 I don't think it is (I haven't checked), and I don't see why it should.
 This transformation has nothing to do with associativity : unless I'm
 mistaken, it is always valid when rounding is to the nearest or towards
 zero.

(-a) * b = -(a * b) is definitely reassociation (-a is -1 * a); no
reassociation has to be valid in any rounding mode, which means two
things: 1) it can be done even when other rounding-mode-dependent
optimizations are disabled via flag_rounding_math (good); 2) it would
also enable other optimization that you might not want (bad).

Paolo


Re: bitwise dataflow

2009-03-07 Thread Paolo Bonzini
 1. Dataflow framework to propagate bitwise register properties.
   (Integrated with the current dataflow framework.)
 2. Forward bitwise dataflow analysis: constant bit propagation.
 3. Backward bitwise dataflow analysis: dead bit propagation.
 4. Target applications: improve dce and see.  (Others?)

   For each instruction I in the function body
 For each register R in instruction I
   def_constant_bits(I, R) = collect constants from AND/OR/...
 operations.

There's already nonzero_bits (i.e. maybe nonzero) and
num_sign_bit_copies in rtlanal.c.  You can add to this one_bits and it
should be enough to do the simplifications you want.

You can get initial info from those routines, do the dataflow.  Then
there are rtx_hooks members to get a REG's nonzero bits/# of sign bit
copies (and you can add one for one_bits): just set them to a function
in your pass that returns info from the dataflow.  Then you can walk
through all the functions, recursively simplifying the RHS of each set
(you can look at propagate_rtx in fwprop.c for an example of
simplifying the RHS).  The code in simplify-rtx.c will take care of
using the nonzero_bits (et al.) information; other optimizations can be
added there.

Do not forget to check the cost of the replacement, otherwise your pass
might end up doing constant propagation (for constants, all zero and one
bits are known!!!).
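
As a rough sketch of the kind of fact such a pass could propagate, the
fragment below tracks for each value the bits that may be nonzero (the
same meaning as nonzero_bits) together with the bits known to be one
(the proposed one_bits).  The type and function names are invented for
illustration and are not GCC's; only AND and IOR are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register fact; not GCC's actual representation.  */
struct known_bits
{
  uint64_t maybe_nonzero;  /* bit clear => that bit is known to be 0 */
  uint64_t known_one;      /* bit set   => that bit is known to be 1 */
};

/* Nothing is known about the value.  */
struct known_bits
kb_unknown (void)
{
  struct known_bits k = { ~(uint64_t) 0, 0 };
  return k;
}

/* Every bit of a constant is known.  */
struct known_bits
kb_constant (uint64_t c)
{
  struct known_bits k = { c, c };
  return k;
}

/* Transfer function for AND: a result bit can be 1 only if it can be 1
   in both operands, and is surely 1 only if it is surely 1 in both.  */
struct known_bits
kb_and (struct known_bits a, struct known_bits b)
{
  struct known_bits k = { a.maybe_nonzero & b.maybe_nonzero,
                          a.known_one & b.known_one };
  return k;
}

/* Transfer function for IOR: a result bit can be 1 if it can be 1 in
   either operand, and is surely 1 if it is surely 1 in either.  */
struct known_bits
kb_ior (struct known_bits a, struct known_bits b)
{
  struct known_bits k = { a.maybe_nonzero | b.maybe_nonzero,
                          a.known_one | b.known_one };
  return k;
}

/* All bits are known exactly when the two masks agree.  This is the
   case where a replacement degenerates into constant propagation.  */
bool
kb_is_constant (struct known_bits k)
{
  return k.maybe_nonzero == k.known_one;
}
```

With x unknown, (x | 0xFF) & 0x0F comes out as fully known, which is
exactly the situation where the cost check matters.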

Paolo



Re: New no-undefined-overflow branch

2009-03-06 Thread Paolo Bonzini

 So while trapping variants can certainly be introduced it looks like
 this task may be more difficult.

I don't think you need to introduce trapping tree codes.  You can
introduce them directly in the front-end as

   s = x +nv y
   (((s ^ x) & (s ^ y)) < 0) ? trap () : s

   d = x -nv y
   (((d ^ x) & (x ^ y)) < 0) ? trap () : d

   (b == INT_MIN ? trap () : -nv b)

   (int)((long long) a * (long long) b) != a *nv b ? trap () : a *nv b

Making sure they are compiled efficiently is another story, but
especially for the sake of LTO I think this is the way to go.
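
For reference, here is how I would write the same checks in portable C.
This is only a sketch: the helper names are made up, the sums are
computed in unsigned arithmetic so that the check itself never executes
a signed overflow, and the multiplication assumes long long is wider
than int.

```c
#include <limits.h>
#include <stdbool.h>

/* Mask selecting the sign bit of an int, assuming int and unsigned int
   have the same width.  */
#define SIGN_BIT (~(UINT_MAX >> 1))

/* x + y overflows iff the wrapped sum's sign differs from the sign of
   both operands.  */
bool
add_overflows (int x, int y)
{
  unsigned int s = (unsigned int) x + (unsigned int) y;
  return (((s ^ (unsigned int) x) & (s ^ (unsigned int) y)) & SIGN_BIT) != 0;
}

/* x - y overflows iff the operands have different signs and the wrapped
   difference's sign differs from x's.  */
bool
sub_overflows (int x, int y)
{
  unsigned int d = (unsigned int) x - (unsigned int) y;
  return (((d ^ (unsigned int) x)
           & ((unsigned int) x ^ (unsigned int) y)) & SIGN_BIT) != 0;
}

/* -b overflows only for the most negative value.  */
bool
neg_overflows (int b)
{
  return b == INT_MIN;
}

/* a * b overflows iff the exact product does not fit in an int.  */
bool
mul_overflows (int a, int b)
{
  long long p = (long long) a * (long long) b;
  return p < INT_MIN || p > INT_MAX;
}
```

A front end would wrap each predicate as "overflows ? trap () : result".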

Paolo


Re: New no-undefined-overflow branch

2009-03-06 Thread Paolo Bonzini
Richard Guenther wrote:
 On Fri, Mar 6, 2009 at 3:29 PM, Paolo Bonzini bonz...@gnu.org wrote:
 So while trapping variants can certainly be introduced it looks like
 this task may be more difficult.
 I don't think you need to introduce trapping tree codes.  You can
 introduce them directly in the front-end as

   s = x +nv y
 
 I think this should be
 
   s = x + y
   (((s ^ x) & (s ^ y)) < 0) ? trap () : s
 
 otherwise the compiler can assume that for the following check
 the addition did not overflow.

Ah yeah I've not yet looked at the patches and I did not know which one
was which.  I actually wrote x + y first and then went back to carefully
check them. :-P

 Making sure they are compiled efficiently is another story, but
 especially for the sake of LTO I think this is the way to go.
 
 I agree.  Btw, for the addition case we generate
 
 leal    (%rsi,%rdi), %eax
 xorl    %eax, %esi
 xorl    %eax, %edi
 testl   %edi, %esi
 jns     .L2
 .value  0x0b0f
 .L2:
 rep
 ret
 
 which isn't too bad.

Well, for x86 it requires the addends to die.

This is unfortunately four insns, and combine has a limit of three, but
maybe you could make combine recognize the check and turn it to an addv
pattern (with the add result unused!); and then CSE or maybe combine as
well would, well, eliminate the duplicate ADD...

If this does not work, on ARM you can also hope for something like this:

 ADD    R0, R1, R2
 XORS   R0, R2, R3
 XORSMI R1, R2, R3
 SWIMI  #trap

But hey, whatever you get, it's anyway faster than a libcall. :-)

Of course there are better choices for x+CONSTANT; using (b == INT_MIN ?
trap () : -b) for negation is one example.

Paolo


Re: New no-undefined-overflow branch

2009-03-06 Thread Paolo Bonzini
Joseph S. Myers wrote:
 On Fri, 6 Mar 2009, Paolo Bonzini wrote:
 
 I don't think you need to introduce trapping tree codes.  You can
 introduce them directly in the front-end as
 
 Multiple front ends want the same thing.  This is why it would be better 
 to introduce the codes in GENERIC and have the language-independent 
 gimplifier contain the code to lower them, even if they don't become part 
 of GIMPLE.

I see your point.  What I'm worried about is that these codes would be
tested more lightly and, until folding is a middle-end thing only, the
risk of unwanted optimization on -ftrapv code would be high.

You can have common code shared by front-ends.  They could apply it at
GENERICization time (Fortran, Ada) or directly while parsing (C, C++).

(int)((long long) a * (long long) b) != a *nv b ? trap () : a *nv b
 
 This is not a solution for trapping multiplication in the widest supported 
 type.

There's always range checking, I was pointing out optimization
possibilities; the above one can be optimized like

  (h,l) = a*b
  if (h != l >> 31) trap ();  // signed shift
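
Spelled out in C for a 32-bit multiplication, the check could look like
the sketch below.  The function name is invented, and the code assumes
that right-shifting a negative int64_t is an arithmetic shift, which is
what GCC does but which the C standard leaves implementation-defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true iff a * b does not fit in 32 bits.  */
bool
mul32_overflows (int32_t a, int32_t b)
{
  /* The widening multiply gives the exact 64-bit product (h,l).  */
  int64_t product = (int64_t) a * (int64_t) b;
  uint32_t l = (uint32_t) product;        /* low half */
  int32_t h = (int32_t) (product >> 32);  /* high half, arithmetic shift */

  /* "l >> 31" with a signed shift is 0 or -1: the sign of the low half.
     The product fits iff the high half is just that sign extension.  */
  int32_t sign_of_low = (l & UINT32_C (0x80000000)) ? -1 : 0;
  return h != sign_of_low;
}
```

On a target with a widening multiply instruction, h and l are already
in separate registers, so the check is a shift and a compare.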

Paolo


Re: New no-undefined-overflow branch

2009-03-06 Thread Paolo Bonzini
Joseph S. Myers wrote:
 On Fri, 6 Mar 2009, Geert Bosch wrote:
 
 this task may be more difficult.  So lowering them early during
 gimplification looks like a more reasonable plan IMHO.
 Right, that was my intention. Still, I'll need to add code to
 handle the new tree codes in fold(), right?
 
 If you add new trapping codes to GENERIC I'd recommend *not* making fold() 
 handle them.

Constant folding should be done for them, though.

 Either lower 
 the codes in gimplification, or handle them explicitly in a few GIMPLE 
 optimizations e.g. when constants are propagated in, but avoid general 
 folding for them.

Definitely the former.

Paolo


Re: New no-undefined-overflow branch

2009-03-06 Thread Paolo Bonzini

 If this does not work, on ARM you can also hope for something like this:

  ADD    R0, R1, R2
  XORS   R0, R2, R3
  XORSMI R1, R2, R3
  SWIMI  #trap
 
 On ARM you can just check for overflow directly...
 
   ADDS    R0, R1, R2
   SWIVS   #trap

Of course, I was thinking explicitly of what happens with no MD support.

Paolo


Re: __builtin_return_address for ARM

2009-02-25 Thread Paolo Bonzini
Uwe Kleine-König wrote:
 Hello,
 
 currently[1] __builtin_return_address for ARM only works with level == 0.
 
 For ftrace in the linux kernel it would be great to implement that for
 level > 0 (provided that framepointers or unwind information are
 available of course).  On the linux-arm-kernel ML Mikael Pettersson[2]
 said that __builtin_return_address(N) where N > 0 should never have been
 introduced into gcc.  Is that the general view for
 __builtin_return_address or would a patch be accepted?

My personal opinion is that Mikael Pettersson is right, but since the
damage is done why not extend it to more architectures.

I am not an ARM maintainer though.

Paolo


Re: Native support for vector shift

2009-02-24 Thread Paolo Bonzini

 Currently, we have to use intrinsics to support such shift. Isn't syntax of 
 vector
 shift intuitive enough to be supported natively? Someone may argue it breaks 
 the
 C language. But vector is a GCC extension anyway. Support for vector 
 add/sub/etc
 already break C syntax. Any thought? Sorry if this issue had been raised in 
 past.

I see no reason why this could not be added provided that it is 1)
adequately documented 2) implemented when not supported in hardware too
(tree-ssa-vect-generic.c) 3) possibly implemented for both C and C++.

Regarding 2, note that this

 V4H tst(V4H a, V4H b){
   return a << b;
 }

would have to be emulated on all x86 targets prior to SSE5.
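
To make "emulated" concrete, the generic lowering amounts to a loop over
the elements, roughly like the sketch below.  The v4h type and the
function name are invented, unsigned lanes are used to keep the C
well-defined, and masking the shift count to the lane width is a choice
made for this sketch, not something the extension is known to specify.

```c
#include <stdint.h>

enum { V4H_LANES = 4 };

/* Stand-in for a vector of four 16-bit elements.  */
typedef struct { uint16_t lane[V4H_LANES]; } v4h;

/* Element-wise left shift: each lane of A is shifted by the count held
   in the corresponding lane of B.  */
v4h
v4h_shift_left (v4h a, v4h b)
{
  v4h r;
  for (int i = 0; i < V4H_LANES; i++)
    /* Widen first so the shift cannot overflow, then truncate back.  */
    r.lane[i] = (uint16_t) ((uint32_t) a.lane[i] << (b.lane[i] & 15));
  return r;
}
```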

Another much desired feature would be OpenCL C-style masking and swizzling.

Paolo


Re: Native support for vector shift

2009-02-24 Thread Paolo Bonzini
 It shouldn't be too hard to add the support.  I suspect the person who did 
 the
 initial support may have been on a machine without vector shifts.
 
 Nope, because it was originally done by Aldy who did the VMX support
 which had vector shifts.

OTOH the support for vector lowering was weaker than now in 3.x and it
was harder to add more lowering.  It shouldn't be very hard now to add
all kinds of shifts and also auto-splatting.

Paolo


Re: libiberty testsuite builds with wrong compiler

2009-02-23 Thread Paolo Bonzini
Jack Howarth wrote:
   The same issue in the libiberty testsuite run can be seen with
 the Apple regress server log at 
 http://gcc.gnu.org/regtest/HEAD/native-lastbuild.txt.gzip.
 If you search for test-demangle, you will find...

I'm sure there is a bugzilla entry for that.

Paolo


Re: About strict-aliasing warning

2009-02-13 Thread Paolo Bonzini

 -Wstrict-aliasing
 This option is only active when -fstrict-aliasing is active. It
 warns about code which might
 break the strict aliasing rules that the compiler is using for
 optimization. The warning does
 not catch all cases, but does attempt to catch the more common
 pitfalls. It is included in -Wall.
 It is equivalent to -Wstrict-aliasing=3 
 
 and -O2 would activate -fstrict-aliasing by default, which should also
 activate this option.

No, the text above means that -fstrict-aliasing is a *necessary*
condition to get aliasing warnings, not a sufficient condition.

Do you have suggestions for how to clarify the text?
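
For context, this is the kind of code the warning is about, together
with the usual way to avoid the problem.  It is my own illustration, not
taken from the manual, and it assumes a 32-bit float.

```c
#include <stdint.h>
#include <string.h>

/* The sort of cast -Wstrict-aliasing is meant to flag, left as a
   comment because its behavior is undefined under the aliasing rules:

     uint32_t bits = *(uint32_t *) &f;
*/

/* Well-defined alternative: copy the object representation.  GCC
   normally compiles this memcpy to a plain register move.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```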

Paolo


Re: IRA conflict graph alternative selection

2009-02-13 Thread Paolo Bonzini
Jeff Law wrote:
 We'd want to encode [early insn alternative selection]
 information in the conflict graph so that IRA would
 allocate registers so as to fit the constraints of the early insn
 alternative selection.  Right?   In the case where the graph is
 uncolorable, do we allow IRA to override the alternative selection, or
 do we insert copies to simplify the conflict graph or some mixture of both?

Inserting compensation code, for example copies, can be seen as some
kind of pre-reload as it was used on new-ra branch; the problem with
pre-reload was that it was built on "cp reload1.c pre-reload.c", so it
was not much less complicated than reload.

Paolo


Re: proposal for improved management bugzilla priorities/release criteria

2009-02-12 Thread Paolo Bonzini
 However, I don't agree that P2 regressions aren't a factor.  If we have
 a ton of crashing on wrong-code, etc., regressions that adds up to a
 release that won't work well for people.

 In which case the important ones should be P1 ...

 No, that misses the point.  A mass of bugs, each itself not too
 critical, can still make a release that is of substandard quality.
 Think of the integral of perceived quality over the intended user-base.

Yes, that was the meaning more-or-less of my 50 P2 criteria.

Still, I would like to hear an opinion on what to do with regard to
long standing bugs that are clearly not going to be fixed in stage3/4.
 This was the main point of my message.

Paolo


Re: possible buffer overflow in calls.c?

2009-02-11 Thread Paolo Bonzini

 Assuming you have a copyright assignment, just send a patch to
 gcc-patches with the explanation.  This is code which will never be used
 for any popular target.

The patch is probably small enough that it does not require assignment,
given the description in his original message.

Paolo


Re: Difference between vec_shl_<vector_mode> and ashl<vector_mode>3

2009-02-10 Thread Paolo Bonzini
Bingfeng Mei wrote:
 Hello,

 Could anyone explain to me what is difference between
 vec_shl_<vector_mode> and ashl<vector_mode>3 patterns? It seems to me
 that both shift a vector operand 1 with scalar operand 2.  I tried to
 understand some targets' implementation, e.g., ia64 as follows, and
 cannot grasp their difference. Does the whole vector shift of
 vec_shl means treating a vector as a long scalar?  Thanks in advance.

vec_shl_<mode> is indeed treating a vector as a long scalar, while
lshr<mode>3 is for SIMD shifts.  Only for shifts, the second argument
can be an integer mode specifying that the shift count has to be the
same for all SIMD elements.

Notice that in the vec_shr_<mode> you pasted, the shift is carried out
in DImode

   [(set (match_operand:VECINT 0 "gr_register_operand" "")
 (lshiftrt:DI (match_operand:VECINT 1 "gr_register_operand" "")
  (match_operand:DI 2 "gr_reg_or_6bit_operand" "")))]
   

while in the lshr<mode>3 it is carried out in the vector mode:

   [(set (match_operand:VECINT24 0 "gr_register_operand" "=r")
   (lshiftrt:VECINT24
 (match_operand:VECINT24 1 "gr_register_operand" "r")
 (match_operand:DI 2 "gr_reg_or_5bit_operand" "rn")))]
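
The difference is easy to see on a 64-bit value viewed as four 16-bit
elements.  The two functions below are only a model of the semantics in
plain C, with names I made up; they are not how the ia64 patterns are
implemented.

```c
#include <stdint.h>

/* Whole-vector shift: the vector is one long scalar, so bits move
   across element boundaries.  COUNT must be less than 64.  */
uint64_t
whole_shift_right (uint64_t v, unsigned count)
{
  return v >> count;
}

/* SIMD shift: each 16-bit element is shifted on its own by the same
   scalar count, and nothing crosses into the neighbouring element.
   COUNT must be less than 16.  */
uint64_t
lanes_shift_right (uint64_t v, unsigned count)
{
  uint64_t r = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint64_t x = (v >> (16 * lane)) & 0xFFFF;
      r |= ((x >> count) & 0xFFFF) << (16 * lane);
    }
  return r;
}
```

Shifting 0x0001000100010001 right by one gives 0x0000800080008000 as a
whole-vector shift, but 0 as a SIMD shift.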

Paolo


Re: proposal for improved management bugzilla priorities/release criteria

2009-02-09 Thread Paolo Bonzini

 - The more conservative one is to use more aggressively the release
 milestone field.  Hard-to-fix bugs would be left as P2, but bumped to
 the next major release at the beginning of stage 3.

 Advantages: no need for churn in the bug database---very easy to implement
 Disadvantages: the milestone field is not visible in search lists (maybe
 this can be changed)?
 
 I think using the milestone will get us more confused only.  We already
 have the issue that what we make a blocker (P1) for 4.4 is not a blocker
 for, say, 4.3.4.  Unless we want to start duplicating bugs for each open
 branch I'd rather not touch our target milestone policy.

Right now the target milestone is useless, as it could be computed
algorithmically:

   [4.2/4.3/4.4 regression] -> milestone is next 4.2 release
   [4.3/4.4 regression] -> milestone is next 4.3 release
   [4.4 regression] -> milestone is next 4.4 release
   else -> milestone not used

The situation that you mentioned (P1 for 4.4 but not for 4.3.4) would be
handled by having [4.3/4.4 regression] with milestone of 4.4.  This is
why I think the more aggressive usage of the milestone field would be
advantageous.

 I think the only reasonable release criteria is zero P1 regressions over
 some period.  50 P2 regressions doesn't make a release blocker, neither
 is 49 P2 regressions a clear sign for a non-blocked release.

I agree.

Paolo


Re: Constant folding and Constant propagation

2009-02-08 Thread Paolo Bonzini
Jean Christophe Beyler wrote:
 Ok, thanks for all this information and if you can dig that up it
 would be nice too.  I'll start looking at that patch and PR33699 to
 see if I can adapt them to my needs.

Here it is.

Paolo
/* Copy propagation on RTL for GNU compiler.
   Copyright (C) 2006 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING.  If not, write to the Free
Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.  */

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "rtl.h"
#include "obstack.h"
#include "basic-block.h"
#include "insn-config.h"
#include "recog.h"
#include "alloc-pool.h"
#include "timevar.h"
#include "tree-pass.h"

/* The basic idea is to keep a table of registers with the same
   value, and replace expressions encountered so that they use the
   equivalent register.  We do this processing on extended basic
   blocks.

   Note this can turn a conditional or computed jump into a nop or
   an unconditional jump.  This is left to be cleaned up by CSE,
   for now.

   At the start of each basic block, an assignment places a register
   in a distinct group number.  During scan, when the code copies
   one register (or a related expression, see below) into another, we
   copy the quantity number.  When a register is loaded in any other
   way, we allocate a new quantity number to describe the value
   generated by this operation. `reg_group[regno].num' records what
   quantity register REGNO is currently thought of as containing.

Other expressions:

   A CLOBBER rtx in an instruction invalidates its operand for further
   reuse.

Related expressions:

   Registers that differ only by an additive integer are called related.
   Related registers share the same quantity number.  */


/* Per-group information tracking.  */
struct group_table_elem
{
  rtx reg;
  HOST_WIDE_INT delta;

  /* Basic block in which it was defined.  */
  basic_block bb;

  struct group_table_elem *prev, *next;
};

/* Per-register data, including register-group mapping.  */
struct reg_data
{
  /* Current group.  */
  struct group_table_elem *entry;

  /* Pointer into the table of group heads, indexed by register number.
 Always 0 for unknown values.  */
  int  num;
};


/* Length of group_head vector.  */
static int max_group;

/* Next quantity number to be allocated.
   This is 1 + the largest number needed so far.  */
static int next_group;

/* The table of all groups, indexed by group number.  */
static struct group_table_elem **group_head;


/* The table of all pseudos, indexed by regno.  */
static struct reg_data *reg_group;


/* Allocation pool.  */
static alloc_pool group_elem_pool;

/* Basic block being processed.  */
static basic_block current_bb;


/* We store the state of each basic block (not visited, part of current EBB,
   finished) in its AUX field.  These two functions return the state.  */

static inline bool
bb_visited (basic_block bb)
{
  return bb->aux != 0;
}

static inline bool
bb_active (basic_block bb)
{
  return bb->aux == (void *) 1L;
}


/* Routines to manage the data structures of this pass.  */

/* Remove the register DEST from the equivalence tables.  */

static struct group_table_elem *
remove_from_group (rtx dest)
{
  struct group_table_elem *ent = reg_group[REGNO (dest)].entry;
  int num = reg_group[REGNO (dest)].num;

  reg_group[REGNO (dest)].num = 0;
  reg_group[REGNO (dest)].entry = NULL;

  gcc_assert (ent);
  if (ent->prev)
    ent->prev->next = ent->next;
  else
    group_head[num] = ent->next;

  if (ent->next)
    ent->next->prev = ent->prev;

  ent->prev = ent->next = NULL;
  return ent;
}



/* Check if the entry for register REG in the equivalence tables is up to date.
   Remove it and return NULL if it is not.  Otherwise, return the entry, or NULL
   if it is absent.  */

static inline struct group_table_elem *
reg_group_entry (rtx reg)
{
  struct group_table_elem *ent = reg_group[REGNO (reg)].entry;
  if (ent && !bb_active (ent->bb))
    {
      remove_from_group (reg);
      return NULL;
    }
  else
    return ent;
}


/* Return the head of the equivalence class for REG (removing stale entries).  
*/

static struct group_table_elem *
reg_group_head (rtx reg)
{
  int num = reg_group[REGNO (reg)].num;

  struct group_table_elem *canon_ent = group_head[num];
  while (canon_ent && !bb_active (canon_ent->bb))
{
  remove_from_group 

Re: Constant folding and Constant propagation

2009-02-07 Thread Paolo Bonzini
Steven Bosscher wrote:
 On Fri, Feb 6, 2009 at 7:32 PM, Adam Nemet ane...@caviumnetworks.com wrote:
 I think you really need Joern's optimize_related_values patch.  Also see
 PR33699.
 
 I wouldn't recommend that patch, but yes: Something that performs that
 optimization ;-)

Yes, something doing that using LCM was on my list of things to do
after fwprop to do what CSE currently does, but better.  I never got
round to implementing it, I think.  I had an LCM-based replacement for
canon_reg that I wanted to use as a basis, maybe I can dig it up.

Paolo



proposal for improved management bugzilla priorities/release criteria

2009-02-06 Thread Paolo Bonzini
The current system for managing bugzilla priorities has a major problem,
in that it does not identify bugs that reasonably cannot be fixed before
the release.

The current set of priorities is in practice like this:

- P1: most wrong code bugs, and other absolutely blocking problems
- P2: problems worth a look on important platforms
- P3: uncategorized
- P4: problems worth a look on less important platforms
- P5: other

The problem with this set is that while P1 bugs will absolutely be fixed
before the release (and backported usually), P2 bugs are a one-catch-all
group for everything else that's worth looking at.  It is impossible to
distinguish stuff that will probably be fixed before the release (and
presumably backported to all branches), and what instead requires new
stage1/stage2 material.

As a result, the release criteria we have are not really a measure of
the quality of the release, and especially are not really a measure of
the work being done towards a release.

I propose two solutions to this problem.

- The more conservative one is to use more aggressively the release
milestone field.  Hard-to-fix bugs would be left as P2, but bumped to
the next major release at the beginning of stage 3.

Advantages: no need for churn in the bug database---very easy to implement
Disadvantages: the milestone field is not visible in search lists (maybe
this can be changed)?

- Alternatively, we could add a new priority P-- for uncategorized
bugs, and split P2/P3 like this: P2 bugs will be fixed in stage 3/4, P3
bugs will most likely be postponed to stage 1/2.

Advantages: quicker impression from the bug searches, especially during
bug triage
Disadvantages: need to rethink bugzilla queries


I think either of these two approaches would add serious value when
judging the quality of a release.  Meeting the release criteria (no more
than 50 P2 regressions) in the past included the release managers
downgrading bugs from P2 to P4, which is in my opinion cheating.  In the
proposed scheme, this would be less necessary, because the release
criteria could take into account a broader view, such as respectively
for the two approaches:

- At most 60 P2 regressions, of which at most 15 should have release
milestone 4.4.0.

- No more than 15 P2 regressions and 45 P3 regressions.

Any opinions?

Paolo



Re: GCC OpenCL ?

2009-02-01 Thread Paolo Bonzini

 I am just starting to think about adding OpenCL support into future versions 
 of
 GCC, as it looks like a useful way of programming highly parallel type 
 systems,
 particularly with heterogeneous processors.  At this point, I am wondering what
 kind of interest people have in working together on OpenCL in the GCC 
 compiler?

I might be working on parallelization (though in LLVM) for the next one
or two years.  If I have some free time to put into GCC, I'd love to
port my work to it and to collaborate with people already working on OpenCL.

 Off hand, I think the first stage is to get OpenCL to work in a homogeneous
 multi-core system before diving into the heterogeneous systems.

Yes, also because for example we have no access to the GPUs' instruction
set.  These papers detail an experience in porting CUDA (the
predecessor to OpenCL) to multicore systems:

  http://www.gigascale.org/pubs/1278.html
  http://www.gigascale.org/pubs/1417.html
  http://impact.crhc.illinois.edu/mcuda.php

Paolo


Re: GCC OpenCL ?

2009-02-01 Thread Paolo Bonzini

 Although the OpenCL infrastructure doesn't confine itself to it, this
 compute-on-the-graphic-processor type of parallelism mostly concerns
 itself with let's do the FFT (or DGEMM) really fast on this processor
 and then return to the user.

Not really, it's not about FFT/DGEMM only -- the parallel stuff can be
expressed in a high-level language, and the communication cost is
actually something you have to consider seriously.

 If it isn't (surely not for us meteorology types) this approach is of
 limited use.

I'm pretty sure you meteorology guys can benefit quite a bit from it.

Paolo


Re: New GCC Runtime Library Exception: not fit for purpose

2009-01-29 Thread Paolo Bonzini
Joern Rennecke wrote:
 Quoting Ian Lance Taylor i...@google.com:
 I'm not sure what your point is here.  newlib is not under the GPL in
 any case.  It is not affected by the gcc runtime library license.
 
 The old runtime library exception allowed you to distribute binaries that
 both include pieces of the gcc runtime and arbitrary pieces of newlib,
 without requiring the distribution to be under the terms of the GPL.
 I.e. you could link non-GPL code against both the gcc runtime and newlib
 and distribute it.
 The new license does not allow this unless all parts included from newlib
 are written in a high level language AND use the gcc runtime.

If they do not use the GCC runtime, why should those parts be affected
by the GCC runtime license?  If anything, the loophole in the exception
is that, if you rewrite libgcc, then you can use a non-eligible
compilation process and still distribute the result under a proprietary
license.

Paolo


Re: sizeof in initializer expression not working as expected

2009-01-29 Thread Paolo Bonzini
Bruce Korb wrote:
 Hi,
 
 I was trying to figure out how come a memory allocation was short.
 I think I've stumbled onto the issue.  evt_t is a 48 byte structure
 and tpd_uptr is a uintptr_t.  sz initializes to 52 (decimal).
 The value would be correct if I were not trying to multiply the
 size of the pointer by 4.  The result should be 64.

I think all you can do is the usual preprocessed testcase submission to
bugzilla.

Paolo


Re: x86-64 and large code model questions/bugs

2009-01-28 Thread Paolo Bonzini

 He'll get much better code by putting the program into a -fPIC .so,
 loading it from a small stub and then unmap the stub.
 large model generates really very bad code because all jumps
 will be indirect.

Is it also true with -fpie?

Paolo


Re: x86-64 and large code model questions/bugs

2009-01-28 Thread Paolo Bonzini
Andi Kleen wrote:
 On Wed, Jan 28, 2009 at 09:39:39AM +0100, Paolo Bonzini wrote:
 He'll get much better code by putting the program into a -fPIC .so,
 loading it from a small stub and then unmap the stub.
 large model generates really very bad code because all jumps
 will be indirect.
 Is it also true with -fpie?
 
 Not sure what you mean?

Right, sorry.  I meant can you also use -fpie and use a linker script
to relocate the text section, if you want to place it high but the code
is not gigantic?

 AFAIK -fpie code is the same quality as -fPIC. Both
 are much much better than large model.

Exactly.

Paolo


Re: Serious code generation/optimisation bug (I think)

2009-01-27 Thread Paolo Bonzini
James Dennett wrote:
 On Mon, Jan 26, 2009 at 11:52 PM,  zol...@bendor.com.au wrote:
 I was debugging a function, and inserting a debug statement crashed
 the system. Some investigation revealed that gcc 4.3.2 arm-eabi (compiled
 from sources) with -O2 under some circumstances assumes that if a pointer
 is dereferenced, it cannot be NULL, and therefore explicit tests against
 NULL can later be eliminated.
 
 That's an optimization permitted by the language standard, but
 possibly unhelpful on your particular target.

Not really, he's just using the sloppiness allowed by MMU-less targets,
but he doesn't care about the value passed to Debug if tst == NULL.

However, -fno-delete-null-pointer-checks will do.
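
For reference, the pattern under discussion looks roughly like the sketch below.  The struct and function names are made up for illustration and are not taken from the original report; the point is only that the dereference precedes the test.

```c
#include <stddef.h>

struct item {
    int value;
};

/* Hypothetical example: the load from p->value comes before the NULL
   test.  Dereferencing a null pointer is undefined behavior, so an
   optimizing compiler may infer that p is non-null from here on and
   delete the test below.  */
int read_then_check(const struct item *p)
{
    int v = p->value;   /* dereference first */

    if (p == NULL)      /* candidate for removal when optimizing */
        return -1;
    return v;
}
```

Building with -fno-delete-null-pointer-checks asks GCC to keep such tests.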

Paolo


Re: Serious code generation/optimisation bug (I think)

2009-01-27 Thread Paolo Bonzini

 However, -fno-delete-null-pointer-checks will do.
 
 Not for PTA though ;)

Care to expand?

Paolo


Re: Serious code generation/optimisation bug (I think)

2009-01-27 Thread Paolo Bonzini
 Not for PTA though ;)
 Care to expand?
 
 PTA tracks points-to-NULL as pointing to nothing.
 This probably should be conditional on -fdelete-null-pointer-checks.
 Otherwise *NULL and *anything won't alias.

Yes, you're right.  I'll see if I can construct a testcase and a patch.

BTW, I was thinking of not doing the optimization anyway on volatile
pointers.  What do you think?

Paolo



Re: Serious code generation/optimisation bug (I think)

2009-01-27 Thread Paolo Bonzini
Richard Guenther wrote:
 On Tue, Jan 27, 2009 at 11:35 AM, Paolo Bonzini bonz...@gnu.org wrote:
 Not for PTA though ;)
 Care to expand?
 PTA tracks points-to-NULL as pointing to nothing.
 This probably should be conditional on -fdelete-null-pointer-checks.
 Otherwise *NULL and *anything won't alias.
 Yes, you're right.  I'll see if I can construct a testcase and a patch.
 
 Thanks.

It is now PR38984.  Andrew's point about -fnon-call-exceptions is also
worth pondering (and that's an understatement).

 BTW, I was thinking of not doing the optimization anyway on volatile
 pointers.  What do you think?
 
 It should be taken care of automatically by loading it from memory
 before each dereference, no?

In this case, I meant more specifically the NULL test optimization that
the OP stumbled on.  Skipping it for volatile pointers would be a GCC
extension: for those pointers we would simply not exploit the undefined
behavior that the standard allows.

Paolo


Re: A question about SRA and out-of-bounds array accesses

2009-01-20 Thread Paolo Bonzini

 
 int *
 x(void)
 {
  register int *a asm("unknown_register");  /* { dg-error "invalid register" } */
  int *v[1] = {a};
  return v[1];
 }
 

 I think simply scalarizing for the above testcase is ok - the behavior
 is undefined anyway.

What about moving the error to the frontend?

Paolo


Re: Feature request concerning opcodes in the function prolog

2009-01-12 Thread Paolo Bonzini

 movl.s %edi, %edi
 pushl  %ebp
 movl.s %esp, %ebp

Have you thought about making .s an assembler command-line flag, so that
this flag could be passed automatically by the compiler under mingw?

Paolo


Re: Feature request concerning opcodes in the function prolog

2009-01-12 Thread Paolo Bonzini
 For my purposes it is not really suitable, because we have to make sure that
 the push %ebp and mov %esp, %ebp are there, no matter what the compiler
 arguments are(-fomit-frame-pointer). So just adding the mov %edi, %edi isn't
 enough, and while I'm at it I can add the .s to the insns anyway. (see the
 archives for more details)

Yes, I mentioned the command-line option because you talked about 31-c0
vs. 33-c0 for xor %eax, %eax somewhere else in the thread.

Paolo


Re: libmudflap and emutls question

2009-01-07 Thread Paolo Bonzini

 Which version of gcc did you use? gcc 4.1 (and maybe 4.2) will report
 an error, but gcc 4.3 compiles OK. I tested using the native x86_64 gcc
 from Debian unstable.  __emutls_get_address is defined in libgcc even if
 the target has real TLS.

Uff... not my day.  I used 4.2 (emutls was posted in 4.2 time but
committed in 4.3 only).  But I didn't think of the simplest solution:
use grep together with strings(1): strings ./conftest | grep
__emutls_get_address.

Paolo


Re: Compiler turns off warnings unexpectedly

2009-01-02 Thread Paolo Bonzini

 I have here an (attached) testcase which unexpectedly turns off
 warnings. Compiling it using `gcc test.c -c -Wall` (or test.i) gives:

 test.c: In function 'pam_sm_authenticate':
 test.c:6: warning: implicit declaration of function 'undef'
 This works on the trunk but fails on the 4.3 branch.

 gcc 4.1 also produces the expected output (implicit declaration undef2),
 so it seems like a recent regression.
 
 I guess filing a bug at http://gcc.gnu.org/bugzilla about this
 regression would help.

As would adding the reduced testcase to the testsuite for trunk to
ensure we don't regress again. :-)

Will do so next week if nobody beats me to it.

Paolo


Re: Official GCC git repository

2008-12-22 Thread Paolo Bonzini
Rafael Espindola wrote:
 Because the right one should have been

 $ git config --add remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'

 
 That is what git clone adds, but with that git branch -r will not
 list the remote branches.

Uhm, it does here (I don't have a GCC repo, it's another one):

$ git branch -r
  mirror/cpp
  mirror/exc-handling-alternate-fix
  mirror/filesystem
  mirror/ipv6
  mirror/magritte
  mirror/master
  mirror/omnibrowser
  mirror/opengl
  mirror/opengl-nurbs
  mirror/poll-for-win32
  mirror/pool-resolution
  mirror/roe
  mirror/sdl
  mirror/seaside
  mirror/stable-2.1
  mirror/stable-2.2
  mirror/stable-2.3
  mirror/stable-3.0
  origin/HEAD
  origin/master
  origin/stable-2.1
  origin/stable-2.2
  origin/stable-2.3
  origin/stable-3.0
  stephen/master
  stephen/pool-resolution
  stephen/stable-3.0

You can see that it also lists branches for different remotes (with
distributed version control you need many of them, maybe one per
contributor).

Have you tried (after changing the .git/config line for
remote.origin.fetch) doing a git fetch origin to refresh the list of
available branches for the origin remote?  If it works now, you probably
want to remove the files in .git/refs/remotes/*.

Paolo


Re: Official GCC git repository

2008-12-21 Thread Paolo Bonzini
Rafael Espindola wrote:
 git config --add remote.origin.fetch '+refs/remotes/*:refs/remotes/*'
 This will put the remote branch heads in refs/remotes, you might want to
 put them in refs/remotes/origin instead.

 $ git config --add remote.origin.fetch '+refs/remotes/*:refs/remotes/origin/*'
 
 One small problem I have with this. When I do git branch lto
 origin/lto the generated config entry says:
 
 [branch "lto"]
 remote = origin
 merge = refs/heads/lto
 
 and git pull will fail.  Manually updating it to
 
 [branch "lto"]
 remote = origin
 merge = refs/remotes/lto

Because the right one should have been

$ git config --add remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'

?

Paolo   


Re: libjava and raw_cxx

2008-12-12 Thread Paolo Bonzini
Andreas Schwab wrote:
 Why is the libjava directory configured with raw_cxx?
 
 Makefile.def:151:target_modules = { module= libjava; raw_cxx=true; };
 
 The problem with this is that it keeps the libtool test for dynamic
 linker characteristics from working properly, due to the undefined
 reference to __gxx_personality_v0 which is defined in libstdc++.

If we weren't using libtool, it would be better to eliminate this and
instead special case the linker in libjava's Makefile.

But using libtool, it is basically a catch-22 (you need C++ in
configure, but then C++ goes in the libtool script, and then you cannot
eliminate it from the makefile).

If it bothers you (does it cause a PR?), I think it's easiest to define
a cache variable somewhere so that the test is forced to pass.  Anyway
you know you do not need to build C++ executables (only Java) in libjava.

Paolo


Re: libjava and raw_cxx

2008-12-12 Thread Paolo Bonzini
Andreas Schwab wrote:
 Paolo Bonzini bonz...@gnu.org writes:
 
 If it bothers you (does it cause a PR?),
 
 It causes a program to fail to run during build.
 
 ./gcj-dbtool -n classmap.db || touch classmap.db
 /usr/local/gcc/gcc-20081202/Build/powerpc64-suse-linux/libjava/.libs/gcj-dbtool:
  error while loading shared libraries: libgcj.so.10: cannot open shared 
 object file: No such file or directory
 
 Anyway you know you do not need to build C++ executables (only Java)
 in libjava.
 
 See above.

But that's not a C++ program, it's a Java program.

Paolo


Re: libjava and raw_cxx

2008-12-12 Thread Paolo Bonzini
 If it bothers you (does it cause a PR?),
 It causes a program to fail to run during build.

 ./gcj-dbtool -n classmap.db || touch classmap.db
 /usr/local/gcc/gcc-20081202/Build/powerpc64-suse-linux/libjava/.libs/gcj-dbtool:
  error while loading shared libraries: libgcj.so.10: cannot open shared 
 object file: No such file or directory

 Anyway you know you do not need to build C++ executables (only Java)
 in libjava.
 See above.
 But that's not a C++ program, it's a Java program.
 
 Yes, this is true.  But even though the test that sets
 shlibpath_overrides_runpath is run for every compiler, only one result
 is then used for all link commands, and that happens to be the result of
 the C++ test.

That's the bug then I'd say... Ralf what do you think?

Paolo


Re: question on optimizing calls to library functions

2008-12-11 Thread Paolo Bonzini

 The main difference that springs to mind: SIN is built-in, MATMUL is a
 library function. In gcc/builtin.defs, one finds

Not just that: SIN is a pure (or const, depending on -frounding-math)
function, which can be subject to CSE and DCE.  I don't see anything
suggesting that for MATMUL in intrinsic.c.  In fact, since MATMUL
receives the return array by reference and writes to it, it would be
very wrong to make MATMUL const or pure.
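
The distinction can be sketched in C with GCC's function attributes.  This is only an illustration of the three categories; the function names are hypothetical and have nothing to do with the Fortran front end's own tables.

```c
#include <stddef.h>

/* const: the result depends only on the argument values and no memory
   is read or written, so repeated calls can be merged (CSE) and unused
   calls dropped (DCE).  */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* pure: may read memory through its arguments but has no side effects,
   so it can still be subject to CSE and DCE as long as the memory it
   reads is unchanged.  */
__attribute__((pure)) int sum(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Neither: the result is written through a pointer supplied by the
   caller, much like a routine that receives its return array by
   reference.  Marking this const or pure would be wrong.  */
void scale(int *out, const int *in, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```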

Paolo


Re: Cygwin support

2008-11-17 Thread Paolo Bonzini
 To get around this you'd have to either
 link a separate copy of the plugin for each executable, or access the
 symbols in the executable indirectly through GetProcAddress and function
 pointers.

Hacking the compiler (or postlinker!) to emit a special constructor that
does the necessary GetProcAddress invocations seems not too hard...

Paolo


Re: GNU Hurd changes vs. GCC: ``regression fixes and docs only''

2008-11-06 Thread Paolo Bonzini
Thomas Schwinge wrote:
 Hello!
 
 We, the GNU Hurd people, would like to get GCC in a compilable/usable
 shape for us again, without needing to do the patching that was needed
 since the 4.2 release.  I have already some weeks ago sent the needed
 patches to the gcc-patches mailing list, where they have been acked by
 Paolo and Matthias.
 
 Now that my GCC copyright assignment papers are on file and my sourceware
 account has been enabled for accessing the GCC repository, I could in
 theory install the patches.  However, as I read on the homepage, GCC
 trunk is currently in ``regression fixes and docs only'' mode.  Asking on
 OFTC's #gcc channel, after having stated that ``changes that are entirely
 port specific generally have some leeway; that is, if the change can only
 affect the Hurd target, then the Hurd maintainer may approve fixes for
 serious bugs even in regression-only mode''

I'm a build system maintainer, so I can approve fixes for those bugs if
they only touch the build system and the fixes are clearly specific to
Hurd.  Global reviewers can do the same with the other fixes.

Paolo


Re: bootstrap4 vs. compare?

2008-11-02 Thread Paolo Bonzini

 though that is probably inadequate.

Especially because Makefile.in is automatically generated. :-)

 It's not the default goal that matters, but if bootstrap4 is a goal at all.
 Or if compare3 is a goal.

I have a (correct) patch which I'll apply in a day or two.

Thanks,

Paolo


Re: -fno-ira removal

2008-10-23 Thread Paolo Bonzini
 The following ports haven't been converted yet:
 
 arc m32c m68hc11 mmix pdp11 score vax

DJ has reported problems on the list for m32c.

Regarding ARC and MMIX we might expect some action from Joern and H-P
respectively, but probably nobody is going to do the work for the others.

Paolo


Re: Support for NT based OS on ARM.

2008-10-09 Thread Paolo Bonzini
Farlie A wrote:
 Hi,
 
 Would you be willing to consider supporting the PE object formats on the
 ARM based port of ReactOS?

If you are willing to contribute code for this, that's possible indeed.
Otherwise, probably no one will do the work.

Paolo


Re: Apple, iPhone, and GPLv3 troubles

2008-09-25 Thread Paolo Bonzini
 This means that you couldn't use *GCC* if you
 did something the FSF found objectionable, closing an easy
 work-around.
 
 This doesn't work, because it breaks out of the basic framework of
 copyright law.  Nobody signs anything or accepts any terms in order to
 use gcc.  The FSF wants to stop people from distributing proprietary
 binary plugins to gcc.  The copyright on gcc does not apply to those
 plugins.

Also, even if you could develop a license similar to the GPL but with an
additional restriction to this end, this would not be the GPL anymore,
because GPLv3 limits the non-permissive additional terms to the ones
listed in Section 7:

  a) [disclaiming warranty]

  b) [requiring preservation of legal notices]

  c) [requiring that modified versions of such material be marked]

  d) [limiting the use for publicity purposes of names]

  e) [declining to grant rights for use of some trademarks]

  f) [the Apache License's patent indemnification clause]

  All other non-permissive additional terms are considered further
  restrictions within the meaning of section 10.  If the Program as you
  received it, or any part of it, contains a notice stating that it is
  governed by this License along with a term that is a further
  restriction, you may remove that term.

Even without considering the feasibility of adding such a bullet, it
would not be a good PR move for the FSF to add a bullet g to the above
list for the sake of GCC.  The possibility of modifying the permissive
terms of the runtime libraries (which are distributed with every binary
produced by the compiler) does seem like a good way to control the
*usage* of the compiler as opposed to its distribution.

Paolo


Re: Apple, iPhone, and GPLv3 troubles

2008-09-24 Thread Paolo Bonzini
 Off-topic, but I feel this is important, since Apple contributed to gcc,
 and it is licensed under GPLv3 now.

The license of GCC does not matter, unless the iPhone includes a copy of
GCC's binaries for a recent-enough version.  In which case, of course,
Apple would be violating the GPLv3 and you should tell the FSF.


[offtopic parts rot13'd]

 V fgvyy ubcr vg pna or fbyirq

Jung fubhyq or fbyirq?  Npghnyyl, jung pbhyq or fbyirq?

 OGJ, gur TCYi2 unf abar bs gur pynhfrf gung pnhfr gur gebhoyr Nccyr vf
 snpvat jvgu gur TCYi3 naq gur vCubar, fb vg vf BX.

Fbzr znl guvax gung gur TCYi3 vf gur bar gung vf BX...

Naljnl, va pnfr lbh jrera'g njner, vg'f abg whfg gur TCYi3 gung vf
nssrpgrq.  Orpnhfr bs gur AQN gurl fubiry qbja lbhe guebng jura lbh
qbjaybnq gur FQX, Nccyr rssrpgviryl qrpvqrq gb ybpx nal xvaq bs bcra
fbhepr fbsgjner (abg whfg serr fbsgjner) bhg bs gur vCubar.  Ab ovt
qrny, gung zrnaf gung V jba'g jevgr cebtenzf sbe gur vCubar V qba'g bja.

Paolo


Apple-employed maintainers (was Re: Apple, iPhone, and GPLv3 troubles)

2008-09-24 Thread Paolo Bonzini
Peter O'Gorman wrote:
 Yuhong Bao wrote:
 and Apple uses GCC (which is now under GPLv3) and Mac OS X on it.
 Unfortunately, the iPhone is incompatible with GPLv3, if you want more see
 the link I mentioned.
 
 Apple does not use a GPLv3 version of GCC.

Ah, actually I think I now see the OP's point.  Apple is scared of the
GPLv3 because the iPhone might violate it, so they are not contributing
to anything that falls under the GPLv3.

It is indeed on-topic.  There are four Darwin maintainers listed in
MAINTAINERS:

darwin port Dale Johannesen [EMAIL PROTECTED]
darwin port Mike Stump  [EMAIL PROTECTED]
darwin port Eric Christopher[EMAIL PROTECTED]
darwin port Stan Shebs  [EMAIL PROTECTED]

and three of them are not allowed to read the GCC patches mailing list.
They might do something if CCed, but not necessarily so.  Same for
Objective-C/C++:

objective-c/c++ Mike Stump  [EMAIL PROTECTED]
objective-c/c++ Stan Shebs  [EMAIL PROTECTED]


Now I wonder:

1) does it make sense to keep a maintainer category that is known to be
inactive?

2) who should then get maintainership of darwin?  note that there are
some patches for darwin like this one:
http://article.gmane.org/gmane.comp.gcc.patches/172498

It's sad, but I think that there is need for the SC to take action on this.

Paolo



Re: Apple-employed maintainers (was Re: Apple, iPhone, and GPLv3 troubles)

2008-09-24 Thread Paolo Bonzini
 Well at least that explains their total inactivity in the last year. Is Dale
 the one still allowed to read the gcc-patches mailing list?

No, that would be Stan just because he's not at Apple.

It must be said also that Mike Stump agreed to review/discuss
Darwin/ObjC patches that he was CCed on, but most people don't know that
they need to do so.

As a side note, Mike also wrote this last February:

 The SC knows of the issue

Still, after six months it would be nice to have a clearer idea of what
will happen with respect to Darwin/ObjC, especially since the previous
statement (which I suppose was as clear as Mike could do) was buried
under an unrelated thread.

Paolo


Re: Apple, iPhone, and GPLv3 troubles

2008-09-24 Thread Paolo Bonzini
Steven Bosscher wrote:
 On Wed, Sep 24, 2008 at 4:06 PM, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 Apple's dislike of GPLv3 is a problem for gcc, yes.
 
 Well, excuse me for being a-political, but I don't see this problem.
 The relationship between GCC and Apple has never been really good
 AFAIK, but that hasn't hampered either to be quite successful.

I agree with you, but if you don't look at GCC as a whole -- but rather
at the small intersection represented by FSF GCC on Darwin -- it *has*
hampered it.

Apple GCC is basically a fork nowadays, and it is often impossible to
compile Leopard applications using FSF GCC (in turn because of the lack
of Objective-C 2.0 support).  Sometimes I wonder why Darwin is still
part of FSF GCC, just like it is not supported in binutils or gdb... I
guess just for the sake of GCC developers that are working on a Mac.

Even outside *-*-darwin*, what caused the development of two separate
Objective-C runtimes, the one in FSF GCC being a big ball and chain that
hinders the removal of dead code from the compiler?  Note that basically all
Objective-C code in existence either does not care about the runtime, or
has support for both runtimes; so it would not be a problem to deprecate
libobjc if Apple contributed their own implementation.  (There is now a
third runtime, named Étoilé).

Paolo

ps: of course, there is no offense intended for poor Mike who's CCed in
this thread.


Re: C/C++ FEs: Do we really need three char_type_nodes?

2008-09-20 Thread Paolo Bonzini
Diego Novillo wrote:
 On Fri, Sep 19, 2008 at 12:55, Jakub Jelinek [EMAIL PROTECTED] wrote:
 On Fri, Sep 19, 2008 at 12:36:12PM -0400, Diego Novillo wrote:
 When we instantiate char_type_node in tree.c:build_common_tree_nodes
 we very explicitly create a char_type_node that is signed or unsigned
 based on the value of -funsigned-char, but instead of make
 char_type_node point to signed_char_type_node or
 unsigned_char_type_node, we explicitly instantiate a different type.
 C++ e.g. requires that char (c) is mangled differently from unsigned char
 (h) and signed char (a), it is a distinct type.
 
 Thanks, that answers my question.

But does it need to be streamed out differently?  I mean, char_type_node
could be streamed out as signed_char_type_node or
unsigned_char_type_node, because the mangling has already been done.
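
The "distinct type" point from the quoted text can be checked from plain C as well.  The sketch below is only an illustration using C11 _Generic; the macro and function names are made up here and nothing in it comes from the GCC sources.

```c
#include <limits.h>

/* _Generic requires its association types to be mutually incompatible,
   so this macro only compiles because char, signed char and unsigned
   char are three different types.  */
#define CHAR_KIND(x) _Generic((x), \
    char: 0,                       \
    signed char: 1,                \
    unsigned char: 2,              \
    default: 3)

/* The signedness of plain char is a separate question: it matches one
   of the other two in representation, depending on the target and on
   -funsigned-char / -fsigned-char.  */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

So the type identity matters to the front end, while the representation is always that of one of the other two.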

Paolo


Re: worst case register classes (Was: Re: IRA_COVER_CLASSES for m32c)

2008-09-12 Thread Paolo Bonzini

 I think our mxp is more 'interesting'. [snip]

I think it's more like 'insane', :-) and a miracle that a retargetable
compiler can be ported to it.

Paolo


Re: extra instructions lost from -O0 to -O1

2008-09-12 Thread Paolo Bonzini
Thomas A.M. Bernard wrote:
 Well I found another way to solve the problem by updating the dce for
 not taking out my instructions.
 
 I inserted setallocate as a native operator in the back-end which
 comes from a GIMPLE node and map to the RTL pattern. Earlier in the
 discussion, it's been discussed that the dce was taking out the
 instruction when flag -O1 was engaged. To solve that, in
 'tree-ssa-dce.c', I flagged this node with the function,
 mark_stmt_necessary. And it works fine so far. The instruction is not
 omitted anymore by the dce :-)

Do not add it as a GIMPLE node.  Add it as a builtin function, so that
the tree-level DCE will treat it like every other call and not remove it.

IOW, do not add new kinds of node.  Use builtins for trees, and unspecs
for RTL.

Paolo


Re: extra instructions lost from -O0 to -O1

2008-09-11 Thread Paolo Bonzini
Thomas A.M. Bernard wrote:
 I have tried unspec_volatile without success though, as
 follows:
 
 (define_insn "setallocate"
   [(setallocate
      (unspec_volatile:DI [(match_operand:DI 0 "general_operand" "r")]
                          UNSPEC_ALLOCATE))]
   ""
   "allocate %0\t\t#TCB_INSTRUCTIONS"
   [(set_attr "type" "multi")])

This more or less should work, except that it completely subsumes the
SETALLOCATE rtx code that you've added.  The pattern should just be

 [(unspec_volatile:DI [(match_operand:DI 0 "general_operand" "r")]
                      UNSPEC_ALLOCATE)]

Paolo


Re: Passing LDFLAGS to stage2 and stage3 gcc

2008-09-09 Thread Paolo Bonzini
Rainer Emrich wrote:
 [EMAIL PROTECTED] schrieb:
 Rainer Emrich [EMAIL PROTECTED] wrote:
 So I want to pass LDFLAGS=-Wl, -rpath, /somedir to stage3 to link gcc,
 cpp, etc. with the rpath information.
 I do this by editing LDFLAGS_FOR_TARGET in the top-level Makefile.in,
 and also passing LDFLAGS, BOOT_LDFLAGS, and HOST_LDFLAGS assignments
 as arguments to make.  I'm not cross-compiling, though, so you may
 have to adjust that somewhat.
 
 Paul,
 
 thanks for the hint.

HOST_LDFLAGS does not exist, and you should be able to pass
LDFLAGS_FOR_TARGET on the command line.  Anyway, for your needs all you
have to do is

   make BOOT_LDFLAGS=-Wl,-rpath,/somedir

Paolo


Re: [PATCH] Update libtool to latest git tip

2008-09-08 Thread Paolo Bonzini

 Well, libtool-2.2.6 is finally released (twice even).
 
 Actual approval depends on your answer to this question, but the patch is 
 technically okay.  Can you commit it to the src repository too? There is 
 some regeneration to do there too.
 
 I know that GCC is now in stage 3, and that we missed the end of stage 1
 by a week, but I would still like to update gcc's libtool to 2.2.6.

It fixed a Darwin bug, right?

Paolo


Re: [PATCH] Update libtool to latest git tip

2008-09-08 Thread Paolo Bonzini
Peter O'Gorman wrote:
 On Mon, Sep 08, 2008 at 08:29:37PM +0200, Paolo Bonzini wrote:
 Well, libtool-2.2.6 is finally released (twice even).

 Actual approval depends on your answer to this question, but the patch is 
 technically okay.  Can you commit it to the src repository too? There is 
 some regeneration to do there too.
 I know that GCC is now in stage 3, and that we missed the end of stage 1
 by a week, but I would still like to update gcc's libtool to 2.2.6.
 It fixed a Darwin bug, right?

 
 Yes, though I do not know if Jack actually filed a PR for it, it was
 about debugging libstdc++ on darwin.

Post an updated patch and, next week, I'll apply it.

Paolo


Re: PR37363: PR36090 and PR36182 all over again

2008-09-07 Thread Paolo Bonzini

 As H-P says, the predicates on move expanders are generally ignored.
 emit_move_insn and its subroutines deliberately don't check them.

It's even worse; force_reg is effectively hardcoding movXX's operand 1
to be a general_operand.  (But my point was that force_reg does use
LEGITIMATE_CONSTANT_P through general_operand).

 Not necessarily; anything that's found in a non-legitimate constant must
 be handled by force_reg, and force_reg also tries using force_operand if
 what it gets is not a general_operand.  But maybe it's necessary to add a

   if (GET_CODE (value) == CONST)
 value = XEXP (value, 0);

 in force_operand.
 
 As you say, force_operand currently does nothing with constants.
 My understanding is that that really is by design (in the loosest
 possible sense of the word).  As H-P says, it's then the move expander's
 responsibility to handle the thing.

force_reg is weird:

1) it tries the move expander if operand 1 is a general_operand first;

2) it tries force_operand if it is not;

3) it tries the move expander if force_operand fails.

It would make sense to have something like: 1) check the move expander's
predicate; 2) try force_operand; 3) abort.  But I agree that it is not a
lightweight change, and I wouldn't propose it -- especially now.
OTOH every message in this thread is highlighting something fishy.
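
To make the two orderings easier to compare, here is a toy model in C.  It is not GCC code: the booleans stand in for the real predicate checks and for whether force_operand succeeds, and all names are invented for this sketch.

```c
#include <stdbool.h>

/* Which path the toy model takes for an operand.  */
enum toy_path {
    TOY_MOVE_EXPANDER,           /* handed straight to the move expander */
    TOY_FORCE_OPERAND,           /* handled by force_operand */
    TOY_MOVE_EXPANDER_FALLBACK,  /* move expander after force_operand failed */
    TOY_ABORT                    /* give up */
};

/* Current behaviour as described above: general_operand is checked
   first, then force_operand, then the move expander again.  */
enum toy_path toy_current(bool is_general_operand, bool force_operand_ok)
{
    if (is_general_operand)
        return TOY_MOVE_EXPANDER;
    if (force_operand_ok)
        return TOY_FORCE_OPERAND;
    return TOY_MOVE_EXPANDER_FALLBACK;
}

/* Alternative ordering: consult the move expander's own predicate,
   then force_operand, and abort if both fail.  */
enum toy_path toy_alternative(bool move_predicate_ok, bool force_operand_ok)
{
    if (move_predicate_ok)
        return TOY_MOVE_EXPANDER;
    if (force_operand_ok)
        return TOY_FORCE_OPERAND;
    return TOY_ABORT;
}
```

The only difference is in the last step: today an operand that nothing accepts still reaches the move expander, while the alternative would fail loudly.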

 would in some cases be accurate.)  I think using an unspec in rs6000
 would solve some of the port-specific issues.  In particular, I don't
 think 36090 would have happened with an unspec representation.

I agree.  So your plan would be to change rs6000 to an unspec, and drop
the problematic hunk in simplify-rtx.c?  That would be okay with me, but
it's not a small change for rs6000.

Paolo


Re: PR37363: PR36090 and PR36182 all over again

2008-09-07 Thread Paolo Bonzini

 Only with a LEGITIMATE_CONSTANT_P catching it...

Of course.

 So, can we agree on some or all of:
 
 1. This (PR37363/PR36182) and PR36090 (in both ports) and
whatever other port will be affected should be solved by a
stricter LEGITIMATE_CONSTANT_P check, and where
canonicalization is undefined (and a new definition can't get
consensus agreed upon), the port has to check itself for
whatever RTL expression it accepts.
 
 2. Change the LEGITIMATE_CONSTANT_P documentation.
 
 3. Change the default of LEGITIMATE_CONSTANT_P to a helper
function, maybe trivial_constant_expression_p above.

Agreed, but I don't see t_c_e_p in GCC sources (if you meant my function
using the predicate, it cannot work because the predicate might in turn
call LEGITIMATE_CONSTANT_P).  It could be

  if (GET_CODE (x) != CONST)
    return true;

  x = XEXP (x, 0);
  return GET_CODE (x) == PLUS && GET_CODE (XEXP (x, 1)) == CONST_INT
         && (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
             || GET_CODE (XEXP (x, 0)) == LABEL_REF);

(i.e. the test in cse.c) or something like that.
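
As a self-contained illustration of what such a default would accept, here is the same test on a toy expression type.  The struct, the enum and the function name are all stand-ins invented for this sketch; real RTL is accessed through GET_CODE and XEXP as in the fragment above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for RTL: a code plus up to two operands.  */
enum toy_code {
    TOY_CONST, TOY_PLUS, TOY_MINUS,
    TOY_SYMBOL_REF, TOY_LABEL_REF, TOY_CONST_INT
};

struct toy_rtx {
    enum toy_code code;
    const struct toy_rtx *op[2];
};

/* Anything not wrapped in CONST is accepted.  Inside a CONST, only
   (plus (symbol_ref or label_ref) (const_int)) is accepted.  */
bool toy_constant_ok(const struct toy_rtx *x)
{
    if (x->code != TOY_CONST)
        return true;

    x = x->op[0];
    return x->code == TOY_PLUS
           && x->op[1]->code == TOY_CONST_INT
           && (x->op[0]->code == TOY_SYMBOL_REF
               || x->op[0]->code == TOY_LABEL_REF);
}
```

With this shape, (const (minus (const_int) (symbol_ref))) is rejected, which is the form that has been causing trouble in this thread.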

Would you change simplify-rtx.c to test LEGITIMATE_CONSTANT_P before
wrapping something with a CONST?

Alternatively, I wouldn't mind seeing rs6000 use unspecs for GOT/TOC
offsets as other ports do; this would allow removing the optimization in
simplify_plus_minus, which would fix CRIS too (because I'm worried that
other targets might be affected, not just CRIS).  Of course, if that
gives known pessimizations on rs6000 it would not be a good thing to do,
and probably no one would volunteer to do that change anyway, so...

Paolo


Re: PR37363: PR36090 and PR36182 all over again

2008-09-06 Thread Paolo Bonzini
Hans-Peter Nilsson wrote:
 Date: Fri, 5 Sep 2008 14:57:00 +0200
 From: Hans-Peter Nilsson [EMAIL PROTECTED]
 
 Maybe as part of a change from target macro to target hook, with
 LEGITIMATE_CONSTANT_P as a default would fit, even at this
 stage?
 
 Sorry, I mean CONSTANT_P, not LEGITIMATE_CONSTANT_P.  Or maybe a
 new macro or hook

What about replacing the problematic uses of gen_rtx_CONST with
plus_constant (x, 0)?  plus_constant knows when to make a CONST rtx.

There are just a handful of places where this would be needed: instead
of the check after the wrong comment in cse.c, and everywhere
gen_rtx_CONST is used in simplify-rtx.c.

Paolo


Re: PR37363: PR36090 and PR36182 all over again

2008-09-06 Thread Paolo Bonzini
Paolo Bonzini wrote:
 Hans-Peter Nilsson wrote:
 Date: Fri, 5 Sep 2008 14:57:00 +0200
 From: Hans-Peter Nilsson [EMAIL PROTECTED]
 Maybe as part of a change from target macro to target hook, with
 LEGITIMATE_CONSTANT_P as a default would fit, even at this
 stage?
 Sorry, I mean CONSTANT_P, not LEGITIMATE_CONSTANT_P.  Or maybe a
 new macro or hook
 
 What about replacing the problematic uses of gen_rtx_CONST with
 plus_constant (x, 0)?  plus_constant knows when to make a CONST rtx.
 
 There are just a handful of places where this would be needed: instead
 of the check after the wrong comment in cse.c, and everywhere
 gen_rtx_CONST is used in simplify-rtx.c.

Here is a prototype patch, untested.

Paolo
2008-09-06  Paolo Bonzini  [EMAIL PROTECTED]

* explow.c (plus_constant): Don't exit early if c == 0, to allow
canonicalizing CONSTs.
* cse.c (fold_rtx): Use plus_constant instead of wrapping with CONST.
* simplify-rtx.c (simplify_plus_minus): Likewise.

Index: cse.c
===
--- cse.c   (revision 134435)
+++ cse.c   (working copy)
@@ -3161,10 +3161,8 @@ fold_rtx (rtx x, rtx insn)
   FIXME: those ports should be fixed.  */
    if (new != 0 && is_const
        && GET_CODE (new) == PLUS
-       && (GET_CODE (XEXP (new, 0)) == SYMBOL_REF
-           || GET_CODE (XEXP (new, 0)) == LABEL_REF)
        && GET_CODE (XEXP (new, 1)) == CONST_INT)
- new = gen_rtx_CONST (mode, new);
+ new = plus_constant (XEXP (new, 0), XEXP (new, 1));
+   else
+ new = plus_constant (new, 0);
   }
   break;
 
Index: simplify-rtx.c
===
--- simplify-rtx.c  (revision 140055)
+++ simplify-rtx.c  (working copy)
@@ -3625,7 +3625,7 @@ simplify_plus_minus (enum rtx_code code,
tem = simplify_binary_operation (ncode, mode, tem_lhs, 
tem_rhs);
 
 if (tem && !CONSTANT_P (tem))
- tem = gen_rtx_CONST (GET_MODE (tem), tem);
+ tem = plus_constant (tem, 0);
  }
else
  tem = simplify_binary_operation (ncode, mode, lhs, rhs);
@@ -3690,7 +3690,7 @@ simplify_plus_minus (enum rtx_code code,
     && GET_CODE (ops[i].op) == GET_CODE (ops[i - 1].op))
 {
   ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
-  ops[i - 1].op = gen_rtx_CONST (mode, ops[i - 1].op);
+  ops[i - 1].op = plus_constant (ops[i - 1].op, 0);
   if (i < n_ops - 1)
ops[i] = ops[i + 1];
   n_ops--;
@@ -5247,7 +5247,7 @@ simplify_subreg (enum machine_mode outer
       && GET_MODE_BITSIZE (innermode) >= (2 * GET_MODE_BITSIZE (outermode))
       && GET_CODE (XEXP (op, 1)) == CONST_INT
       && (INTVAL (XEXP (op, 1)) & (GET_MODE_BITSIZE (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)  
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
 {
   int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
Index: explow.c
===
--- explow.c(revision 134435)
+++ explow.c(working copy)
@@ -83,9 +83,6 @@ plus_constant (rtx x, HOST_WIDE_INT c)
   rtx tem;
   int all_constant = 0;
 
-  if (c == 0)
-return x;
-
  restart:
 
   code = GET_CODE (x);


Re: PR37363: PR36090 and PR36182 all over again

2008-09-06 Thread Paolo Bonzini
 I'm not sure about this bit. Couldn't [snip cse.c code]
 simply be replaced by:
 
   /* We can't simplify extension ops unless we know the
original mode.  */
   if ((code == ZERO_EXTEND || code == SIGN_EXTEND)
       && mode_arg0 == VOIDmode)
     break;
 
   new = simplify_unary_operation (code, mode,
 const_arg0 ? const_arg0 : folded_arg0,
 mode_arg0);
 ?
 
 (Sorry if I'm repeating earlier discussion here.)

I think so -- I was just trying to resemble the existing code as much as
possible (stage3), but it's probably better to clean up instead.  What
do you think about the simplify-rtx.c part instead?

Paolo
2008-09-06  Paolo Bonzini  [EMAIL PROTECTED]

* cse.c (fold_rtx): Let simplify_unary_operation handle CONSTs.
* explow.c (plus_constant): Don't exit early if c == 0, to allow
canonicalizing CONSTs.
* simplify-rtx.c (simplify_plus_minus): Likewise.

Index: cse.c
===
--- cse.c   (revision 134435)
+++ cse.c   (working copy)
@@ -3138,33 +3138,20 @@ fold_rtx (rtx x, rtx insn)
 {
 case RTX_UNARY:
   {
-   int is_const = 0;
-
/* We can't simplify extension ops unless we know the
   original mode.  */
if ((code == ZERO_EXTEND || code == SIGN_EXTEND)
     && mode_arg0 == VOIDmode)
  break;
 
-   /* If we had a CONST, strip it off and put it back later if we
-  fold.  */
+   /* If we had a CONST, strip it off and let simplify_unary_operation
+  put it back if it can simplify something.  */
if (const_arg0 != 0 && GET_CODE (const_arg0) == CONST)
- is_const = 1, const_arg0 = XEXP (const_arg0, 0);
+ const_arg0 = XEXP (const_arg0, 0);
 
new = simplify_unary_operation (code, mode,
const_arg0 ? const_arg0 : folded_arg0,
mode_arg0);
-   /* NEG of PLUS could be converted into MINUS, but that causes
-  expressions of the form
-  (CONST (MINUS (CONST_INT) (SYMBOL_REF)))
-  which many ports mistakenly treat as LEGITIMATE_CONSTANT_P.
-  FIXME: those ports should be fixed.  */
-   if (new != 0 && is_const
-       && GET_CODE (new) == PLUS
-       && (GET_CODE (XEXP (new, 0)) == SYMBOL_REF
-           || GET_CODE (XEXP (new, 0)) == LABEL_REF)
-       && GET_CODE (XEXP (new, 1)) == CONST_INT)
- new = gen_rtx_CONST (mode, new);
   }
   break;
 
Index: simplify-rtx.c
===
--- simplify-rtx.c  (revision 140055)
+++ simplify-rtx.c  (working copy)
@@ -3625,7 +3625,7 @@ simplify_plus_minus (enum rtx_code code,
tem = simplify_binary_operation (ncode, mode, tem_lhs, tem_rhs);
 
if (tem && !CONSTANT_P (tem))
- tem = gen_rtx_CONST (GET_MODE (tem), tem);
+ tem = plus_constant (tem, 0);
  }
else
  tem = simplify_binary_operation (ncode, mode, lhs, rhs);
@@ -3690,7 +3690,7 @@ simplify_plus_minus (enum rtx_code code,
    && GET_CODE (ops[i].op) == GET_CODE (ops[i - 1].op))
 {
   ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
-  ops[i - 1].op = gen_rtx_CONST (mode, ops[i - 1].op);
+  ops[i - 1].op = plus_constant (ops[i - 1].op, 0);
   if (i < n_ops - 1)
ops[i] = ops[i + 1];
   n_ops--;
@@ -5247,7 +5247,7 @@ simplify_subreg (enum machine_mode outer
       && GET_MODE_BITSIZE (innermode) >= (2 * GET_MODE_BITSIZE (outermode))
       && GET_CODE (XEXP (op, 1)) == CONST_INT
       && (INTVAL (XEXP (op, 1)) & (GET_MODE_BITSIZE (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)  
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
 {
   int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
Index: explow.c
===
--- explow.c(revision 134435)
+++ explow.c(working copy)
@@ -83,9 +83,6 @@ plus_constant (rtx x, HOST_WIDE_INT c)
   rtx tem;
   int all_constant = 0;
 
-  if (c == 0)
-return x;
-
  restart:
 
   code = GET_CODE (x);


Re: PR37363: PR36090 and PR36182 all over again

2008-09-06 Thread Paolo Bonzini

 if plus_constant _knows_ that something can be wrapped in a CONST,
 simplify_binary_operation should have given us the CONST to begin with.
 Also, the only cases that plus_constant can handle are CONST,
 SYMBOL_REF and LABEL_REF, all of which satisfy CONSTANT_P.
 So the new form ought to be dead on two counts.

Yes, and in the other case too:

  ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
  ops[i - 1].op = plus_constant (ops[i - 1].op, 0);

plus_constant won't understand the MINUS, and won't generate a CONST.

Still, having a new target hook for this seems overkill.  For example,
since ports do have to deal with complicated constants when they expand
moves, and since some of them already look inside CONSTs in their
LEGITIMATE_CONSTANT_P, another possibility to throw in the air is
something like (better names welcome...)

rtx
avoid_terrible_constants (rtx x)
{
  if (!CONSTANT_P (x))
x = gen_rtx_CONST (x);

  /* If the target's move expanders will take care of it,
 it must not be that bad.  */
  icode = optab_handler (mov_optab, GET_MODE (x))->insn_code;
  if (*insn_data[icode].operand[1].predicate (x, GET_MODE (x)))
return x;

  return NULL;
}

In case of cris, the predicate goes into general_operand, which does

  if (CONSTANT_P (op))
    return ((GET_MODE (op) == VOIDmode || GET_MODE (op) == mode
             || mode == VOIDmode)
            && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op))
            && LEGITIMATE_CONSTANT_P (op));

H-P can check for the problematic case inside his LEGITIMATE_CONSTANT_P
(*), or add a move expander for it.

  (*) but then does this mean the documentation for L_C_P is obsolete,
  and returning 1 is not necessarily a good thing to do for targets with
  sections?  Maybe there is a better definition that can be the default?

Anyway, at least how to use this function is pretty obvious:

tem_rhs = GET_CODE (rhs) == CONST ? XEXP (rhs, 0) : rhs;
tem = simplify_binary_operation (ncode, mode,
tem_lhs, tem_rhs);

-   if (tem && !CONSTANT_P (tem))
- tem = gen_rtx_CONST (GET_MODE (tem), tem);
+   if (tem)
+ tem = avoid_terrible_constants (tem);
  }
else
  tem = simplify_binary_operation (ncode, mode, lhs, rhs);


...


    && CONSTANT_P (ops[i].op)
    && GET_CODE (ops[i].op) == GET_CODE (ops[i - 1].op))
 {
-  ops[i - 1].op = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
-  ops[i - 1].op = gen_rtx_CONST (mode, ops[i - 1].op);
-  if (i < n_ops - 1)
-   ops[i] = ops[i + 1];
-  n_ops--;
+  rtx x;
+  x = gen_rtx_MINUS (mode, ops[i - 1].op, ops[i].op);
+  x = avoid_terrible_constants (x);
+  if (x)
+   {
+  ops[i - 1].op = x;
+  if (i < n_ops - 1)
+   ops[i] = ops[i + 1];
+  n_ops--;
+}
 }

   if (n_ops > 1


I'm absolutely unsure that this is the way to go; but it has two
advantages: 1) not leaking really bad constants outside simplify-rtx.c;
2) it makes clear how to fix bugs -- you restrict
LEGITIMATE_CONSTANT_P/LEGITIMATE_PIC_OPERAND_P or add a move expander.
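
To make the shape of avoid_terrible_constants concrete outside GCC, here is
a minimal self-contained sketch in plain C.  It is a toy model, not GCC
code: struct expr, move_pred_fn, wrap_const_if_movable and the
same-section rule in toy_move_operand are all names and assumptions made
up for illustration.  Only the control flow (wrap in a CONST, ask the
predicate of the move pattern, return NULL if it says no) follows the
function sketched above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL; none of these names exist in GCC.  */
enum expr_code { EX_SYMBOL, EX_INT, EX_MINUS, EX_CONST };

struct expr
{
  enum expr_code code;
  int section;                  /* EX_SYMBOL: section the symbol lives in.  */
  long value;                   /* EX_INT: the integer value.  */
  const struct expr *op0, *op1; /* Operands; EX_CONST uses op0 only.  */
};

/* Stand-in for the predicate of operand 1 of the target's move pattern.  */
typedef bool (*move_pred_fn) (const struct expr *);

/* Hypothetical target rule: a CONST around a symbol difference is
   acceptable only when both symbols are in the same section.  */
static bool
toy_move_operand (const struct expr *x)
{
  if (x->code == EX_SYMBOL || x->code == EX_INT)
    return true;
  if (x->code != EX_CONST)
    return false;
  const struct expr *inner = x->op0;
  if (inner->code == EX_MINUS
      && inner->op0->code == EX_SYMBOL
      && inner->op1->code == EX_SYMBOL)
    return inner->op0->section == inner->op1->section;
  return false;
}

/* Wrap X in a CONST stored in *SLOT, but only hand it back if the move
   predicate accepts the result; otherwise return NULL so that the
   caller leaves the expression alone.  */
static const struct expr *
wrap_const_if_movable (const struct expr *x, struct expr *slot,
                       move_pred_fn pred)
{
  assert (x != NULL && slot != NULL && pred != NULL);
  if (x->code == EX_SYMBOL || x->code == EX_INT || x->code == EX_CONST)
    return pred (x) ? x : NULL;   /* Already constant: do not wrap again.  */

  slot->code = EX_CONST;
  slot->section = 0;
  slot->value = 0;
  slot->op0 = x;
  slot->op1 = NULL;
  return pred (slot) ? slot : NULL;
}
```

With this toy rule, a MINUS of two symbols from the same section comes
back wrapped in a CONST, while a MINUS of symbols from different sections
yields NULL and the caller keeps the unsimplified operands.  A port that
hits a bad constant would tighten its predicate rather than patch the
generic simplifier.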

Paolo


Re: PR37363: PR36090 and PR36182 all over again

2008-09-06 Thread Paolo Bonzini

 In case of cris, the predicate goes into general_operand, which does

   if (CONSTANT_P (op))
     return ((GET_MODE (op) == VOIDmode || GET_MODE (op) == mode
              || mode == VOIDmode)
             && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op))
             && LEGITIMATE_CONSTANT_P (op));

 H-P can check for the problematic case inside his LEGITIMATE_CONSTANT_P
 (*), or add a move expander for it.
 
 I think you're mixing up CRIS and rs6000, the latter which
 generated something it had to handle but which was munged,
 PR36090.  CRIS is mainstream in that sense.  (You'd have to get
 buy-in from David Edelsohn on a LEGITIMATE_CONSTANT_P definition
 in rs6000 if PR36090 resurfaces.)

This is from CRIS:

(define_expand "movsi"
  [(set
    (match_operand:SI 0 "nonimmediate_operand" "")
    (match_operand:SI 1 "cris_general_operand_or_symbol" ""))]
  ...

(define_special_predicate "cris_general_operand_or_symbol"
  (ior (match_operand 0 "general_operand")
       (and (match_code "const, symbol_ref, label_ref")
...

 Did you mean this as a short-term or long-term solution?
 (Mind, we already have a proposed short-term solution.)

As a long term solution.  Though not in that exact shape -- I wanted to
have discussion on it and converge together to a real solution.

   (*) but then does this mean the documentation for L_C_P is obsolete,
   and returning 1 is not necessarily a good thing to do for targets with
   sections?  Maybe there is a better definition that can be the default?
 
 Again, LEGITIMATE_CONSTANT_P is the wrong macro, it's for
 checking constants which are appropriate as immediate operands
 (to non-move insns), not for being at-all-legitimate.

LEGITIMATE_CONSTANT_P is just what is used by general_operand.  I'm
proposing another use of *the predicate for mov's operand 1*, not of
LEGITIMATE_CONSTANT_P.  With the above questions, I was expressing my
doubts on the doc for LEGITIMATE_CONSTANT_P in general.

 Signalling that they are not legitimate means they can still be
 handled by a move.

That's why I used the predicate.

 2) it makes clear how to fix bugs -- you restrict
 LEGITIMATE_CONSTANT_P/LEGITIMATE_PIC_OPERAND_P or add a move expander.
 
 Contradicting current use, where anything that's found in a
 non-LEGITIMATE_CONSTANT_P/LEGITIMATE_PIC_OPERAND_P must be
 handled by a move expander!

Not necessarily; anything that's found in a non-legitimate constant must
be handled by force_reg, and force_reg also tries using force_operand if
what it gets is not a general_operand.  But maybe it's necessary to add a

  if (GET_CODE (value) == CONST)
value = XEXP (value, 0);

in force_operand.
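
The effect of such a strip can be shown with a toy lowering routine.
This is not GCC's force_operand: struct node, toy_force_operand and the
instruction count it returns are invented for illustration.  The point
is only that an opaque CONST wrapper hides the arithmetic underneath,
while stripping it lets the routine break the expression into separate
operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree; none of these names exist in GCC.  */
enum node_code { N_LEAF, N_PLUS, N_MINUS, N_CONST };

struct node
{
  enum node_code code;
  const struct node *op0, *op1;   /* N_CONST uses op0 only.  */
};

/* Lower X to something usable as a plain operand and return how many
   arithmetic instructions had to be emitted for that.  When
   STRIP_CONST is false, a CONST wrapper is treated as opaque.  */
static int
toy_force_operand (const struct node *x, bool strip_const)
{
  assert (x != NULL);
  if (x->code == N_CONST)
    {
      if (!strip_const)
        return 0;      /* Handed on as is, contents never examined.  */
      x = x->op0;      /* The proposed strip: look inside the wrapper.  */
    }

  switch (x->code)
    {
    case N_PLUS:
    case N_MINUS:
      /* One instruction for this node plus whatever the operands need.  */
      return 1 + toy_force_operand (x->op0, strip_const)
               + toy_force_operand (x->op1, strip_const);
    default:
      return 0;        /* Leaves are usable directly.  */
    }
}
```

For a CONST wrapped around a MINUS of two leaves, the routine emits
nothing when the wrapper is left in place and one subtraction when it is
stripped, which is the behaviour the two added lines are after.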

 To wit: a new bug would surface: you could here form something
 that wasn't LEGITIMATE_CONSTANT_P but which was handled by a
 move expander, and you'd force this into an insn which *isn't* a
 move.  N.B. the insn in PR36182 wasn't a move.

Shouldn't the insn fail recognition, then?

 (FWIW, I'll add a LEGITIMATE_CONSTANT_P to CRIS just to cover my
 bases.  It won't solve the basic problem, because that could
 just cause that invalid CONST contents in PR37363 and PR36182 to
 end up in a move insn instead.)

I don't think so, because general_operand would pass the CONST to your
LEGITIMATE_CONSTANT_P, and hence cause it to be rejected.

Paolo



Re: PR37363: PR36090 and PR36182 all over again (was: Re: Call for testers, ppc64-linux)

2008-09-05 Thread Paolo Bonzini

 I got negative feedback on that patch (no, not regression
 results :) on IRC from David Edelsohn and understandably you
 held off your testing because of this, as for one the patch
 affects the rs6000 backend.

What kind of negative feedback?

 For CRIS (as well as other targets IIUC) the cause of PR37363 is
 that there's code that wraps a MINUS of two symbol_ref's in a
 CONST without checking that the two symbol_ref's make up a valid
 address.  After that, the CONST effectively acts as a barrier
 for target hooks (no need to look, we know that thing there is
 a valid constant expression).

The three possibilities I see are:

1) removing the wrapping CONST?

2) using the patch in
http://gcc.gnu.org/bugzilla/attachment.cgi?id=15620&action=view, which
however just papers over this problem.

3) adding a check that the MINUS is a legitimate address, and only wrap
it in CONST if it is.

Paolo


Re: PR37363: PR36090 and PR36182 all over again

2008-09-05 Thread Paolo Bonzini

 3) adding a check that the MINUS is a legitimate address, and only wrap
 it in CONST if it is.
 
 s/address/constant/; it's not clear that it's used as an address
 at that point; it's just two expressions that gcc tries to
 reduce.

Right.

 But I get the point; I'm leaning towards something like
 strengthening that it's a legitimate constant.  See
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36182#c12 and
 other comments in that PR.  But... should we really redefine
 LEGITIMATE_CONSTANT_P and its documentation at this stage?

We can do it incrementally.  For now, only redefine
LEGITIMATE_CONSTANT_P on CRIS and in the documentation, and use it in
simplify_plus_minus.  For 4.5, we can look at other places using
gen_rtx_CONST and strengthen them too.

Paolo


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-04 Thread Paolo Bonzini
David Edelsohn wrote:
 On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote:
 The only thing lwsync won't order is a store followed by a load. Since
 the lwsync will always be paired with a store (the stwcx), we will order
 all accesses before it and provide a release barrier.
 
 Anton,
 
 My one other concern is developers using the builtins for applications on
 embedded PowerPC processors.  lwsync will not order accesses to device
 memory space, AFAICT.

Don't you need eieio+sync for that?  GCC does not generate the eieio now.
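
For readers less familiar with the builtins under discussion, here is a
small sketch of the acquire/release pairing on ordinary memory.  The
toy_* names are made up; only the __sync_* calls are real GCC builtins,
and the GCC manual documents __sync_lock_test_and_set as an acquire
barrier and __sync_lock_release as a release barrier.  The sketch makes
no claim about which instruction is emitted for either of them on
PowerPC, and it says nothing about device memory.

```c
#include <assert.h>

/* Hypothetical names; only the __sync_* builtins come from GCC.  */
static volatile int toy_lock;     /* 0 = free, 1 = held.  */
static int toy_shared_counter;    /* Protected by toy_lock.  */

static void
toy_lock_acquire (void)
{
  /* Acquire barrier: accesses after this call are not moved before it.  */
  while (__sync_lock_test_and_set (&toy_lock, 1))
    ;  /* Spin until the previous value was 0.  */
}

static void
toy_lock_release (void)
{
  assert (toy_lock == 1);   /* Releasing a lock we do not hold is a bug.  */
  /* Release barrier: accesses before this call are completed before
     the store of 0 becomes visible.  */
  __sync_lock_release (&toy_lock);
}

/* Bump the shared counter under the lock and return the new value.  */
static int
toy_increment (void)
{
  toy_lock_acquire ();
  int seen = ++toy_shared_counter;
  toy_lock_release ();
  return seen;
}
```

Code that needs a store ordered before a later load, the one case
mentioned above as not covered, would have to use the full barrier
__sync_synchronize instead of relying on the release.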

Paolo

