Re: [RL78] Questions about code-generation

2014-03-21 Thread Jeff Law

On 03/21/14 18:35, DJ Delorie wrote:


> I've found that "removing unneeded moves through registers" is
> something gcc does poorly in the post-reload optimizers.  I've written
> my own on some occasions (for rl78 too).  Perhaps this is a good
> starting point to look at?

>> much needless copying, which strengthens my suspicion that it's
>> something in the RL78 backend that needs 'tweaking'.

> Of course it is, I've said that before I think.  The RL78 uses a
> virtual model until reload, then converts each virtual instruction
> into multiple real instructions, then optimizes the result.  This is
> going to be worse than if the real model had been used throughout
> (like arm or x86), but in this case, the real model *can't* be used
> throughout, because gcc can't understand it well enough to get through
> regalloc and reload.  The RL78 is just too "weird" to be modelled
> as-is.

> I keep hoping that gcc's own post-reload optimizers would do a better
> job, though.  Combine should be able to combine, for example, the "mov
> r8,ax; cmp r8,#4" types of insns together.
I can't recall the details, but when you described the situation to me, 
the virtual register file was the only way I could see to make the RL78 
work in the IRA+reload world.


What would be quite interesting to try would be to continue to use the 
virtualized register set, but instead use the IRA+LRA path.  Presumably 
that wouldn't be terribly hard to try, and there's a reasonable chance 
it'll improve the code in a noticeable way.


The next obvious thing to try, and it's probably a lot more work, would 
be to see if IRA+LRA is smart enough (or can be made so with a 
reasonable amount of work) to eliminate the virtual register file 
completely.


Just to be clear, I'm not planning to work on this; my participation and 
interest in the RL78 was limited to providing a few tips to DJ.


Jeff


Re: [RL78] Questions about code-generation

2014-03-21 Thread DJ Delorie

> Is it possible that the virtual pass causes inefficiencies in some
> cases by sticking with r8-r31 when one of the 'normal' registers
> would be better?

That's not a fair question to ask, since the virtual pass can *only*
use r8-r31.  The first bank has to be left alone, else the
devirtualizer becomes a few orders of magnitude harder, if not
impossible, to make work correctly.

> In some cases, the normal optimization steps remove a lot, if not all, 
> of the unnecessary register passing, but not always.

I've found that "removing unneeded moves through registers" is
something gcc does poorly in the post-reload optimizers.  I've written
my own on some occasions (for rl78 too).  Perhaps this is a good
starting point to look at?

> much needless copying, which strengthens my suspicion that it's 
> something in the RL78 backend that needs 'tweaking'.

Of course it is, I've said that before I think.  The RL78 uses a
virtual model until reload, then converts each virtual instruction
into multiple real instructions, then optimizes the result.  This is
going to be worse than if the real model had been used throughout
(like arm or x86), but in this case, the real model *can't* be used
throughout, because gcc can't understand it well enough to get through
regalloc and reload.  The RL78 is just too "weird" to be modelled
as-is.

I keep hoping that gcc's own post-reload optimizers would do a better
job, though.  Combine should be able to combine, for example, the "mov
r8,ax; cmp r8,#4" types of insns together.


RE: [RFC, MIPS] Relax NaN rules

2014-03-21 Thread Rich Fuhler
> From: Maciej W. Rozycki [ma...@codesourcery.com]
> Sent: Friday, March 21, 2014 16:21
> To: Joseph S. Myers
> Cc: Rich Fuhler; Matthew Fortune; Richard Sandiford; dal...@aerifal.cx; 
> Andrew Pinski (pins...@gmail.com); gcc@gcc.gnu.org; Moore, Catherine 
> (catherine_mo...@mentor.com)
> Subject: RE: [RFC, MIPS] Relax NaN rules
>
>
> Coprocessor loads (LWC1/LDC1/MTC1/MTHC1/DMTC1) and stores
> (SWC1/SDC1/MFC1/MFHC1/DMFC1) are not arithmetic and never trap on any bit
> patterns.  I reckon GCC already takes advantage of this and stores
> integers temporarily in FPRs in some cases.
>
>  Maciej

Thanks Maciej, I blame it on the 387 - corrupted me for life...

RE: [RFC, MIPS] Relax NaN rules

2014-03-21 Thread Maciej W. Rozycki
On Fri, 21 Mar 2014, Joseph S. Myers wrote:

> > I ask this for another reason as well: since we're adding IFUNC 
> > capability to MIPS, we may need to harden the dynamic loader to protect 
> > $f12 and $f14. If signaling NaN was raised on the load, then we have 
> > more problems to deal with...
> 
> I haven't looked at the details of what MIPS hardware does with signaling 
> NaN loads - but in general uses of signaling NaNs work better if loads 
> don't trigger the NaN, only arithmetic, conversions and other operations 
> that IEEE 754 specifies should trigger it, so loads and stores always 
> preserve the original bit-patterns.  (Cf. the bugs on x86 where a union 
> gets copied via its double member, even though some other member is 
> active, and so gets corrupted because the signaling NaN gets converted to 
> quiet along the way.)

 Coprocessor loads (LWC1/LDC1/MTC1/MTHC1/DMTC1) and stores 
(SWC1/SDC1/MFC1/MFHC1/DMFC1) are not arithmetic and never trap on any bit 
patterns.  I reckon GCC already takes advantage of this and stores 
integers temporarily in FPRs in some cases.

  Maciej


RE: [RFC, MIPS] Relax NaN rules

2014-03-21 Thread Joseph S. Myers
On Fri, 21 Mar 2014, Rich Fuhler wrote:

> Hi Joseph, as I remember from conversations last year, there is also an 
> issue if the programmer specifically enables the FPU exceptions. If the 
> FPU, kernel emulator, or bare-metal emulator (CS3's for example) did 
> raise a signaling NaN, then the intermixing couldn't be done. Am I 
> remembering this correctly?

I'm not clear on what the question is.  But if, say, a program uses only 
quiet NaNs, and cares about the "invalid" exception, but gets used on 
wrongly configured hardware or with libraries with an inconsistent notion 
of what a quiet NaN is, then it may get spurious "invalid" exceptions from 
some of the other pieces thinking they've received a signaling NaN.

> I ask this for another reason as well: since we're adding IFUNC 
> capability to MIPS, we may need to harden the dynamic loader to protect 
> $f12 and $f14. If signaling NaN was raised on the load, then we have 
> more problems to deal with...

I haven't looked at the details of what MIPS hardware does with signaling 
NaN loads - but in general uses of signaling NaNs work better if loads 
don't trigger the NaN, only arithmetic, conversions and other operations 
that IEEE 754 specifies should trigger it, so loads and stores always 
preserve the original bit-patterns.  (Cf. the bugs on x86 where a union 
gets copied via its double member, even though some other member is 
active, and so gets corrupted because the signaling NaN gets converted to 
quiet along the way.)

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [RFC, MIPS] Relax NaN rules

2014-03-21 Thread Rich Fuhler
> From: Matthew Fortune
> Sent: Tuesday, March 18, 2014 08:06
> To: Joseph Myers
> Cc: Richard Sandiford; ma...@codesourcery.com; dal...@aerifal.cx; Andrew 
> Pinski (pins...@gmail.com); gcc@gcc.gnu.org; Rich Fuhler; Moore, Catherine 
> (catherine_mo...@mentor.com)
> Subject: RE: [RFC, MIPS] Relax NaN rules
>
> Joseph Myers  writes:
> > > 1) There is no way to mark a module as "don't care/not relevant". At a
> > > minimum this could be done via inspection of the GNU FP ABI attribute
> > > and when its value is 'Any' then NaNs don't matter. Better still would
> > > be that modules with floating point only require a certain NaN state
> > if they use functions like __builtin_[s]nan. This would partially
> > > reduce the impact of the strict NaN checks.
> >
> > In general you can't tell whether a module cares.  It could have an 
> > initializer
> > 0.0 / 0.0, without having any function calls involving floating point (so in
> > principle being independent of hard/soft float, but not of NaN format).  Or 
> > it
> > could be written with knowledge of the ABI to do things directly with bit
> > patterns (possibly based on a configure test rather than __mips_nan2008).
> > The concept of a don't-care module is meaningful, but while heuristics can
> > reliably tell that a module does care (e.g. GCC generated an encoding of a
> > NaN bit-pattern, whether from __builtin_nan or 0.0 / 0.0) they can't so
> > reliably tell that it doesn't care (although if it doesn't contain NaN bit-
> > patterns, or manipulate representations of floating-point values through
> > taking their addresses or using unions, you can probably be sure enough to
> > mark it as don't-care - note that many cases where there are calls with
> > floating-point arguments and results, but no manipulation of bit-patterns 
> > and
> > no NaN constants, would be don't-care by this logic).
>
> Thanks Joseph. I guess I'm not really pushing to have don't-care supported as 
> it would take a lot of effort to determine when code does and does not care, 
> you rightly point out more cases to deal with too. I'm not sure if the 
> benefit would then be worth it or not as there would still be modules which 
> do and do not care about old and new NaNs, so it doesn't really relieve any 
> pressure on toolchains or linux distributions. The second part of the 
> proposal is more interesting/useful as it is saying I don't care about the 
> impact of getting NaN encoding wrong and a tools vendor/linux distribution 
> then gets to make that choice. Any comments on that aspect?
>
> Regards,
> Matthew

Hi Joseph, as I remember from conversations last year, there is also an issue 
if the
programmer specifically enables the FPU exceptions. If the FPU, kernel emulator,
or bare-metal emulator (CS3's for example) did raise a signaling NaN, then the
intermixing couldn't be done. Am I remembering this correctly?

I ask this for another reason as well: since we're adding IFUNC capability to 
MIPS,
we may need to harden the dynamic loader to protect $f12 and $f14. If signaling
NaN was raised on the load, then we have more problems to deal with...

-rich 

Re: [RL78] Questions about code-generation

2014-03-21 Thread Richard Hulme

On 11/03/14 01:40, DJ Delorie wrote:

I'm curious.  Have you tried out other approaches before you decided
to go with the virtual registers?


Yes.  Getting GCC to understand the "unusual" addressing modes the
RL78 uses was too much for the register allocator to handle.  Even
when the addressing modes are limited to "usual" ones, GCC doesn't
have a good way to do regalloc and reload when there are limits on
what registers you can use in an address expression, and it's worse
when there are dependencies between operands, or limited numbers of
address registers.


Is it possible that the virtual pass causes inefficiencies in some cases 
by sticking with r8-r31 when one of the 'normal' registers would be better?


For example, I'm having a devil of a time convincing the compiler that 
an immediate value can be stored directly in any of the normal 16-bit 
registers (e.g. 'movw hl, #123').  I'm beginning to wonder whether it's 
the unoptimized code being fed in that's causing problems.


Taking a slight variation on my original test code (removing the 
'volatile' keyword and accessing an 8-bit memory location):




#define SOE0L (*(unsigned char *)0xF012A)

void orTest()
{
   SOE0L |= 3;
}



produces (with -O0)

  28                    _test:
  29 0000 C9 F0 2A 01       movw    r8, #298
  30 0004 C9 F2 2A 01       movw    r10, #298
  31 0008 AD F2             movw    ax, r10
  32 000a BD F4             movw    r12, ax
  33 000c FA F4             movw    hl, r12
  34 000e 8B                mov     a, [hl]
  35 000f 9D F2             mov     r10, a
  36 0011 6A F2 03          or      r10, #3
  37 0014 AD F0             movw    ax, r8
  38 0016 BD F4             movw    r12, ax
  39 0018 DA F4             movw    bc, r12
  40 001a 8D F2             mov     a, r10
  41 001c 48 00 00          mov     [bc], a
  42 001f D7                ret

In some cases, the normal optimization steps remove a lot, if not all, 
of the unnecessary register passing, but not always.


The conditions on the movhi_real insn allow an immediate value to be 
stored in (for example) HL directly, and yet I cannot find a single 
instance in my project where it isn't in the form of


movw    r8, #298
movw    ax, r10
movw    hl, ax

and no manner of re-arranging the conditions (that I've found) will 
cause the correct code to be generated.  It's determined to put the 
immediate value into rX, and then copy that into ax (which is also 
unnecessary).


I see the same problem with 'cmp' when the value to be compared is in 
the A register:


mov r8, a
cmp r8, #3

The A register is the one register that can be almost guaranteed to be 
usable with any instruction, and copying it to R8 (or wherever) to 
perform the comparison not only wastes two bytes for the move but also 
makes the cmp instruction a byte longer, so five bytes are used instead 
of two.
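
For anyone who wants to poke at this, a trivial reproducer (my own construction; the function name is an assumption) that, per the report above, tends to produce the redundant move/compare pair when built with rl78-elf gcc at -Os, and which runs unchanged on a host compiler:

```c
/* Per the description above, an rl78-elf gcc reportedly emits
       mov  r8, a
       cmp  r8, #3
   for this equality test instead of comparing the A register
   directly.  On a host compiler it simply computes the result. */
unsigned char test_cmp(unsigned char v)
{
    return (v == 3) ? 1 : 0;
}
```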


I looked at the code produced for IA64 and ARM targets, and although I'm 
not as familiar with those instruction sets, they didn't appear to do as 
much needless copying, which strengthens my suspicion that it's 
something in the RL78 backend that needs 'tweaking'.


The suggestions made regarding 'volatile' were very helpful and I've 
made some good savings elsewhere by adding support for different 
addressing modes and more efficient instructions but there are still a 
number of (theoretically) easy pickings that should (I feel) be possible 
before more complicated optimizations need to be looked at.


As ever, any suggestions are very gratefully received.  I hope to be 
able to post some patches once I'm comfortable that I haven't missed 
anything obvious or done something stupid.


Regards,

Richard.



GIMPLE tree dumping of, for example, GIMPLE_OMP_PARALLEL's CHILD_FN

2014-03-21 Thread Thomas Schwinge
Hi!

Certain GIMPLE codes, such as OpenMP ones, have a structured block
attached to them, for example, gcc/gimple.def:GIMPLE_OMP_PARALLEL:

/* GIMPLE_OMP_PARALLEL  represents

   #pragma omp parallel [CLAUSES]
   BODY

   BODY is the sequence of statements to be executed by all threads.
[...]
   CHILD_FN is set when outlining the body of the parallel region.
   All the statements in BODY are moved into this newly created
   function when converting OMP constructs into low-GIMPLE.
[...]
DEFGSCODE(GIMPLE_OMP_PARALLEL, "gimple_omp_parallel", 
GSS_OMP_PARALLEL_LAYOUT)

Using -ftree-dump-all, I can see this structured block (BODY) getting
dumped, but it then "disappears" in the ompexp pass's dump, and "reappears" (as
function main._omp_fn.0) in the next ssa pass' dump.

If I'm correctly understanding the GCC sources as well as operating GDB,
in the gimple pass we get main._omp_fn.0 dumped because
gcc/cgraphunit.c:analyze_functions iterates over all functions
(analyze_function -> dump_function).  In the following passes,
presumably, this is not done anymore: omplower, lower, eh, cfg.  In
ompexp, the GIMPLE_OMP_PARALLEL is expanded into a
»__builtin_GOMP_parallel (main._omp_fn.0)« call, but the main._omp_fn.0
is not dumped (and there is no BODY anymore to dump).  In the next ssa
pass, main._omp_fn.0 again is being dumped, by means of
gcc/passes.c:do_per_function_toporder (execute_pass_list ->
execute_one_pass -> execute_function_dump -> dump_function_to_file), as I
understand it.  What do I need to modify to get main._omp_fn.0 included
in the dumps before the ssa pass, too?

Example:

int
main(void)
{
#pragma omp parallel
  {
extern void foo(void);
foo ();
  }
  return 0;
}

p2.c.003t.original:

;; Function main (null)
{
  #pragma omp parallel
{
  {
{
  extern void foo (void);
  foo ();
}
  }
}
  return 0;
}

p2.c.004t.gimple:

main ()
{
  int D.1749;
  #pragma omp parallel
{
  {
extern void foo (void);
foo ();
  }
}
  D.1749 = 0;
  return D.1749;
}

main._omp_fn.0 (void * .omp_data_i)
{
  :
  :
  foo ();
  return;

}

p2.c.006t.omplower:

;; Function main (main, funcdef_no=0, decl_uid=1743, symbol_order=0)
main ()
{
  int D.1749;
  {
#pragma omp parallel [child fn: main._omp_fn.0 (???)]
  {
extern void foo (void);

foo ();
  }
  #pragma omp return
  }
  D.1749 = 0;
  return D.1749;
}

p2.c.007t.lower, p2.c.010t.eh:

;; Function main (main, funcdef_no=0, decl_uid=1743, symbol_order=0)
main ()
{
  int D.1749;
  #pragma omp parallel [child fn: main._omp_fn.0 (???)]
  foo ();
  #pragma omp return
  D.1749 = 0;
  goto ;
  :
  return D.1749;
}

p2.c.011t.cfg:

;; Function main (main, funcdef_no=0, decl_uid=1743, symbol_order=0)
[...]
main ()
{
  int D.1749;
  :
  #pragma omp parallel [child fn: main._omp_fn.0 (???)]
  :
  foo ();
  #pragma omp return
  :
  D.1749 = 0;
  return D.1749;

}

p2.c.012t.ompexp:

;; Function main (main, funcdef_no=0, decl_uid=1743, symbol_order=0)
OMP region tree
bb 2: gimple_omp_parallel
bb 3: GIMPLE_OMP_RETURN
Introduced new external node (foo/2).
Merging blocks 2 and 6
Merging blocks 2 and 4
main ()
{
  int D.1749;
  :
  __builtin_GOMP_parallel (main._omp_fn.0, 0B, 0, 0);
  D.1749 = 0;
  return D.1749;
}

p2.c.015t.ssa:

;; Function main._omp_fn.0 (main._omp_fn.0, funcdef_no=1, decl_uid=1751, 
symbol_order=1)
main._omp_fn.0 (void * .omp_data_i)
{
  :
  :
  foo ();
  return;
}

;; Function main (main, funcdef_no=0, decl_uid=1743, symbol_order=0)
main ()
{
  int _3;
  :
  __builtin_GOMP_parallel (main._omp_fn.0, 0B, 0, 0);
  _3 = 0;
  return _3;
}


Regards,
 Thomas




Re: Integration of ISL code generator into Graphite

2014-03-21 Thread Tobias Grosser

On 03/21/2014 12:04 PM, Roman Gareev wrote:

Hi Tobias,

thank you for all your comments! I've tried to consider them in the
improved version of my proposal, which can be found at the following
link 
https://drive.google.com/file/d/0B2Wloo-931AoeUlYOHhETVBvY3M/edit?usp=sharing
.


- In unreleased isl 0.13.0, support for compute out feature


I haven't found information about this feature and isl 0.13.0. Could
you please give me a link to be referred to in the proposal?


Section 1.4.1 of the isl manual documents the following functions:

void isl_ctx_set_max_operations(isl_ctx *ctx,
unsigned long max_operations);
unsigned long isl_ctx_get_max_operations(isl_ctx *ctx);
void isl_ctx_reset_operations(isl_ctx *ctx);


- Improved code generation quality


I also haven't found code quality comparison between CLooG and ISL
code generator. Do you mean, that ISL code generator can improve code
quality with unrolling, full/partial tile separation, fine-grained
code size adjustments?


We have an unpublished paper on this. We should probably make this a 
tech report at some point.



- "New internal representation will be generated by ISL. Its structure is
planned to be similar to the CLAST tree, but can be changed ..."

What does this  mean? The isl_ast representation is already defined. Are you
saying that isl may generate an ast that is different in structure to the
clast tree currently generated? Or are you saying we
still need to define the isl_ast and its nodes itself?


I wanted to say that ISL will generate ISL AST from the polyhedral
representation. This ISL AST (with pointers to original basic blocks
instead of statements) will be the internal representation for Graphite,
that should be traversed and transformed into the GIMPLE CFG. I
eliminated the mention of this internal representation in the improved
version of the proposal.


Good.

I think this proposal is already very nice. I have some last comments in 
case you want to really polish it:


o 26-31 May: Get familiar with CLooG generation?

Why is this necessary?


Also, for the remaining time-line, I think you could start working on 
making this more detailed.


Instead of having separate testing and fixing bug weeks, I think it 
would be optimal to have for each week one topic that you plan to finish 
(including testing and fixing), as well as a brief idea how to do it. 
You already have a good idea of what is needed, but you could detail 
this further by looking through the existing source code (or by looking 
into Polly's IslCodeGeneration.cpp) to see what is needed.


In which week are you e.g. planning to write the code to generate 
isl_ast_expr?


Which pieces of the code are you planning to reuse?

Are you planning to split of pieces of the code that can be shared by 
CLooG and isl, before fully removing CLooG?


Cheers,
Tobias






Re: Integration of ISL code generator into Graphite

2014-03-21 Thread Roman Gareev
Hi Tobias,

thank you for all your comments! I've tried to consider them in the
improved version of my proposal, which can be found at the following
link 
https://drive.google.com/file/d/0B2Wloo-931AoeUlYOHhETVBvY3M/edit?usp=sharing
.

> - In unreleased isl 0.13.0, support for compute out feature

I haven't found information about this feature and isl 0.13.0. Could
you please give me a link to be referred to in the proposal?

> - Improved code generation quality

I also haven't found code quality comparison between CLooG and ISL
code generator. Do you mean, that ISL code generator can improve code
quality with unrolling, full/partial tile separation, fine-grained
code size adjustments?

> - "New internal representation will be generated by ISL. Its structure is
> planned to be similar to the CLAST tree, but can be changed ..."
>
> What does this  mean? The isl_ast representation is already defined. Are you
> saying that isl may generate an ast that is different in structure to the
> clast tree currently generated? Or are you saying we
> still need to define the isl_ast and its nodes itself?

I wanted to say that ISL will generate ISL AST from the polyhedral
representation. This ISL AST (with pointers to original basic blocks
instead of statements) will be the internal representation for Graphite,
that should be traversed and transformed into the GIMPLE CFG. I
eliminated the mention of this internal representation in the improved
version of the proposal.

-- 
   Cheers, Roman Gareev


Re: Request for discussion: Rewrite of inline assembler docs

2014-03-21 Thread James Greenhalgh
On Thu, Feb 27, 2014 at 11:07:21AM +, Andrew Haley wrote:
> Over the years there has been a great deal of traffic on these lists
> caused by misunderstandings of GCC's inline assembler.  That's partly
> because it's inherently tricky, but the existing documentation needs
> to be improved.
> 
> dw  has done a fairly thorough reworking of
> the documentation.  I've helped a bit.
> 
> Section 6.41 of the GCC manual has been rewritten.  It has become:
> 
> 6.41 How to Use Inline Assembly Language in C Code
> 6.41.1 Basic Asm - Assembler Instructions with No Operands
> 6.41.2 Extended Asm - Assembler Instructions with C Expression Operands
> 
> We could simply post the patch to GCC-patches and have at it, but I
> think it's better to discuss the document here first.  You can read it
> at

This documentation looks like a huge improvement.

As the discussion here seems to have stalled, perhaps it is time to propose
the patch to gcc-patches?

I'm certainly keen to see this make it to trunk, the increase in clarity
is substantial.

Thanks,
James



Re: [gsoc 2014] moving fold-const patterns to gimple

2014-03-21 Thread Richard Biener
On Thu, Mar 20, 2014 at 9:52 PM, Prathamesh Kulkarni
 wrote:
> On Wed, Mar 19, 2014 at 3:13 PM, Richard Biener
>  wrote:
>> On Tue, Mar 18, 2014 at 9:04 AM, Prathamesh Kulkarni
>>  wrote:
>>> On Mon, Mar 17, 2014 at 2:22 PM, Richard Biener
>>>  wrote:
 On Sun, Mar 16, 2014 at 1:21 PM, Prathamesh Kulkarni
  wrote:
> In c_expr::c_expr, shouldn't OP_C_EXPR be passed to operand
> constructor instead of OP_EXPR ?

 Indeed - I have committed the fix.

>>> Hi, I have attached an initial draft of my proposal.
>>> I would be grateful to receive your feedback.
>>
>> Ok, I had a look at the proposal and it is mostly fine.  I'd be more specific
>> on the deliverables, specifically
>>
>> 1) genmatch - program to read meta description and generate code to
>> simplify GENERIC and GIMPLE according to the pattern descriptions
>> using an algorithm not linear in the number of patterns
>>
>> 2) add patterns matched by tree-ssa-forwprop.c (and thus patterns
>> in fold-const.c it depends on) to the meta-description and perform
>> the simplifications of tree-ssa-forwprop.c in terms of the simplification API
>>
>> You will figure out that there are possibly a lot of patterns in fold-const.c
>> that forwprop depends on (I know mainly of all the comparison 
>> simplifications).
>>
>> For the Timeline I'd move e) as a sub-task of f) to June 28 - July 16,
>> eventually just dividing the weeks of July 17 - August 11 to that and
>> the following task.
>>
>> That is, the overall deliverable should be a tree-ssa-forwprop.c that is
>> (mostly) implemented in terms of patterns, ready for commit to trunk
>> during stage1.
>>
>> As for targeting GENERIC, useful testing coverage of that path will
>> come from for example re-writing fold-const.c:fold_comparison using
>> the GENERIC API of the pattern simplifier.
>>
>> The devil will be in the details (as always) ;)
>>
>> As Maxim said - make sure to register your proposal in-time, you
>> can always improve on it later.
>>
> Thanks for the feedback. I have uploaded it:
> http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/prathamesh3492/5629499534213120
> Would you like to suggest any further changes ?
> There are a few formatting glitches, I am fixing those.
>
> Could you help me point out how to write test-cases for transforms ?
> For example:
> /* Fold ~A + 1 -> -A */
> (match_and_simplify
>   (PLUS_EXPR (BIT_NOT_EXPR @0) @1)
>   if (@1 == integer_one_node)
>   (NEGATE_EXPR @0))
>
> Is the following test-case correctly written ?
> /* { dg-do compile } */
> /* { dg-options "-O -fdump-tree-forwprop" }  */
>
> int foo (int x)
> {
>   int temp1 = ~x;
>   int temp2 = temp1 + 1;
>   return temp2;
> }
>
> /* { dg-final { scan-tree-dump "temp* = -x*(D)" "forwprop1" } } */
> Shall that be (somewhat) correct ?

Yes, though the pattern to scan for may be somewhat fragile (generally
avoid using '*' there but use \[^\n\r\]* to avoid getting false matches across
newlines).
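
For what it's worth, a hardened variant of the testcase might look like this; the exact scan regex is my guess at applying the advice above, not something from this thread.  The function still computes ~x + 1, which forwprop should fold to -x:

```c
/* Sketch of a less fragile testcase: the dg-final regex anchors on
   "= -x_<ssa-version>(D);" instead of a bare '*', which could match
   across lines. */
/* { dg-do compile } */
/* { dg-options "-O -fdump-tree-forwprop" } */

int foo (int x)
{
  int temp1 = ~x;
  int temp2 = temp1 + 1;   /* ~x + 1 is -x in two's complement */
  return temp2;
}

/* { dg-final { scan-tree-dump " = -x_\[0-9\]*\\(D\\);" "forwprop1" } } */
```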

> (Unfortunately, I cannot check if I have written the test-case correctly,
> because I am in the middle of a bootstrap build.
> Is there a way to run test-cases on only stage-1 compiler ?
> I tried make check-cc1 RUNTESTFLAGS=dg.exp=tree-ssa/match-2.c but that
> did not work).

I usually do development on a separate build directory that I configure
like

CFLAGS=-g CXXFLAGS=-g /src/configure --disable-bootstrap
make CFLAGS=-g CXXFLAGS=-g

then I can test incremental changes by in gcc/ doing

> make cc1
> make check-gcc RUNTESTFLAGS="tree-ssa.exp=match-2.c"

Richard.

> forwprop output is:
> ;; Function foo (foo, funcdef_no=0, decl_uid=1743, symbol_order=0)
>
> gimple_match_and_simplified to temp2_3 = -x_1(D);
> foo (int x)
> {
>   int temp2;
>   int temp1;
>
>   :
>   temp1_2 = ~x_1(D);
>   temp2_3 = -x_1(D);
>   temp2_4 = temp2_3;
>   return temp2_4;
>
> }
>
> Thanks and Regards,
> Prathamesh
>
>> Thanks,
>> Richard.
>>
>>> Thanks and Regards,
>>> Prathamesh
 Thanks,
 Richard.

> This caused segfault for patterns when "simplification" operand was
> only c_expr (patch attached).
>
> * genmatch.c (c_expr::c_expr): use OP_C_EXPR instead of OP_EXPR in
> call to operand constructor.