Success with MinGW and AVR and LTO - almost

2010-01-10 Thread Andrew Hutchinson
I have just succeeded in building the latest snapshot, version 4.5.0 20100107, for the
AVR target with working LTO on both Linux and MinGW hosts!


As noted before, #define LINKER_NAME has to be deleted from the target's avr.h
(I will raise a patch for this).


I also built the AVR target for MinGW under MSYS, and this has no obvious
issues for either build or use with normal (non-LTO) compilation.

However, LTO use failed completely.

LTO using libelf needs to handle files as BINARY on Windows. This
would seem to apply to any target on MinGW.


I hacked the fopen/open calls in lto.c and lto-elf.c to use O_BINARY and
rb, and compilation with -flto was then successful!


I am not sure how this should be fixed properly.

Andy







Re: Success with MinGW and AVR and LTO - almost

2010-01-10 Thread Andrew Hutchinson

I think rb is a nop. However, O_BINARY is less portable.

There is another way. If a MinGW-hosted build is linked with binmode.o,
the default mode for files becomes binary.

Some other methods are here:

http://oldwiki.mingw.org/index.php/binary




Rafael Espindola wrote:

I hacked fopen/open calls in  lto.c and lto-elf.c to use O_BINARY and rb
and compilation with -flto was then successful!

I am not sure how this should be fixed properly.



Using O_BINARY and rb should be a nop on unix, no? Is it wrong to
use them on any arch we care about?

  

Andy



Cheers,
  


Re: Success with MinGW and AVR and LTO - almost

2010-01-10 Thread Andrew Hutchinson



Kai Tietz wrote:


Well, on Linux (libc) fopen/freopen/etc. the b is a nop (but
handled). For O_BINARY the common approach here is to add the following
condition before use:

#ifndef O_BINARY
#define O_BINARY 0
#endif

This is a pattern pretty often used. To rely here on binmode.o is a
way, too, but it is the most ugly one, too. It affects any file open,
which isn't necessarily wanted.

Cheers,
Kai

  
Is LTO really the only place gcc needs binary access to files for a build
of a cross compiler?


Andy




Re: Success with MinGW and AVR and LTO - almost

2010-01-10 Thread Andrew Hutchinson



Kai Tietz wrote:

Well, open call there aren't that much but point of interest is in
'c-pch.c:  fd = open (name, O_RDONLY | O_BINARY, 0666);' as it uses
O_BINARY, too. See also for pattern in libiberty mkstemps.c

Regards,
Kai


  


It looks like O_BINARY is already defined in system.h, so all it needs 
is the patches to open().


I backed off my shotgun fix, and there are just two places that appear
to be the problem:


lto_elf_file_open () in lto-elf.c
lto_read_section_data() in lto.c

With O_BINARY on the read/write opens there, the failures in my simple test go away.
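
For illustration, here is a minimal sketch of the pattern discussed in this thread: the O_BINARY fallback define followed by a binary-mode open(). The helper name read_object_bytes is hypothetical and is not a function in lto.c or lto-elf.c; it only shows the shape such a patch might take.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#if defined(_WIN32)
#include <io.h>        /* open/read/close on MinGW */
#else
#include <unistd.h>    /* read/close on POSIX hosts */
#endif

/* Hosts without O_BINARY get a no-op flag, as suggested in the thread.  */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: read up to LEN bytes of PATH with no newline or
   Ctrl-Z translation.  Returns the byte count, or -1 on error.  */
static long
read_object_bytes (const char *path, unsigned char *buf, size_t len)
{
  int fd = open (path, O_RDONLY | O_BINARY);
  long total = 0;

  if (fd < 0)
    return -1;
  while ((size_t) total < len)
    {
      long n = (long) read (fd, buf + total, len - (size_t) total);
      if (n < 0)
        {
          close (fd);
          return -1;
        }
      if (n == 0)
        break;          /* end of file */
      total += n;
    }
  close (fd);
  return total;
}
```

On a Unix host the flag is zero and changes nothing; on a Windows host it stops the C runtime from rewriting CR/LF pairs or stopping at a Ctrl-Z byte inside an ELF object.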

Andy



Re: AVR gives weird error with LTO

2009-12-30 Thread Andrew Hutchinson

I used -v and progressed a little.

The problem seems to be that the linker is called with -fwhopr or -flto as
a command line option.


ld -fwhopr .

The linker finds '-f' and complains.

I assume this is not a valid option for ld?  Or is my linker the wrong
version or something?


Note this is a cross-compile toolchain.

Andy



Dave Korn wrote:

Andrew Hutchinson wrote:
  

When AVR target is built, without explicitly disabling LTO, it will
produce 1000's of testsuite failures of -LTO -WHOPR tests with this
compilation error:

ld: -f may not be used without -shared

Any idea what is wrong or how to make LTO work correctly here?



  The standard way to proceed would be: add -v to the command-line in the
PR; find out what is actually getting passed to ld; figure out what kind of
specs-processing accident (most likely) is causing ld to receive a -f option.

cheers,
  DaveK

  


Re: AVR gives weird error with LTO

2009-12-30 Thread Andrew Hutchinson


Dave Korn wrote:

Rafael Espindola wrote:
  

 It's not a valid option for ld.  It *is* a valid option for the collect2
driver/wrapper executable that gcc uses to invoke ld, which suggests to me
that the AVR port must be configured not to build collect2, but that it is
going to need to do so if it wants to use LTO/WHOPR.  See use_collect2 in
gcc/config.gcc
  

Or you could port gold to AVR and use the plugin :-)



  I hadn't checked, but yeah, since AVR is an ELF platform, that's a nice
solution too.  There might still be reason to build a collect2, for interop
with older binutils.

cheers,
  DaveK


  

Thank you David and Rafael

I will dig further into collect2.  I had noted that avr.h has the following:


/* This is undefined macro for collect2 disabling */
#define LINKER_NAME ld


Also, the MINGW host is the most significant for the AVR target - and 
problems with collect2 may be related to maintaining compatibility to that.


Andy



Re: AVR gives weird error with LTO

2009-12-30 Thread Andrew Hutchinson



Thank you David and Rafael

I will dig further into collect2.  I had noted that avr.h has the following:


/* This is undefined macro for collect2 disabling */
#define LINKER_NAME ld



That's indeed going to break LTO.

Richard.

  

That seems to be the key issue.
Without #define LINKER_NAME, AVR is running LTO/WHOPR tests ok ! (No 
idea if it does anything useful though)


Now to figure out why it was added in 2000 (rth). Hopefully Eric 
Weddington or Denis might have some idea and perhaps know if it still 
has a purpose.


Andy






How should I prototype cpp_define in target patch?

2009-12-23 Thread Andrew Hutchinson

I want to post patch for

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42457

The code moved out to -c.c file ALREADY uses:

builtin_define_std
cpp_define

Both from c-cppbuiltin.c. These have no prototypes defined in gcc.

So of course there are warnings emitted.

Is this OK?
Should I locally define prototypes?
Something else?

Andy



Re: How should I prototype cpp_define in target patch?

2009-12-23 Thread Andrew Hutchinson

Doh!


Joseph S. Myers wrote:

On Wed, 23 Dec 2009, Andrew Hutchinson wrote:

  

builtin_define_std
cpp_define

Both from c-cppbuiltin.c. These have no prototypes defined in gcc.



They do have prototypes, in c-common.h and cpplib.h.

  


Re: Which optimizer should remove redundant subreg of sign_extension?

2009-12-23 Thread Andrew Hutchinson


Paolo Bonzini wrote:


I think that if you add the simplification to simplify-rtx.c's 
simplify_subreg, combine should pick it up automagically.


Paolo



There we have it! Apparently this optimization is already
performed, so I will have to dig further into why it does not happen.


simplify_subreg()
snip
/* If we're requesting the lowpart of a zero or sign extension,
there are three possibilities.  If the outermode is the same
as the origmode, we can omit both the extension and the subreg.
If the outermode is not larger than the origmode, we can apply
the truncation without the extension.  Finally, if the outermode
is larger than the origmode, but both are integer modes, we
can just extend to the appropriate mode.  */
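
To make the three cases in that comment concrete, here is a small sketch using plain bit widths in place of machine modes. The names lowpart_action, lowpart_of_extension and low_part_of_sign_extend are made up for this note and are not part of simplify-rtx.c.

```c
#include <stdint.h>

/* The three outcomes listed in the simplify_subreg comment.  */
typedef enum
{
  OMIT_BOTH,          /* outer mode == original mode */
  TRUNCATE_ORIGINAL,  /* outer mode narrower than original mode */
  EXTEND_TO_OUTER     /* outer mode wider than original mode */
} lowpart_action;

/* Pick the action from the width of the requested (outer) mode and the
   width of the value before it was extended (original mode).  */
static lowpart_action
lowpart_of_extension (unsigned outer_bits, unsigned orig_bits)
{
  if (outer_bits == orig_bits)
    return OMIT_BOTH;
  if (outer_bits < orig_bits)
    return TRUNCATE_ORIGINAL;
  return EXTEND_TO_OUTER;
}

/* Value-level view of the first case: sign-extend 8 bits to 16 bits and
   take the low 8 bits again.  The result is always the input.  */
static int8_t
low_part_of_sign_extend (int8_t v)
{
  int16_t wide = v;        /* models sign_extend:HI of a QI value */
  return (int8_t) wide;    /* the value still fits, so this is exact */
}
```

Note that this only models a request for the low part; it says nothing about a subreg that selects some other byte of the extended value.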




Approval as AVR maintainer

2009-12-23 Thread Andrew Hutchinson

How does one get to be maintainer of port?

Specifically AVR port - so that I do not need to get approval to commit 
changes. The time it takes now is rather longer than getting approval on 
other parts of GCC.


The process does not seem to be written down anywhere - but I am sure 
someone will correct me if I am wrong.


Seasonal Greetings

Andy



Which optimizer should remove redundant subreg of sign_extension?

2009-12-22 Thread Andrew Hutchinson

I came across this RTL on AVR in the combine dump (part of the va-arg-9.c test)

(set (reg:QI 25 r25 [+1 ])
   (subreg:QI (sign_extend:HI (reg:QI 49)) 1))

The sign extension is completely redundant - the upper part of the register
is not used elsewhere - but the RTL remains unchanged through all the
optimizers and the sign extension appears in the final code.


Which RTL optimisation should be taking care of this? Propagation?
It would help me look in the right place to understand and perhaps fix 
issue.


I suspect the presence of hard register is why it does not get removed. 
(the hard register is the function return value)



Andy




Need help to correct vla-dealloc testcase

2009-12-06 Thread Andrew Hutchinson
I need advice before I submit a patch to fix the test
gcc-torture/execute/vla-dealloc-1.c (attached below)


The test appears to be unsafe. The original fault was failure to
deallocate the VLA on the jump - thus with the fault present the test would
appear to perform 1 million new allocations - and fail presumably due to
either execution time or run time error - neither of which seems certain.


I have to modify the test since it presumes 32bit or larger integers -
and thus on 16bit targets overflowing into -ve allocations makes it
somewhat undefined behavior. It takes rather a long time to execute - way
more than other execution tests - and trips the timeout limit for AVR
simulator tests.


a) I could disable test for target without 32bit integers
b) I could change n to be 32 bit on 16 bit targets (the test will then 
be equally uncertain on 16 bit targets at detecting fault.)
c) I could reduce n to 10,000 - but that likely will create more false 
positives


a) is my easy way out - but perhaps I should  address the apparent 
weakness where the test could pass with the original problem present?


Suggestions?



/* VLAs should be deallocated on a jump to before their definition,
  including a jump to a label in an inner scope.  PR 19771.  */

void *volatile p;

int
main (void)
{
 int n = 0;
 if (0)
   {
   lab:;
   }
 int x[n % 1000 + 1];
 x[0] = 1;
 x[n % 1000] = 2;
 p = x;
 n++;
 if (n < 1000000)
   goto lab;
 return 0;
}



Re: Need help to correct vla-dealloc testcase

2009-12-06 Thread Andrew Hutchinson

Thanks

I am submitting patch to drop count to 10,000 for 16 bit int target.
Using 32 bit counter of 1 million takes a minute or so on simulator - 
which is high.
The lower count  is quick and only requires a (16bit)  stack limit to be 
lower than 10MB - which is pretty safe.
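
As a rough sketch of the change described here (not the patch itself), the iteration count can be picked from the width of int. The macro name LIMIT, the INT_MAX check and the function name run_vla_loop are assumptions made for this illustration; the loop body is the one from the test.

```c
#include <limits.h>

/* Assumption: 16-bit int targets get the smaller count.  */
#if INT_MAX < 1000000
#define LIMIT 10000
#else
#define LIMIT 1000000
#endif

void *volatile p;

/* Same loop as vla-dealloc-1.c, but returning the final count so the
   result can be checked by a caller.  */
int
run_vla_loop (void)
{
  int n = 0;
  if (0)
    {
    lab:;
    }
  int x[n % 1000 + 1];   /* released again on each jump back to lab */
  x[0] = 1;
  x[n % 1000] = 2;
  p = x;
  n++;
  if (n < LIMIT)
    goto lab;
  return n;
}
```

With the original fault present, each pass would leave up to 1000 ints on the stack, so the lower count still only detects the fault where the stack limit is small enough.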


Andy


Joseph S. Myers wrote:

On Sun, 6 Dec 2009, Andrew Hutchinson wrote:

  

The test appears to be unsafe. The original fault was failure to deallocate
VLA on the jump - thus with the fault present the test would appear to perform
1 million new allocations - and fail presumably due to either execution time or
run time error - neither of which seems certain.



It's expected to run into RLIMIT_STACK or equivalent, with an expectation 
that stack limits are generally below 500MB.  (A million executions of a 
few instructions should be pretty fast in general.)


  

b) I could change n to be 32 bit on 16 bit targets (the test will then be
equally uncertain on 16 bit targets at detecting fault.)



This seems to be the natural approach.

  


How does builtin_sqrt get used - or not

2009-11-21 Thread Andrew Hutchinson
I am tracking a test failure with the avr target where function sqrtf is an
undefined reference at link time.


Here is command line:

/media/verbatim/gcchead/obj-dir/gcc/xgcc 
-B/media/verbatim/gcchead/obj-dir/gcc/ 
/media/verbatim/gcchead/trunk/gcc/testsuite/gcc.dg/pr41963.c   -O2 
-ffast-math -DSTACK_SIZE=2048 -DNO_TRAMPOLINES  -DSIGNAL_SUPPRESS 
-mmcu=atmega128  /home/andy/winavrfiles/avrtest/dejagnuboards/exit.c 
-Wl,-u,vfprintf -lprintf_flt 
-Wl,-Tbss=0x802000,--defsym=__heap_end=0x80  -lm   -o pr41963.exe   

I am led to believe that gcc might use builtin_sqrtf rather than
sqrtf().   I am successfully using fabsf() - with no link errors.


Is there any target configuration needed for builtin_sqrtf that I should
know about?



Andy





Bug in binop rotate ?

2009-10-17 Thread Andrew Hutchinson
I have been adding rotate capability to AVR port and have come across 
what I think is bug in

optabs.c: expand_binop()

This occurs during a rotate expansion. For example

target  = op0  rotated by op1

In the particular situation (code extract below) it tries a reverse
rotate of (bits - op1), where this expression is expanded as a simple
integer, a negation or a subtraction depending on the type of op1 and target.

The expansion of the subtraction is using the mode of the target - I 
believe it should be using the mode of op1.

The mode of the rotation  amount need not be the same as the target.

target:DI = Op0:DI rotate op1:HI

In my testcase it is not, and I get asserts later in simplify_rtx.

The negation mode looks equally wrong.

Am I mistaken?


 /* If we were trying to rotate, and that didn't work, try rotating
the other direction before falling back to shifts and bitwise-or.  */
 if (((binoptab == rotl_optab
       && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
      || (binoptab == rotr_optab
          && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
     && mclass == MODE_INT)
   {
 optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
 rtx newop1;
 unsigned int bits = GET_MODE_BITSIZE (mode);

 if (CONST_INT_P (op1))
   newop1 = GEN_INT (bits - INTVAL (op1));
 else if (targetm.shift_truncation_mask (mode) == bits - 1)
   newop1 = negate_rtx (mode, op1);
 else
   newop1 = expand_binop (mode, sub_optab,
  GEN_INT (bits), op1,
  NULL_RTX, unsignedp, OPTAB_DIRECT);
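
To illustrate the point about modes, here is a small model in plain C. The 64-bit value stands in for DImode and the 16-bit count for HImode; the function names are invented for this sketch and nothing here is optabs.c code. The reversed count (bits - op1) is computed in the count's own type, not in the type of the value being rotated.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by N bits (N taken modulo 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Rotate a 64-bit value right by N bits (N taken modulo 64).  */
static uint64_t
rotr64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Left rotate expressed as a right rotate by (bits - count).  The
   subtraction is done in the 16-bit type of the count.  */
static uint64_t
rotl64_via_rotr (uint64_t x, uint16_t count)
{
  uint16_t reversed = (uint16_t) (64u - count);
  return rotr64 (x, reversed);
}
```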


Re: Bug in binop rotate ?

2009-10-17 Thread Andrew Hutchinson

Thanks for your review.

I have submitted bug report.



Richard Guenther wrote:

On Sat, Oct 17, 2009 at 3:47 PM, Andrew Hutchinson
andrewhutchin...@cox.net wrote:
  

I have been adding rotate capability to AVR port and have come across what I
think is bug in
optabs.c: expand_binop()

This occurs during a rotate expansion. For example

target  = op0  rotated by op1

In the particular situation (code extract below) it tries a reverse rotate
of (bits - op1). Where this expression is expanded as a simple integer,
a negation or subtraction depending on type of op1 and target.

The expansion of the subtraction is using the mode of the target - I believe
it should be using the mode of op1.
The mode of the rotation  amount need not be the same as the target.

target:DI = Op0:DI rotate op1:HI

In my testcase it is not, and I get asserts later in simplify_rtx.

The negation mode looks equally wrong.

Am I mistaken?



I think you are correct.

Richard.

  

 /* If we were trying to rotate, and that didn't work, try rotating
   the other direction before falling back to shifts and bitwise-or.  */
 if (((binoptab == rotl_optab
       && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing)
      || (binoptab == rotr_optab
          && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing))
     && mclass == MODE_INT)
  {
optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
rtx newop1;
unsigned int bits = GET_MODE_BITSIZE (mode);

if (CONST_INT_P (op1))
  newop1 = GEN_INT (bits - INTVAL (op1));
else if (targetm.shift_truncation_mask (mode) == bits - 1)
  newop1 = negate_rtx (mode, op1);
else
  newop1 = expand_binop (mode, sub_optab,
 GEN_INT (bits), op1,
 NULL_RTX, unsignedp, OPTAB_DIRECT);




  


Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Andrew Hutchinson


The situation comes up where no overlap or a partial overlap of registers
permits optimal code, since this can avoid using a scratch register.

Thus no overlap OR partial overlap is preferred (or required).

Using nothing leaves overlap without preference - full, partial, none
Using = gives the least preferred case - full
Using & gives only the non-overlapping case - none

Ideally NOT= is required - which I would hope the register allocator
would quite like too.
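
For clarity on the three cases, here is a small sketch that classifies two multi-register operands by how their hard register ranges overlap. The type and function names are hypothetical and this is not how the register allocator represents constraints; "full" here means both operands start at the same register.

```c
/* How two equally wide multi-register operands relate.  */
typedef enum { OVERLAP_NONE, OVERLAP_PARTIAL, OVERLAP_FULL } overlap_kind;

/* IN_FIRST and OUT_FIRST are the first hard register of each operand,
   NREGS (assumed non-zero) is the number of registers each occupies.  */
static overlap_kind
classify_overlap (unsigned in_first, unsigned out_first, unsigned nregs)
{
  unsigned in_end = in_first + nregs;     /* one past the last input reg */
  unsigned out_end = out_first + nregs;   /* one past the last output reg */

  if (in_first == out_first)
    return OVERLAP_FULL;
  if (in_end <= out_first || out_end <= in_first)
    return OVERLAP_NONE;
  return OVERLAP_PARTIAL;
}
```

With the example from this thread, input r20..23 and output r22..25 classify as partial, which is the case neither modifier selects on its own.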




Dave Korn wrote:

Ian Lance Taylor wrote:
  

Andrew Hutchinson andrewhutchinson@ writes:



I can use = modifier to make operands use same register and early
clobber & to avoid overlaps.

Is it possible to have or construct a constraint that permits partial
overlap operands. (which neither = or & would allow)
The case would be  wide types taking multiple hard registers.

eg Input r20..23 Output r22..25
  

There is no such constraint today.  I suppose it would be possible to
define such a constraint if it seemed useful.  



  Maybe I'm misunderstanding, but I thought that was already the default if
you use neither = to specify full overlap nor & for no overlap?

  Frex, a lot of ABIs specify that DImode values stored in pairs of SImode
registers must always use an odd-even register pair (using a test in
HARD_REGNO_MODE_OK), but when I was working on a custom port that allowed them
in any register pair, GCC would happily generate partially overlapping movdi
instructions such as (set:DI (reg:DI 5) (reg:DI 6)) (i.e., move r6/7 -> r5/6).
 This hasn't changed since 3.3, has it?

cheers,
  DaveK

  


Re: Constraint modifier for partially overlaping operands

2009-10-17 Thread Andrew Hutchinson

Yes.

But we need to lower after combine and before register allocation.
I'm still figuring out how to do that.

Lowering before combine, in particular, causes a lot of code bloat. This
loses all optimization of conditional jumps, shifts etc.
In our case, most lowering is delayed until after reload. This retains
the RTL optimization but is suboptimal for allocation and lacks enough
forward propagation.


For a similar reason, not splitting wide types often produces far better 
code.


One exception is DImode, which by default is lowered at expand, since
there are no DImode instructions defined.
This ends up with pretty dire code, since the built-in expansion can't use
a carry based pattern and we again miss
the RTL optimization at the wider level.

It would seem we need to have target dependent pass order to improve on 
this significantly.



Andy








Richard Henderson wrote:

On 10/16/2009 11:04 PM, Ian Lance Taylor wrote:

Andrew Hutchinsonandrewhutchin...@cox.net  writes:


I can use = modifier to make operands use same register and early
clobber & to avoid overlaps.

Is it possible to have or construct a constraint that permits partial
overlap operands. (which neither = or & would allow)
The case would be  wide types taking multiple hard registers.

eg Input r20..23 Output r22..25


There is no such constraint today.  I suppose it would be possible to
define such a constraint if it seemed useful.


I'd much prefer if the port decomposed its double word operations and 
used the lower-subreg pass to decompose the double word registers.  At 
which point the register allocator has all of the information it needs 
to do the right thing.



r~



Re: Cannot get Bit test RTL to cooperate with Combine.

2009-09-21 Thread Andrew Hutchinson

Thank you so much for your information!

I will investigate your patch.
(I just hacked lowpart_for_combine to allow lowering something larger 
than word and the subreg matched no problem.)


It looks like RTL generation is somewhat odd and not helping.
My test used
extern long x;
if (x & 1)

If there is only a single reference to x, then (x & 1) is lowered to HI
mode and does not include any subregs (nosplit-wide-types). So my
patterns match.


If my test code includes two bit tests, I get HI mode subregs on the
x & 1 (which will not match) but not on x & 2; the latter is in the wider
SI mode and will match.
If I turn on split-wide-types, the subregs are not removed by subreg
lowering since there is now mixed mode usage!


Something seems backwards in expansion, regarding lowering and references.

Andy


Joern Rennecke wrote:

On Sun, Sep 20, 2009 at 01:49:39PM -0400, Andrew Hutchinson wrote:

All,

I have been debugging AVR port to see why we fail to match so many bit
test opportunities.

When dealing with longer modes I have come across a problem I can not 
solve.


Expansion in RTL for a bit test can produce two styles.

STYLE 1  Bit to be tested is NOT LSB (e.g. if ( longthing & 0x10)),  the
expanded code contains the test as:

(and:SI (reg:SI 45 [ lx.1 ])
   (const_int 16 [0x10]))

Bit tests are matched by combine. Combine has no problems with this and
eventually creates a matching pattern based on the conversion of the AND
to a zero extraction

(set (pc)
   (if_then_else (ne (zero_extract:SI (subreg:QI (reg:SI 45 [ lx.1 ]) 0)
   (const_int 1 [0x1])
   (const_int 4 [0x4]))
   (const_int 0 [0x0]))
   (label_ref:HI 133)
   (pc)))

This will match Bit test patterns and produces optimal code. :-)


Unfortunately, when combine knows about upper bits that are zero, it
will generate an lshiftrt instead, which can't be legitimately matched
by a bit test.

I have a patch for this which I haven't gotten around yet to test it
separately in trunk and formally submit to the patches list, but you can
extract it from arc-20081210-branch:

2008-12-02  Jorn Rennecke  joern.renne...@arc.com

* combine.c (undo_since): New function, broken out of:
(undo_all).
(combine_simplify_bittest): New function.
(combine_simplify_rtx, simplify_if_then_else): Use it.
* config/arc/arc.c (arc_rtx_costs): Check for bbit test.

svn diff -r144651:144652 
svn://gcc.gnu.org/svn/gcc/branches/arc-20081210-branch/gcc/combine.c




Re: Cannot get Bit test RTL to cooperate with Combine.

2009-09-21 Thread Andrew Hutchinson

Why doesn't combine try matching unsimplified expressions when it fails?

This would at least permit creating patterns based on explicit format 
of  input RTL without the added vagaries of simplification


Andy



Joern Rennecke wrote:

On Sun, Sep 20, 2009 at 01:49:39PM -0400, Andrew Hutchinson wrote:

All,

I have been debugging AVR port to see why we fail to match so many bit
test opportunities.

When dealing with longer modes I have come across a problem I can not 
solve.


Expansion in RTL for a bit test can produce two styles.

STYLE 1  Bit to be tested is NOT LSB (e.g. if ( longthing & 0x10)),  the
expanded code contains the test as:

(and:SI (reg:SI 45 [ lx.1 ])
   (const_int 16 [0x10]))

Bit tests are matched by combine. Combine has no problems with this and
eventually creates a matching pattern based on the conversion of the AND
to a zero extraction

(set (pc)
   (if_then_else (ne (zero_extract:SI (subreg:QI (reg:SI 45 [ lx.1 ]) 0)
   (const_int 1 [0x1])
   (const_int 4 [0x4]))
   (const_int 0 [0x0]))
   (label_ref:HI 133)
   (pc)))

This will match Bit test patterns and produces optimal code. :-)


Unfortunately, when combine knows about upper bits that are zero, it
will generate an lshiftrt instead, which can't be legitimately matched
by a bit test.

I have a patch for this which I haven't gotten around yet to test it
separately in trunk and formally submit to the patches list, but you can
extract it from arc-20081210-branch:

2008-12-02  Jorn Rennecke  joern.renne...@arc.com

* combine.c (undo_since): New function, broken out of:
(undo_all).
(combine_simplify_bittest): New function.
(combine_simplify_rtx, simplify_if_then_else): Use it.
* config/arc/arc.c (arc_rtx_costs): Check for bbit test.

svn diff -r144651:144652 
svn://gcc.gnu.org/svn/gcc/branches/arc-20081210-branch/gcc/combine.c




Cannot get Bit test RTL to cooperate with Combine.

2009-09-20 Thread Andrew Hutchinson

All,

I have been debugging AVR port to see why we fail to match so many bit 
test opportunities.


When dealing with longer modes I have come across a problem I can not solve.

Expansion in RTL for a bit test can produce two styles.

STYLE 1  Bit to be tested is NOT LSB (e.g. if ( longthing & 0x10)),  the
expanded code contains the test as:


(and:SI (reg:SI 45 [ lx.1 ])
   (const_int 16 [0x10]))

Bit tests are matched by combine. Combine has no problems with this and 
eventually creates a matching pattern based on the conversion of the AND 
to a zero extraction


(set (pc)
   (if_then_else (ne (zero_extract:SI (subreg:QI (reg:SI 45 [ lx.1 ]) 0)
   (const_int 1 [0x1])
   (const_int 4 [0x4]))
   (const_int 0 [0x0]))
   (label_ref:HI 133)
   (pc)))

This will match Bit test patterns and produces optimal code. :-)
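
As a plain C model of what combine does in Style 1, the AND with 0x10 and the one-bit extraction at position 4 test the same thing. This sketch assumes bit positions are counted from the least significant bit; the helper names are invented for this note.

```c
#include <stdint.h>

/* Model of zero_extract: SIZE bits of VALUE starting at bit POS,
   counted from the least significant bit.  POS must be below 32.  */
static uint32_t
zero_extract32 (uint32_t value, unsigned size, unsigned pos)
{
  uint32_t mask = size >= 32 ? UINT32_MAX : ((UINT32_C (1) << size) - 1);
  return (value >> pos) & mask;
}

/* The test as expanded: (and:SI x (const_int 16)) != 0.  */
static int
bit_test_and (uint32_t x)
{
  return (x & 0x10) != 0;
}

/* The test after combine: (zero_extract x 1 4) != 0.  */
static int
bit_test_extract (uint32_t x)
{
  return zero_extract32 (x, 1, 4) != 0;
}
```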

STYLE 2 Bit to be tested is LSB (e.g. if ( longthing & 1)), the expanded
RTL code uses SUBREG to lower width (apparently from SImode to word size).


(and:HI (subreg:HI (reg:SI 45 [ lx.1 ]) 0)
   (const_int 1 [0x1]))

This seems to occur regardless of -f(no)split-wide-types for size >
HImode (which is integer mode). This RTL becomes a problem for combine.


Combine  uses  subst(), combine_simplify_rtx() and eventually  
simplify_comparison()  where it attempts to WIDEN the AND and take the 
lowpart.


gen_lowpart (HImode,
(and:SI (reg:SI 45 [ lx.1 ])
   (const_int 1 [0x1]))
)
However, gen_lowpart_for_combine() FAILS, as it will reject taking the
lowpart of an SImode expression because size > UNITS_PER_WORD.

So no test pattern can be  matched. :-(

Style 2 is hugely problematic. The substitution works fine, but the 
simplification will always fail - making it apparently  impossible to 
create matching patterns for bit tests of  the LSB of SImode or DImode 
values.
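
For what it is worth, the two forms combine is moving between in Style 2 compute the same result, which a plain C model shows. The sketch assumes the subreg at byte 0 is the low part (as on a little-endian target such as AVR), and the function names are made up for this note.

```c
#include <stdint.h>

/* As expanded: (and:HI (subreg:HI x 0) (const_int 1)) != 0.  */
static int
lsb_test_narrow (uint32_t x)
{
  uint16_t low = (uint16_t) x;    /* take the low part first */
  return (low & 1) != 0;
}

/* As combine wants it: low part of (and:SI x (const_int 1)) != 0.  */
static int
lsb_test_wide (uint32_t x)
{
  uint32_t masked = x & 1;        /* AND in the wide mode */
  return (uint16_t) masked != 0;  /* then take the low part */
}
```

So the widening itself is harmless; the failure described above is only the refusal to take the low part of the wider expression.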


Any clues how I might get around this?

Andy








Re: Problems with builtin setjmp receiver getting eliminated - Help

2008-03-23 Thread Andrew Hutchinson
I have realised that part of the problem is that the receiver block has
no incoming edges, so cfgcleanup removes it as an unreachable block - right?


So any target that needs a non-trivial receiver for builtin_setjmp will
not work? That would mean any that have an offset between stack and
pointers?


I guess the same problem exists for non-local goto?

I am not convinced it could be this wrong. So please comment and suggest
a solution - I'm sure I can write a target handler, but it seems so wrong to
leave this issue open.



Andy



Andrew Hutchinson wrote:
I have real problems trying to get to the root of a bug in the
builtin_setjmp implementation and seek anyone's wisdom on what I have
found and a way forward.


When presented with mismatches, it's not always clear which part is wrong.

I will post a bug report when I have got a little closer.

The problem I was looking at is the frame pointer being wrong on the 
AVR target. (Stack and PC are ok)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21078 - but ANY 
setjmp/longjump has the problem.


I traced this down to the handling of the frame pointer: by the 
receiver - (which is the code the longjmp will jump back  to)


Setjmp: - save pointer

   Buf[0] = Virtual_stack_var

Longjmp: get pointer

   Hard_Frame_pointer=buf[0]

Receiver: put back in frame pointer

   Virtual_stack_var = Hard_Frame_pointer

The uniqueness on AVR is that Frame_pointer (and stack pointer) are 1 
byte different from the first stack element. So the Virtual_stack_var 
is 1 different from the frame_pointer

i.e.
#define STACK_GROWS_DOWNWARD
#define STARTING_FRAME_OFFSET 1
#define STACK_POINTER_OFFSET 1

That's  ok as this is recognized by instantiate_virtual_regs, which 
makes the replacements later. In this case


Buf[0] = Frame_pointer+1
..
..
Frame_pointer =  Virtual_stack_var - 1

However, what is happening is that an earlier pass (noted in the RTL dump
file sibling) eliminates the receiver code block. So the frame
pointer is not reset correctly and ends up being 1 out (which is bad).
Other targets may survive if they don't have an offset between stack and
pointers.
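
The off-by-one can be shown with a toy model of the three steps. This is only arithmetic on a pretend frame pointer, with invented names (machine, model_setjmp and so on); it is not the RTL that gcc generates.

```c
/* Offset between the frame pointer and the first stack variable, as on AVR.  */
#define STARTING_FRAME_OFFSET 1

/* Toy machine state: just the hard frame pointer.  */
typedef struct { unsigned frame_pointer; } machine;

/* Virtual_stack_var as seen after instantiate_virtual_regs.  */
static unsigned
virtual_stack_vars (const machine *m)
{
  return m->frame_pointer + STARTING_FRAME_OFFSET;
}

/* Setjmp: Buf[0] = Virtual_stack_var.  */
static void
model_setjmp (const machine *m, unsigned buf[1])
{
  buf[0] = virtual_stack_vars (m);
}

/* Longjmp: Hard_Frame_pointer = buf[0].  */
static void
model_longjmp (machine *m, const unsigned buf[1])
{
  m->frame_pointer = buf[0];
}

/* Receiver: Virtual_stack_var = Hard_Frame_pointer, i.e. the frame
   pointer itself has to drop back by the offset.  */
static void
model_receiver (machine *m)
{
  m->frame_pointer -= STARTING_FRAME_OFFSET;
}
```

If model_receiver is never run, which is the effect of the block being eliminated, the frame pointer stays one higher than it was at setjmp.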


So where do I look to find out why this is happening? Does the RTL 
have something missing or is the other pass not checking?


The RTL that gets eliminated is:

;; Start of basic block () - 5
(code_label/s 13 12 14 5 4  [2 uses])

(note 14 13 15 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 15 14 16 5 built-in-setjmp.c:17 (use (reg/f:HI 28 r28)) -1 (nil))

(insn 16 15 17 5 built-in-setjmp.c:17 (clobber (reg:HI 2 r2)) -1 (nil))

(insn 17 16 18 5 built-in-setjmp.c:17 (set (reg/f:HI 37 
virtual-stack-vars)

   (reg/f:HI 28 r28)) -1 (nil))

(insn 18 17 19 5 built-in-setjmp.c:17 (clobber (reg/f:HI 28 r28)) -1 
(nil))


(insn 19 18 20 5 built-in-setjmp.c:17 (asm_input/v () 0) -1 (nil))
;; End of basic block 5 - ( 6)


The second PROBLEM I noted is that gcc creates RTL for TWO receivers
for a setjmp. One is naturally from expand_builtin_setjmp_receiver,
but there is then another one just after, created by
expand_nl_goto_receiver in stmt.c - what's all this about?


Despite having two, both get optimised out!


Andy













Re: Excess registers pushed - regs_ever_live not right way?

2008-03-02 Thread Andrew Hutchinson



I gave up with DF and instead went through the function tree arguments to
rediscover the argument registers.
It was then a simple matter to exclude these from the epilog/prolog register
saves/restores.


The patch is posted - and seems quite portable to other targets.

http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00115.html


best regards and thanks for help.



Seongbae Park (박성배, 朴成培) wrote:

2008/3/1 Andrew Hutchinson [EMAIL PROTECTED]:
  

I'm still struggling with a good solution that avoids unneeded saves
 of parameter registers.

 To solve problem all I need to know are the registers actually used for
 parameters. Since the caller assumes all of these are clobbered by
 callee - they should never need to be saved.



I'm totally confused what is the problem here.
I thought you were seeing extra callee-save register save/restore in prologue,
but now it sounds like you're seeing extra caller-save register save/restore.
Which one are you trying to solve, and what kind of target is this ?

  

 DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter
 registers - not just the ones that are really used (since it uses target
 FUNCTION_ARG_REGNO_P to get parameter registers)



You said you wanted to know if there's a def of a register within a function.
For an incoming parameter, there will be one artificial def,
and if there's no other def, it means there's no real def
of the register within the function.

  

 So the DF artificial defs are useless in trying to find real parameter
 registers.



I don't understand what you mean by this. What do you mean by
real parameter register ?

  

 That seem to require going over all DF chains to work out which
 registers are externally defined. DF does not solve problem for me.



What do you mean by externally defined ?
DF may not solve the problem for you,
but now I'm completely lost on what your problem is.

  

 There has got to be an easier way of finding parameter registers used by
 function.



If you want to find all the uses (use as in reading a register but not
writing to it),
you should look at USE chain, not DEF chain, naturally.

Seongbae

  


Re: Excess registers pushed - regs_ever_live not right way?

2008-03-01 Thread Andrew Hutchinson
I'm still struggling with a good solution that avoids unneeded saves
of parameter registers.

To solve problem all I need to know are the registers actually used for
parameters. Since the caller assumes all of these are clobbered by
callee - they should never need to be saved.

DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter
registers - not just the ones that are really used (since it uses target
FUNCTION_ARG_REGNO_P to get parameter registers)

So the DF artificial defs are useless in trying to find real parameter
registers.

That seem to require going over all DF chains to work out which
registers are externally defined. DF does not solve problem for me.

There has got to be an easier way of finding parameter registers used by
function.

Ideas?




Seongbae Park (박성배, 朴成培) wrote:
 You can use DF_REG_DEF_COUNT() - if this is indeed a parameter register,
 there should be only one def (artificial def) or no def at all.
 Or if you want to see all defs for the reg,
 follow DF_REG_DEF_CHAIN().

 Seongbae

 On Wed, Feb 27, 2008 at 6:03 PM, Andrew Hutchinson
 [EMAIL PROTECTED] wrote:
   
 Register contains  parameter that is passed to function. This register
  is not part of call used set.

  If this type of register were modified by function, then it would be
  saved by function.

  If this register is not modified by function, it should not be saved.
  This is true even if function is not a leaf function (as same register
  would be preserved by deeper calls)


  Andy





  Seongbae Park (박성배, 朴成培) wrote:
   On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson
   [EMAIL PROTECTED] wrote:
  
   Register saves by prolog (pushes) are typically made with reference to
df_regs_ever_live_p() or  regs_ever_live.
  
If my understanding is correct,  these calls reflect register USEs and
not register DEFs. So if register is used in a function, but not
otherwise changed, it will get pushed unnecessarily on stack by prolog.
  
  
   This implies that the register is either a global register
   or a parameter register, in either case it won't be saved/restored
   as callee save.
   What kind of a register is it and how com there's only use of it in a 
 function
   but it's not a global ?
  
   Seongbae
  
  

 



   


Re: Excess registers pushed - regs_ever_live not right way?

2008-03-01 Thread Andrew Hutchinson

Sorry terminology is fighting language!

by parameter - I mean argument registers - also a stray use may have 
crept in.


Original problem: following normal gcc methods, the prolog saves live 
registers that are not call used.
But in the AVR target this will include some argument registers - as not 
all argument registers are call used.
Function argument registers do not need to be saved (since the caller 
assumes they are always clobbered).


To solve problem all I need to know is what registers really do contain 
function arguments. Then I can omit these from prolog saves and fix bug.
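
To make the intended rule concrete, here is a minimal standalone sketch. It is not GCC code: the struct, the bit masks and the function names are all hypothetical, and it simply assumes that the set of registers really carrying arguments is already known (which is exactly the part DF is not giving me).

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32  /* hypothetical register file size */

/* Hypothetical per-function facts, one bit per hard register. */
struct reg_facts {
    uint32_t ever_live;    /* register is referenced somewhere in the function */
    uint32_t call_used;    /* caller already treats it as clobbered by a call */
    uint32_t incoming_arg; /* register actually carries an argument on entry */
};

/* Registers the prolog has to push: live, not call used, and not an
   incoming argument register. */
static uint32_t prologue_save_mask(const struct reg_facts *f)
{
    return f->ever_live & ~f->call_used & ~f->incoming_arg;
}

/* Convenience test for a single register number. */
static bool must_save(const struct reg_facts *f, unsigned regno)
{
    return regno < NUM_REGS && ((prologue_save_mask(f) >> regno) & 1u);
}
```

With ever_live = 0x0F, call_used = 0x01 and incoming_arg = 0x02, only registers 2 and 3 would be pushed; register 1 is live but is dropped from the saves because it holds an argument.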


DF does not tell me what registers contain function arguments. It marks 
all possible arguments registers with artificial def (which are known 
anyway). Unfortunately it is not as simple as counting defs as I had hoped.


So I would then have to go through all chains for possible arguments 
to see if that external def is actually used inside function.  This 
cannot be shortcut by looking for just any use - or multiple defs - as real 
argument registers can be re-used inside function.


Is this conclusion correct?

Andy



Seongbae Park (박성배, 朴成培) wrote:

2008/3/1 Andrew Hutchinson [EMAIL PROTECTED]:
  

I'm am still struggling with a good solution that avoids unneeded saves
 of parameter registers.

 To solve problem all I need to know are the registers actually used for
 parameters. Since the caller assumes all of these are clobbered by
 callee - they should never need to be saved.



I'm totally confused what is the problem here.
I thought you were seeing extra callee-save register save/restore in prologue,
but now it sounds like you're seeing extra caller-save register save/restore.
Which one are you trying to solve, and what kind of target is this ?

  

 DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter
 registers - not just the ones that are really used (since it uses target
 FUNCTION_ARG_REGNO_P to get parameter registers)



You said you wanted to know if there's a def of a register within a function.
For an incoming parameter, there will be one artificial def,
and if there's no other def, it means there's no real def
of the register within the function.

  

 So the DF artificial defs are useless in trying to find real parameter
 registers.



I don't understand what you mean by this. What do you mean by
real parameter register ?

  

 That seem to require going over all DF chains to work out which
 registers are externally defined. DF does not solve problem for me.



What do you mean by externally defined ?
DF may not solve the problem for you,
but now I'm completely lost on what your problem is.

  

 There has got to be an easier way of finding parameter registers used by
 function.



If you want to find all the uses (use as in reading a register but not
writing to it),
you should look at USE chain, not DEF chain, naturally.

Seongbae

  


Excess registers pushed - regs_ever_live not right way?

2008-02-27 Thread Andrew Hutchinson
Register saves by prolog (pushes) are typically made with reference to 
df_regs_ever_live_p() or regs_ever_live.


If my understanding is correct,  these calls reflect register USEs and 
not register DEFs. So if register is used in a function, but not 
otherwise changed, it will get pushed unnecessarily on stack by prolog.


(as noted in this bug  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32871)

I checked a couple of other ports but they all use 
df_regs_ever_live_p(). Indeed this is noted method in manual.


The question is, what df routine or variable can be used to determine 
which registers are DEFs and hence destructively used by a function?


Maybe:  df_invalidated_by_call  in conjunction with:
df_get_call_refs() perhaps?


Andy





Re: Excess registers pushed - regs_ever_live not right way?

2008-02-27 Thread Andrew Hutchinson
Thanks

I will check this.

DF Dump in RTL file does not list Artificial defs - which is what I
think I need. However, I do note that all potential parameter registers
(including those unused) - are listed as invalidated by call. - which
means 1 (or more) defs. So like you suggest I just need to find count.

Andy




Seongbae Park (박성배, 朴成培) wrote:
 You can use DF_REG_DEF_COUNT() - if this is indeed a parameter register,
 there should be only one def (artificial def) or no def at all.
 Or if you want to see all defs for the reg,
 follow DF_REG_DEF_CHAIN().

 Seongbae

 On Wed, Feb 27, 2008 at 6:03 PM, Andrew Hutchinson
 [EMAIL PROTECTED] wrote:
   
 Register contains  parameter that is passed to function. This register
  is not part of call used set.

  If this type of register were modified by function, then it would be
  saved by function.

  If this register is not modified by function, it should not be saved.
  This is true even if function is not a leaf function (as same register
  would be preserved by deeper calls)


  Andy





  Seongbae Park (박성배, 朴成培) wrote:
   On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson
   [EMAIL PROTECTED] wrote:
  
   Register saves by prolog (pushes) are typically made with reference to
df_regs_ever_live_p() or  regs_ever_live.
  
If my understanding is correct,  these calls reflect register USEs and
not register DEFs. So if register is used in a function, but not
otherwise changed, it will get pushed unnecessarily on stack by prolog.
  
  
   This implies that the register is either a global register
   or a parameter register, in either case it won't be saved/restored
   as callee save.
   What kind of a register is it and how com there's only use of it in a 
 function
   but it's not a global ?
  
   Seongbae
  
  

 



   


Re: Finding out what backend instruction pattern matches instruction

2008-01-22 Thread Andrew Hutchinson

Thank you greatly for the feedback

I took a look at mips.md - and we already use the conditional length 
for some instructions.


However, some effective AVR instruction lengths are vastly complicated 
by operands, length and addressing modes. (After all we emulate 16 and 
32 bit operations with only 8 bit CPU) . So it takes a relatively 
complex set of logic to get true length correct. This logic is already 
present as c functions. (move and shifts being the more complex 
variety). We call these routines only for final adjustment - but have to 
figure out which to call based on the insn RTL .


I fear that duplicating this logic in RTL patterns won't be very elegant 
or less error prone.


As I say above, the logic already exists as C functions. If we could call 
these directly to determine the attribute value, it would be much easier!


For now, matching the name seems optimal.

Andy

Ian Lance Taylor wrote:

Andrew Hutchinson [EMAIL PROTECTED] writes:

  

The alternative, perhaps,  would be to set each length attribute
dynamically in each pattern - if that was possible. But that looks
like way more work.



That is certainly the best way.  Search for length in mips.md for
one example of how it can be done.

Ian

  


Re: Segmentation fault in df-scan.c

2008-01-21 Thread Andrew Hutchinson


Alas, enable-checking produced no different result or additional 
warnings or errors (though it might help me in the future!)


I have a work around but don't fully understand why a define_expand should
have caused segmentation fault.

I believe the issue might be that gcse does not expect to see any POST_INC 
patterns in its first pass. (The RTL dump files show that is where it died.)
A few are normally  created by patterns - but perhaps almost all 
restricted to prolog/epilog. In my case, I used define_expand so it 
appears in very earliest RTL, in a normal block. Most POST_INC/DEC etc 
are created after gcse pass. (by auto-inc-dec pass of course).


The expander used

rtx tmp_reg_rtx = copy_to_mode_reg (QImode,gen_rtx_MEM 
(QImode,gen_rtx_POST_INC (HImode, addr1)));


aka Rx= [Ry++] fails

However,making this simpler works:
rtx tmp_reg_rtx = copy_to_mode_reg (QImode,gen_rtx_MEM  (QImode, addr1));
emit_move_insn (addr1, gen_rtx_PLUS (Pmode, addr1, const1_rtx));
aka
Rx=[Ry]
Ry=Ry+1 
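
As a plain C illustration (not RTL, and not part of the expander), the two forms are meant to have the same effect on the loaded value and on the address. The function names below are made up for this sketch:

```c
#include <stdint.h>

/* Rx = [Ry++]: load a byte and bump the address in one step. */
static uint8_t load_post_inc(const uint8_t **addr)
{
    return *(*addr)++;
}

/* Rx = [Ry]; Ry = Ry + 1: the same effect written as two separate steps. */
static uint8_t load_then_add(const uint8_t **addr)
{
    uint8_t value = **addr;
    *addr = *addr + 1;
    return value;
}
```

The difference is only in how early the combined form is exposed to the RTL passes, which is where the fault appears to come from.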

For now I have gone back to the second case, though the code is not 
quite as good.


thanks again

Andy






Finding out what backend instruction pattern matches instruction

2008-01-21 Thread Andrew Hutchinson
I am working on the AVR port and seek advice on the best way of working out 
which instruction patterns have been matched to RTL.


The port requires adjustment of instruction lengths to assist branching - 
once operands are finally known. Before this, worst case lengths are 
used from pattern length attributes.


At present, the ADJUST_INSN_LENGTH routine looks at the instruction RTL 
to figure out what pattern was matched, then calls the appropriate 
routine that can do the precise length calculation. The problem with 
this method is that this re-matching can easily be wrong. Great care must be 
taken when additional backend patterns are added or existing ones are 
re-arranged, otherwise instruction lengths are calculated incorrectly.


To get around this problem, I replaced this RTL checking with a simple 
lookup of the instruction name using


name = get_insn_name (INSN_CODE (insn));

Then a simple string compare can be used to determine precisely what has 
been matched.


It works fine, but is this an acceptable method ?

The alternative, perhaps,  would be to set each length attribute 
dynamically in each pattern - if that was possible. But that looks like 
way more work.








Segmentation fault in df-scan.c

2008-01-20 Thread Andrew Hutchinson



While working on a Cygwin/AVR backend patch, I had segmentation fault 
occur in df-scan.c - which appears unrelated to target.

I can't provide a testcase as the backend is modified - but the source was 2003-1.c

It all happens in df-scan.c (Rev 130805 14 Dec 2007)

df_ref_create_structure() tries to access the EMPTY collection_rec->def_vec 
as type DF_REF_REG_DEF is being set by df_uses_record(), yet no space 
was allocated by df_notes_rescan()


This appears to be a bug but I seek your combined wisdom before filing a 
report:


1) emit-rtl (line 4647) calls df_notes_rescan (insn);
2) df_notes_rescan (line 2043) creates struct df_collection_rec 
collection_rec but does not allocate any storage for member def_vec
then  (line 2062)  calls df_uses_record - related to usage of 
REG_EQUIV and REG_EQUAL notes
3) df_uses_record (line 2994) calls df_ref_record (related to recording 
definition for PRE_DEC..POST_MODIFY) - with type set as DF_REF_REG_DEF

4) df_ref_record calls  df_ref_create_structure - which fails
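
The failure mode in the steps above can be modelled in a few lines of standalone C. This is a simplified stand-in, not GCC's real df_collection_rec layout, and the guard shown is only an illustration of the missing-storage condition, not a proposed fix:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a collection record (hypothetical layout). */
struct collection {
    int *def_vec;     /* NULL when the caller reserved no space for defs */
    size_t next_def;  /* index of the next free slot */
    size_t def_cap;   /* number of slots available in def_vec */
};

/* Append a def; report failure instead of writing through a vector
   that was never allocated or is already full. */
static bool record_def(struct collection *c, int ref)
{
    if (c->def_vec == NULL || c->next_def >= c->def_cap)
        return false;
    c->def_vec[c->next_def++] = ref;
    return true;
}
```

In the crash, the equivalent of the unguarded store is reached with no def storage set up by the caller.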


Below is stack dump and a few variables and RTX of insn printed out


Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i686-pc-cygwin...
(gdb) source ./gdbini.in
./gdbini.in: No such file or directory.
(gdb) source ./gdbinit.in
Breakpoint 1 at 0x6268d6: file ../../gcc/gcc/diagnostic.c, line 660.
Breakpoint 2 at 0x626863: file ../../gcc/gcc/diagnostic.c, line 604.
Breakpoint 3 at 0xa77a20
Breakpoint 4 at 0xa77a10
(gdb) run -mmcu=atmega128  -g -w  -O3  -DSTACK_SIZE=400 -da -DNO_TRAMPOLINES -fno-show-column  -DSIGNAL_SUPPRESS  -std=gnu99 2003-1.c -o 2003-1.o
Starting program: /cygdrive/e/awhconf/gcc/cc1.exe -mmcu=atmega128  -g -w  -O3  -DSTACK_SIZE=400 -da -DNO_TRAMPOLINES -fno-show-column  -DSIGNAL_SUPPRESS  -std=gnu99 2003-1.c -o 2003-1.o
Loaded symbols for /cygdrive/c/WINDOWS/system32/ntdll.dll
Loaded symbols for /cygdrive/c/WINDOWS/system32/kernel32.dll
Loaded symbols for /usr/bin/cygwin1.dll
Loaded symbols for /cygdrive/c/WINDOWS/system32/advapi32.dll
Loaded symbols for /cygdrive/c/WINDOWS/system32/rpcrt4.dll
Loaded symbols for /usr/bin/cygiconv-2.dll
foo baz bar main
Analyzing compilation unit
Performing interprocedural optimizations
visibility early_local_cleanups inline static-var 
pure-constAssembling functions:

bar foo baz main
Program received signal SIGSEGV, Segmentation fault.
0x007a03de in df_ref_create_structure (collection_rec=0x22c840, 
reg=0x124, loc=0x7ff31b04, bb=0x7fec3c00, insn=0x7ff778a0,

  ref_type=DF_REF_REG_DEF, ref_flags=292) at ../../gcc/gcc/df-scan.c:2611
2611        collection_rec->def_vec[collection_rec->next_def++] = this_ref;

(gdb) where
#0  0x007a03de in df_ref_create_structure (collection_rec=0x22c840, 
reg=0x124, loc=0x7ff31b04, bb=0x7fec3c00, insn=0x7ff778a0,

  ref_type=DF_REF_REG_DEF, ref_flags=292) at ../../gcc/gcc/df-scan.c:2611
#1  0x007a2d8a in df_uses_record (collection_rec=0x22c840, loc=0x0, 
ref_type=DF_REF_REG_MEM_LOAD, bb=0x7fec3c00,

  insn=0x7ff778a0, flags=DF_REF_IN_NOTE) at ../../gcc/gcc/df-scan.c:2994
#2  0x007a56db in df_notes_rescan (insn=0x7ff778a0) at 
../../gcc/gcc/df-scan.c:2062
#3  0x004d3c91 in set_unique_reg_note (insn=0x7ff778a0, kind=REG_EQUAL, 
datum=0x7ff1e8f0) at ../../gcc/gcc/emit-rtl.c:4647
#4  0x005ce935 in try_replace_reg (from=0x7ff1d740, to=0x1bebbd8, 
insn=0x7ff778a0) at ../../gcc/gcc/gcse.c:2687
#5  0x005cef5d in constprop_register (insn=0x7ff778a0, from=0x7ff1d740, 
to=0x7ff319c8, alter_jumps=0 '\0')

  at ../../gcc/gcc/gcse.c:2904
#6  0x005cfdfc in one_cprop_pass (pass=1, cprop_jumps=0 '\0', 
bypass_jumps=0 '\0') at ../../gcc/gcc/gcse.c:2973

#7  0x005d5166 in rest_of_handle_gcse () at ../../gcc/gcc/gcse.c:722
#8  0x00621508 in execute_one_pass (pass=0xa79770) at 
../../gcc/gcc/passes.c:1118
#9  0x006216ae in execute_pass_list (pass=0xa79350) at 
../../gcc/gcc/passes.c:1171
#10 0x006216c1 in execute_pass_list (pass=0xa79630) at 
../../gcc/gcc/passes.c:1172
#11 0x00848b4c in tree_rest_of_compilation (fndecl=0x7fdcf340) at 
../../gcc/gcc/tree-optimize.c:404
#12 0x0062277b in cgraph_expand_function (node=0x7ff40480) at 
../../gcc/gcc/cgraphunit.c:1151

#13 0x006243fe in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1214
#14 0x0041aff7 in c_write_global_declarations () at 
../../gcc/gcc/c-decl.c:8074
#15 0x006295e6 in toplev_main (argc=14, argv=0x1b91d60) at 
../../gcc/gcc/toplev.c:1055

#16 0x004938da in main (argc=14, argv=0x1b91d60) at ../../gcc/gcc/main.c:35
(gdb) pr
The history is empty.
(gdb) print insn
$1 = (rtx) 0x7ff778a0
(gdb) pr
(insn 10 84 11 3 2003-1.c:36 (set (reg:QI 50)
  (mem:QI (post_inc:HI (reg:HI 48)) [0 S1 A8])) 8 {*movqi} 
(expr_list:REG_EQUAL (mem:QI (post_inc:HI (reg:HI 48)) [0 S1