Success with MinGW and AVR and LTO - almost
I have just succeeded in building the latest snapshot, version 4.5.0 20100107, for the AVR target with working LTO on both Linux and MinGW hosts! As noted before, #define LINKER_NAME has to be deleted from the target's avr.h (I will raise a patch for this). I also built the AVR target for MinGW under MSYS, and this has no obvious issues for either the build or use with normal (non-LTO) compilation. However, LTO use failed completely: LTO using libelf needs to handle files as binary on Windows. This would seem to apply to any target on MinGW. I hacked the fopen/open calls in lto.c and lto-elf.c to use O_BINARY and "rb", and compilation with -flto was then successful! I am not sure how this should be fixed properly. Andy
Re: Success with MinGW and AVR and LTO - almost
I think "rb" is a no-op. However, O_BINARY is less portable. There is another way: if the MinGW-hosted build is linked with binmode.o, the default mode for files becomes binary. Some other methods are described here: http://oldwiki.mingw.org/index.php/binary Rafael Espindola wrote: I hacked fopen/open calls in lto.c and lto-elf.c to use O_BINARY and rb and compilation with -flto was then successful! I am not sure how this should be fixed properly. Using O_BINARY and "rb" should be a no-op on Unix, no? Is it wrong to use them on any arch we care about? Andy Cheers,
Re: Success with MinGW and AVR and LTO - almost
Kai Tietz wrote: Well, on Linux (libc) the "b" in fopen/freopen/etc. is a no-op (but handled). For O_BINARY the common approach is to add the following conditional before use: #ifndef O_BINARY #define O_BINARY 0 #endif This pattern is used pretty often. Relying on binmode.o is also a way, but it is the ugliest one: it affects every file open, which isn't necessarily wanted. Cheers, Kai Is LTO really the only place where gcc needs binary access to files when built as a cross compiler? Andy
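A minimal sketch of the pattern Kai describes, assuming a POSIX-style host (MinGW provides the same open()/read() calls). The helper names are hypothetical and this is not the actual lto.c or lto-elf.c code; it only shows the O_BINARY fallback together with the equivalent "rb" stdio open.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Fallback for hosts whose C library has no O_BINARY (e.g. glibc).
   There, text and binary mode are identical, so 0 is harmless.  */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: count the bytes in PATH with open()/read().
   O_BINARY stops a Windows host from translating "\r\n" or treating
   0x1A as end of file.  Returns -1 on error.  */
long count_bytes_fd (const char *path)
{
  char buf[256];
  long total = 0;
  ssize_t n;
  int fd;

  assert (path != NULL);
  fd = open (path, O_RDONLY | O_BINARY);
  if (fd < 0)
    return -1;
  while ((n = read (fd, buf, sizeof buf)) > 0)
    total += n;
  close (fd);
  return n < 0 ? -1 : total;
}

/* Hypothetical stdio equivalent: the "b" in "rb" is accepted but
   ignored on POSIX hosts, and selects binary mode on Windows.  */
long count_bytes_stdio (const char *path)
{
  long total = 0;
  FILE *f;

  assert (path != NULL);
  f = fopen (path, "rb");
  if (!f)
    return -1;
  while (fgetc (f) != EOF)
    total++;
  fclose (f);
  return total;
}
```

Both helpers should report the same byte count on every host; without O_BINARY or "rb", a MinGW build could report fewer bytes for an ELF object containing "\r\n" or 0x1A.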
Re: Success with MinGW and AVR and LTO - almost
Kai Tietz wrote: Well, there aren't that many open calls, but a point of interest is in c-pch.c: 'fd = open (name, O_RDONLY | O_BINARY, 0666);', as it uses O_BINARY too. See also the pattern in libiberty's mkstemps.c. Regards, Kai It looks like O_BINARY is already defined in system.h, so all that is needed is patches to the open() calls. I backed off my shotgun fix, and there are just two places that appear to be a problem: lto_elf_file_open () in lto-elf.c and lto_read_section_data () in lto.c. Adding O_BINARY to these read/write opens removes the failures from my simple test. Andy
Re: AVR gives weird error with LTO
I used -v and progressed a little. The problem seems to be that the linker is called with -fwhopr or -flto as a command-line option: ld -fwhopr ... The linker finds '-f' and complains. I assume this is not a valid option for ld? Or is my linker the wrong version or something? Note this is a cross-compile toolchain. Andy Dave Korn wrote: Andrew Hutchinson wrote: When the AVR target is built without explicitly disabling LTO, it will produce 1000's of testsuite failures of -LTO -WHOPR tests with this compilation error: ld: -f may not be used without -shared Any idea what is wrong or how to make LTO work correctly here? The standard way to proceed would be: add -v to the command line in the PR; find out what is actually getting passed to ld; figure out what kind of specs-processing accident (most likely) is causing ld to receive a -f option. cheers, DaveK
Re: AVR gives weird error with LTO
Dave Korn wrote: Rafael Espindola wrote: It's not a valid option for ld. It *is* a valid option for the collect2 driver/wrapper executable that gcc uses to invoke ld, which suggests to me that the AVR port must be configured not to build collect2, but that it is going to need to do so if it wants to use LTO/WHOPR. See use_collect2 in gcc/config.gcc. Or you could port gold to AVR and use the plugin :-) I hadn't checked, but yeah, since AVR is an ELF platform, that's a nice solution too. There might still be a reason to build a collect2, for interop with older binutils. cheers, DaveK Thank you David and Rafael. I will dig further into collect2. I had noted that avr.h has the following: /* This is undefined macro for collect2 disabling */ #define LINKER_NAME ld Also, the MinGW host is the most significant one for the AVR target, and problems with collect2 may be related to maintaining compatibility with it. Andy
Re: AVR gives weird error with LTO
Thank you David and Rafael. I will dig further into collect2. I had noted that avr.h has the following: /* This is undefined macro for collect2 disabling */ #define LINKER_NAME ld That's indeed going to break LTO. Richard. That seems to be the key issue. Without #define LINKER_NAME, AVR is running the LTO/WHOPR tests OK! (No idea if it does anything useful, though.) Now to figure out why it was added in 2000 (rth). Hopefully Eric Weddington or Denis might have some idea, and perhaps know whether it still has a purpose. Andy
How should I prototype cpp_define in target patch?
I want to post a patch for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42457. The code moved out to the -c.c file ALREADY uses builtin_define_std and cpp_define, both from c-cppbuiltin.c. These have no prototypes defined in gcc, so of course warnings are emitted. Is this OK? Should I define the prototypes locally? Something else? Andy
Re: How should I prototype cpp_define in target patch?
Doh! Joseph S. Myers wrote: On Wed, 23 Dec 2009, Andrew Hutchinson wrote: builtin_define_std, cpp_define. Both from c-cppbuiltin.c. These have no prototypes defined in gcc. They do have prototypes, in c-common.h and cpplib.h.
Re: Which optimizer should remove redundant subreg of sign_extension?
Paolo Bonzini wrote: I think that if you add the simplification to simplify-rtx.c's simplify_subreg, combine should pick it up automagically. Paolo There we have it! This optimization is apparently already performed, so I will have to dig further into why it does not happen. From simplify_subreg(): [snip] /* If we're requesting the lowpart of a zero or sign extension, there are three possibilities. If the outermode is the same as the origmode, we can omit both the extension and the subreg. If the outermode is not larger than the origmode, we can apply the truncation without the extension. Finally, if the outermode is larger than the origmode, but both are integer modes, we can just extend to the appropriate mode. */
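The three cases in that comment can be checked with plain integer arithmetic. The sketch below is a model written for illustration, not GCC code: values are bit patterns held in a uint32_t, "lowpart" keeps the low bits, and the helper names are hypothetical. It covers lowparts only and checks each simplification against the unsimplified (subreg (sign_extend:SI x) 0) form.

```c
#include <assert.h>
#include <stdint.h>

/* Lowpart of a bit pattern in a BITS-wide mode: keep the low BITS bits.  */
static uint32_t lowpart (uint32_t v, unsigned bits)
{
  return bits >= 32 ? v : v & ((1u << bits) - 1);
}

/* Sign-extend a FROM-bit pattern to TO bits by replicating the sign bit.  */
static uint32_t extend_to (uint32_t v, unsigned from, unsigned to)
{
  uint32_t r = lowpart (v, from);
  if (r & (1u << (from - 1)))
    for (unsigned b = from; b < to; b++)
      r |= 1u << b;
  return r;
}

/* Returns 1 if the rule from the simplify_subreg comment gives the same
   bits as the unsimplified lowpart of (sign_extend:SI (x:ORIG)).  */
int lowpart_of_extension_ok (uint32_t x, unsigned origbits, unsigned outerbits)
{
  assert (origbits > 0 && origbits < 32 && outerbits > 0 && outerbits < 32);
  uint32_t lhs = lowpart (extend_to (x, origbits, 32), outerbits);
  uint32_t rhs;
  if (outerbits == origbits)
    rhs = lowpart (x, origbits);               /* drop extension and subreg */
  else if (outerbits < origbits)
    rhs = lowpart (x, outerbits);              /* truncation only */
  else
    rhs = extend_to (x, origbits, outerbits);  /* extend straight to outermode */
  return lhs == rhs;
}
```

For example, lowpart_of_extension_ok (x, 8, 8) models the QImode lowpart of a QImode value sign-extended to SImode, which simplifies to x itself.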
Approval as AVR maintainer
How does one become maintainer of a port? Specifically the AVR port, so that I do not need to get approval to commit changes. Getting approval now takes rather longer than it does for other parts of GCC. The process does not seem to be written down anywhere, but I am sure someone will correct me if I am wrong. Seasonal Greetings Andy
Which optimizer should remove redundant subreg of sign_extension?
I came across this RTL on AVR in the combine dump (part of the va-arg-9.c test): (set (reg:QI 25 r25 [+1 ]) (subreg:QI (sign_extend:HI (reg:QI 49)) 1)) The sign extension is completely redundant (the upper part of the register is not used elsewhere), but the RTL remains unchanged through all the optimizers, and the sign extension appears in the final code. Which RTL optimisation should be taking care of this? Propagation? Knowing would help me look in the right place to understand and perhaps fix the issue. I suspect the presence of the hard register is why it does not get removed (the hard register is the function return value). Andy
Need help to correct vla-dealloc testcase
I need advice before I submit a patch to fix the test gcc-torture/execute/vla-dealloc-1.c (attached below). The test appears to be unsafe. The original fault was a failure to deallocate the VLA on the jump, so with the fault present the test would appear to perform 1 million new allocations and fail, presumably due to either execution time or a run-time error, neither of which seems certain. I have to modify the test, since it presumes 32-bit or larger integers; on 16-bit targets the count overflows into negative allocation sizes, which makes the behavior somewhat undefined. It also takes rather a long time to execute, far longer than other execution tests, and trips the timeout limit for AVR simulator tests. a) I could disable the test for targets without 32-bit integers. b) I could change n to be 32-bit on 16-bit targets (the test will then be equally uncertain at detecting the fault on 16-bit targets). c) I could reduce n to 10,000, but that will likely create more false positives. a) is my easy way out, but perhaps I should address the apparent weakness where the test could pass with the original problem present? Suggestions? /* VLAs should be deallocated on a jump to before their definition, including a jump to a label in an inner scope. PR 19771. */ void *volatile p; int main (void) { int n = 0; if (0) { lab:; } int x[n % 1000 + 1]; x[0] = 1; x[n % 1000] = 2; p = x; n++; if (n < 1000000) goto lab; return 0; }
Re: Need help to correct vla-dealloc testcase
Thanks. I am submitting a patch to drop the count to 10,000 for targets with 16-bit int. Using a 32-bit counter of 1 million takes a minute or so on the simulator, which is high. The lower count is quick and only requires the (16-bit) stack limit to be lower than 10MB, which is pretty safe. Andy Joseph S. Myers wrote: On Sun, 6 Dec 2009, Andrew Hutchinson wrote: The test appears to be unsafe. The original fault was failure to deallocate VLA on the jump - thus with the fault present the test would appear to perform 1 million new allocation - and fail presumably due to either execution time or run time error - neither of which seem certian. It's expected to run into RLIMIT_STACK or equivalent, with an expectation that stack limits are generally below 500MB. (A million executions of a few instructions should be pretty fast in general.) b) I could change n to be 32 bit on 16 bit targets (the test will then be equally uncertain on 16 bit targets at detecting fault.) This seems to be the natural approach.
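A sketch of the reduced-count variant described above. It assumes GCC's predefined __SIZEOF_INT__ macro to detect a 16-bit int. The LIMIT macro name and the wrapper function (used instead of main so that a test can call it) are illustrative and not necessarily what the submitted patch uses.

```c
#include <assert.h>

/* Iteration count: one million where int is at least 32 bits, 10,000
   where int is 16 bits (e.g. AVR), so n never overflows.  */
#if defined (__SIZEOF_INT__) && __SIZEOF_INT__ <= 2
#define LIMIT 10000
#else
#define LIMIT 1000000
#endif

void *volatile p;

/* Hypothetical wrapper around the test body.  If the VLA were not
   deallocated on the backward jump, the stack would grow on every
   iteration and hit the stack limit long before LIMIT.  */
long vla_dealloc_loop (void)
{
  int n = 0;
  if (0)
    {
    lab:;
    }
  int x[n % 1000 + 1];   /* up to 1000 ints per iteration */
  x[0] = 1;
  x[n % 1000] = 2;
  p = x;                 /* volatile store keeps x from being optimized away */
  n++;
  assert (n > 0);        /* would fire if n had wrapped */
  if (n < LIMIT)
    goto lab;
  return n;
}
```

With correct deallocation the stack use stays bounded by one VLA of at most 1000 ints, so the function returns LIMIT on any host.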
How does builtin_sqrt get used - or not
I am tracking a test failure with the avr target where the function sqrtf is an undefined reference at link time. Here is the command line: /media/verbatim/gcchead/obj-dir/gcc/xgcc -B/media/verbatim/gcchead/obj-dir/gcc/ /media/verbatim/gcchead/trunk/gcc/testsuite/gcc.dg/pr41963.c -O2 -ffast-math -DSTACK_SIZE=2048 -DNO_TRAMPOLINES -DSIGNAL_SUPPRESS -mmcu=atmega128 /home/andy/winavrfiles/avrtest/dejagnuboards/exit.c -Wl,-u,vfprintf -lprintf_flt -Wl,-Tbss=0x802000,--defsym=__heap_end=0x80 -lm -o pr41963.exe I am led to believe that gcc might use builtin_sqrtf rather than sqrtf(). I am successfully using fabsf() with no link errors. Is there any target configuration needed for builtin_sqrtf that I should know about? Andy
Bug in binop rotate ?
I have been adding rotate capability to the AVR port and have come across what I think is a bug in optabs.c: expand_binop(). This occurs during a rotate expansion, for example target = op0 rotated by op1. In the particular situation (code extract below) it tries a reverse rotate by (bits - op1), where this expression is expanded as a simple integer, a negation or a subtraction, depending on the type of op1 and target. The expansion of the subtraction uses the mode of the target; I believe it should be using the mode of op1. The mode of the rotation amount need not be the same as that of the target: target:DI = op0:DI rotate op1:HI. In my testcase it is not, and I get asserts later in simplify_rtx. The negation mode looks equally wrong. Am I mistaken? /* If we were trying to rotate, and that didn't work, try rotating the other direction before falling back to shifts and bitwise-or. */ if (((binoptab == rotl_optab && optab_handler (rotr_optab, mode)->insn_code != CODE_FOR_nothing) || (binoptab == rotr_optab && optab_handler (rotl_optab, mode)->insn_code != CODE_FOR_nothing)) && mclass == MODE_INT) { optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab); rtx newop1; unsigned int bits = GET_MODE_BITSIZE (mode); if (CONST_INT_P (op1)) newop1 = GEN_INT (bits - INTVAL (op1)); else if (targetm.shift_truncation_mask (mode) == bits - 1) newop1 = negate_rtx (mode, op1); else newop1 = expand_binop (mode, sub_optab, GEN_INT (bits), op1, NULL_RTX, unsignedp, OPTAB_DIRECT);
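The arithmetic behind the fallback can be modelled in plain C. This is not GCC's RTL code, and the function names are hypothetical. A 64-bit value stands in for op0:DI and a 16-bit count for op1:HI. The reverse count (bits - op1), or its negation, is computed in the count's own 16-bit type, the analogue of using op1's mode. Because 65536 is a multiple of 64, wrapping in 16 bits does not change the count modulo 64.

```c
#include <assert.h>
#include <stdint.h>

/* Reference rotates; counts are reduced modulo 64.  */
uint64_t rotl64 (uint64_t x, uint16_t n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

uint64_t rotr64 (uint64_t x, uint16_t n)
{
  n &= 63;
  return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Rotate left using only a right rotate by (bits - n), with the
   subtraction done in the 16-bit type of the count.  */
uint64_t rotl_via_rotr (uint64_t x, uint16_t n)
{
  uint16_t reverse = (uint16_t) (64 - n);
  return rotr64 (x, reverse);
}

/* The negation shortcut: valid when counts are truncated modulo 64,
   the analogue of shift_truncation_mask (mode) == bits - 1.  */
uint64_t rotl_via_rotr_neg (uint64_t x, uint16_t n)
{
  assert (64u % 64u == 0);   /* bits is a power of two */
  return rotr64 (x, (uint16_t) -n);
}
```

Both fallbacks agree with the direct left rotate for every count, including counts of 64 or more.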
Re: Bug in binop rotate ?
Thanks for your review. I have submitted a bug report. Richard Guenther wrote: On Sat, Oct 17, 2009 at 3:47 PM, Andrew Hutchinson andrewhutchin...@cox.net wrote: I have been adding rotate capability to AVR port and have come across what I think is bug in optabs.c: expand_binop() This occurs during a rotate expansion. For example target = op0 rotated by op1 In the particular situation (code extract below) it tries a reverse rotate of (bits - op1). Where this expression is expanded as a simple integer, a negation or subtraction depending on type of op1 and target. The expansion of the subtraction is using the mode of the target - I believe it should be using the mode of op1. The mode of the rotation amount need not be the same as the target. target:DI = Op0:DI rotate op1:HI In my testcase it is not and I get asserts latter in simplfy_rtx. The negation mode looks equally wrong. Am I mistaken? I think you are correct. Richard.
Re: Constraint modifier for partially overlapping operands
The situation comes up where no overlap or a partial overlap of registers permits optimal code, since this can avoid using a scratch register. Thus no overlap OR partial overlap is preferred (or required). Using nothing leaves overlap without preference (full, partial, none). Using = gives the least preferred case (full). Using & gives only the non-overlapping case (none). Ideally NOT= is required, which I would hope the register allocator would quite like too. Dave Korn wrote: Ian Lance Taylor wrote: Andrew Hutchinson andrewhutchinson@ writes: I can use = modifier to make operands use same register and early clobber to avoid overlaps. Is it possible to have or construct a constraint that permits partially overlapping operands (which neither = nor & would allow)? The case would be wide types taking multiple hard registers, e.g. Input r20..23 Output r22..25 There is no such constraint today. I suppose it would be possible to define such a constraint if it seemed useful. Maybe I'm misunderstanding, but I thought that was already the default if you use neither = to specify full overlap nor & for no overlap? Frex, a lot of ABIs specify that DImode values stored in pairs of SImode registers must always use an odd-even register pair (using a test in HARD_REGNO_MODE_OK), but when I was working on a custom port that allowed them in any register pair, GCC would happily generate partially overlapping movdi instructions such as (set:DI (reg:DI 5) (reg:DI 6)) (i.e., move r6/7 -> r5/6). This hasn't changed since 3.3, has it? cheers, DaveK
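Why partial overlap needs no scratch register can be shown with a small model. The sketch below is hypothetical, not GCC code: it treats the register file as an array of byte-wide registers, as on AVR, and copies a multi-register value in whichever direction avoids reading a source register after it has been overwritten. This is the same rule memmove uses. It covers both examples in this thread, r20..23 -> r22..25 and r6/7 -> r5/6.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: copy the N-register value at SRC.. to DST..
   in REGS, choosing the copy order so partial overlap is safe.  */
void move_multireg (unsigned char *regs, size_t dst, size_t src, size_t n)
{
  assert (regs != NULL);
  if (dst == src)
    return;                          /* full overlap: nothing to move */
  if (dst < src)
    for (size_t i = 0; i < n; i++)   /* low-to-high: sources read before overwrite */
      regs[dst + i] = regs[src + i];
  else
    for (size_t i = n; i-- > 0;)     /* high-to-low for an upward move */
      regs[dst + i] = regs[src + i];
}
```

Only an early-clobber (&) output forbids this kind of overlap. Without &, the insn's output template has to pick the safe order, as above.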
Re: Constraint modifier for partially overlapping operands
Yes. But we need to lower after combine and before register allocation, and I'm still figuring out how to do that. Lowering before combine in particular causes a lot of code bloat: it loses all optimization of conditional jumps, shifts etc. In our case, most lowering is delayed until after reload. This retains the RTL optimization but is suboptimal for allocation and lacks enough forward propagation. For a similar reason, not splitting wide types often produces far better code. One exception is DImode, which by default is lowered at expand, since there are no DImode instructions defined. This ends up with pretty dire code, since the built-in expansion can't use a carry-based pattern, and we again miss the RTL optimization at the wider level. It would seem we need a target-dependent pass order to improve on this significantly. Andy Richard Henderson wrote: On 10/16/2009 11:04 PM, Ian Lance Taylor wrote: Andrew Hutchinson andrewhutchin...@cox.net writes: I can use = modifier to make operands use same register and early clobber to avoid overlaps. Is it possible to have or construct a constraint that permits partially overlapping operands (which neither = nor & would allow)? The case would be wide types taking multiple hard registers, e.g. Input r20..23 Output r22..25 There is no such constraint today. I suppose it would be possible to define such a constraint if it seemed useful. I'd much prefer if the port decomposed its double word operations and used the lower-subreg pass to decompose the double word registers. At which point the register allocator has all of the information it needs to do the right thing. r~
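To show what "decomposing double word operations" means at the value level, here is a plain-C model (not RTL, and not the lower-subreg pass itself) of a 64-bit add split into two 32-bit word adds with an explicit carry. This is the carry-based shape that, per the message above, the default DImode expansion cannot use. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A double-word value held as two independent words.  */
typedef struct { uint32_t lo, hi; } dword;

/* Double-word add as two word adds plus a carry.  */
dword dword_add (dword a, dword b)
{
  dword r;
  r.lo = a.lo + b.lo;                /* wraps modulo 2^32 */
  uint32_t carry = r.lo < a.lo;      /* wrap-around means a carry out */
  r.hi = a.hi + b.hi + carry;
  assert (carry <= 1);
  return r;
}
```

Once each half is a separate word-sized pseudo, the register allocator can place the halves independently, which is the information rth says it needs.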
Re: Cannot get Bit test RTL to cooperate with Combine.
Thank you so much for the information! I will investigate your patch. (I just hacked gen_lowpart_for_combine to allow lowering something larger than a word, and the subreg then matched with no problem.) It looks like RTL generation is somewhat odd and not helping. My test used: extern long x; if (x & 1) If there is only a single reference to x, then (x & 1) is lowered to HImode and does not include any subregs (with -fno-split-wide-types), so my patterns match. If my test code included two bit tests, I get HImode subregs on the x & 1 (which will not match) but not on x & 2; the latter is in the wider SImode and will match. If I turn on -fsplit-wide-types, the subregs are not removed by subreg lowering, since there is now mixed-mode usage! Something seems backwards in expansion regarding lowering and references. Andy Joern Rennecke wrote: On Sun, Sep 20, 2009 at 01:49:39PM -0400, Andrew Hutchinson wrote: All, I have been debugging the AVR port to see why we fail to match so many bit test opportunities. When dealing with longer modes I have come across a problem I cannot solve. Expansion in RTL for a bit test can produce two styles. STYLE 1: the bit to be tested is NOT the LSB (e.g. if (longthing & 0x10)); the expanded code contains the test as: (and:SI (reg:SI 45 [ lx.1 ]) (const_int 16 [0x10])) Bit tests are matched by combine. Combine has no problems with this and eventually creates a matching pattern based on the conversion of the AND to a zero extraction: (set (pc) (if_then_else (ne (zero_extract:SI (subreg:QI (reg:SI 45 [ lx.1 ]) 0) (const_int 1 [0x1]) (const_int 4 [0x4])) (const_int 0 [0x0])) (label_ref:HI 133) (pc))) This will match bit test patterns and produces optimal code. :-) Unfortunately, when combine knows about upper bits that are zero, it will generate an lshiftrt instead, which can't be legitimately matched by a bit test.
I have a patch for this which I haven't yet gotten around to testing separately on trunk and formally submitting to the patches list, but you can extract it from arc-20081210-branch: 2008-12-02 Jorn Rennecke joern.renne...@arc.com * combine.c (undo_since): New function, broken out of: (undo_all). (combine_simplify_bittest): New function. (combine_simplify_rtx, simplify_if_then_else): Use it. * config/arc/arc.c (arc_rtx_costs): Check for bbit test. svn diff -r144651:144652 svn://gcc.gnu.org/svn/gcc/branches/arc-20081210-branch/gcc/combine.c
Re: Cannot get Bit test RTL to cooperate with Combine.
Why doesn't combine try matching the unsimplified expressions when matching fails? This would at least permit creating patterns based on the explicit format of the input RTL, without the added vagaries of simplification. Andy Joern Rennecke wrote: On Sun, Sep 20, 2009 at 01:49:39PM -0400, Andrew Hutchinson wrote: All, I have been debugging the AVR port to see why we fail to match so many bit test opportunities. When dealing with longer modes I have come across a problem I cannot solve. Expansion in RTL for a bit test can produce two styles. STYLE 1: the bit to be tested is NOT the LSB (e.g. if (longthing & 0x10)); the expanded code contains the test as: (and:SI (reg:SI 45 [ lx.1 ]) (const_int 16 [0x10])) Bit tests are matched by combine. Combine has no problems with this and eventually creates a matching pattern based on the conversion of the AND to a zero extraction: (set (pc) (if_then_else (ne (zero_extract:SI (subreg:QI (reg:SI 45 [ lx.1 ]) 0) (const_int 1 [0x1]) (const_int 4 [0x4])) (const_int 0 [0x0])) (label_ref:HI 133) (pc))) This will match bit test patterns and produces optimal code. :-) Unfortunately, when combine knows about upper bits that are zero, it will generate an lshiftrt instead, which can't be legitimately matched by a bit test. I have a patch for this which I haven't yet gotten around to testing separately on trunk and formally submitting to the patches list, but you can extract it from arc-20081210-branch: 2008-12-02 Jorn Rennecke joern.renne...@arc.com * combine.c (undo_since): New function, broken out of: (undo_all). (combine_simplify_bittest): New function. (combine_simplify_rtx, simplify_if_then_else): Use it. * config/arc/arc.c (arc_rtx_costs): Check for bbit test. svn diff -r144651:144652 svn://gcc.gnu.org/svn/gcc/branches/arc-20081210-branch/gcc/combine.c
Cannot get Bit test RTL to cooperate with Combine.
All, I have been debugging the AVR port to see why we fail to match so many bit test opportunities. When dealing with longer modes I have come across a problem I cannot solve. Expansion in RTL for a bit test can produce two styles. STYLE 1: the bit to be tested is NOT the LSB (e.g. if (longthing & 0x10)); the expanded code contains the test as: (and:SI (reg:SI 45 [ lx.1 ]) (const_int 16 [0x10])) Bit tests are matched by combine. Combine has no problems with this and eventually creates a matching pattern based on the conversion of the AND to a zero extraction: (set (pc) (if_then_else (ne (zero_extract:SI (subreg:QI (reg:SI 45 [ lx.1 ]) 0) (const_int 1 [0x1]) (const_int 4 [0x4])) (const_int 0 [0x0])) (label_ref:HI 133) (pc))) This will match bit test patterns and produces optimal code. :-) STYLE 2: the bit to be tested is the LSB (e.g. if (longthing & 1)); the expanded RTL code uses a SUBREG to lower the width (apparently from SImode to word size): (and:HI (subreg:HI (reg:SI 45 [ lx.1 ]) 0) (const_int 1 [0x1])) This seems to occur regardless of -f(no-)split-wide-types for HImode (which is the integer mode). This RTL becomes a problem for combine. Combine uses subst(), combine_simplify_rtx() and eventually simplify_comparison(), where it attempts to WIDEN the AND and take the lowpart: gen_lowpart (HImode, (and:SI (reg:SI 45 [ lx.1 ]) (const_int 1 [0x1]))) However, gen_lowpart_for_combine() FAILS, as it rejects taking the lowpart of an SImode expression because size > UNITS_PER_WORD. So no test pattern can be matched. :-( Style 2 is hugely problematic. The substitution works fine, but the simplification will always fail, making it apparently impossible to create matching patterns for bit tests of the LSB of SImode or DImode values. Any clues how I might get around this? Andy
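The equivalence that combine relies on can be checked in plain C. The sketch below models three forms of "is bit K of a 32-bit value set?" that mirror the RTL shapes in this thread: the AND with a mask, the lshiftrt form combine produces when it knows the upper bits are zero, and a zero_extract of one bit from one byte. For the byte form it assumes little-endian byte numbering, as on AVR, and generalises the subreg byte to K/8. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (ne (and:SI x (const_int 1<<K)) 0) */
int bit_and_form (uint32_t x, unsigned k)
{
  assert (k < 32);
  return (x & ((uint32_t) 1 << k)) != 0;
}

/* (and (lshiftrt:SI x K) 1) */
int bit_shift_form (uint32_t x, unsigned k)
{
  return (int) ((x >> k) & 1);
}

/* (zero_extract (subreg:QI x K/8) 1 K%8): one bit of one byte, which a
   byte-wide bit-test instruction can check directly.  */
int bit_byte_form (uint32_t x, unsigned k)
{
  uint8_t byte = (uint8_t) (x >> (8 * (k / 8)));   /* byte K/8, little-endian */
  return (byte >> (k % 8)) & 1;
}
```

All three forms agree for every K, so a matcher that only recognises the byte form is not losing any information. The difficulty described above lies in getting combine to produce that form, not in whether the forms are equivalent.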
Re: Problems with builtin setjmp receiver getting eliminated - Help
I have realised that part of the problem is that the receiver block has no incoming edges, so cfgcleanup removes it as an unreachable block, right? So any target that needs a non-trivial receiver for builtin_setjmp will not work? That would mean any target that has an offset between the stack and the pointers? I guess the same problem exists for non-local goto? I am not convinced it could be this wrong, so please comment and suggest a solution. I'm sure I can write a target handler, but it seems so wrong to leave this issue open. Andy Andrew Hutchinson wrote: I have real problems trying to get to the root of a bug in the builtin_setjmp implementation and seek anyone's wisdom on what I have found and a way forward. It's not always clear which part is wrong when presented with mismatches. I will post a bug report when I have got a little closer. The problem I was looking at is the frame pointer being wrong on the AVR target (stack and PC are OK): http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21078, but ANY setjmp/longjmp has the problem. I traced this down to the handling of the frame pointer by the receiver (which is the code the longjmp will jump back to): Setjmp: save pointer, Buf[0] = Virtual_stack_var. Longjmp: get pointer, Hard_Frame_pointer = buf[0]. Receiver: put back the frame pointer, Virtual_stack_var = Hard_Frame_pointer. The peculiarity of AVR is that the frame pointer (and stack pointer) are 1 byte different from the first stack element, so Virtual_stack_var is 1 different from the frame pointer, i.e. #define STACK_GROWS_DOWNWARD #define STARTING_FRAME_OFFSET 1 #define STACK_POINTER_OFFSET 1 That's OK, as this is recognized by instantiate_virtual_regs, which makes the replacements later. In this case Buf[0] = Frame_pointer+1 ... Frame_pointer = Virtual_stack_var - 1. However, what is happening is that an earlier pass, noted in the RTL dump file as sibling, eliminates the receiver code block. So the frame pointer is not reset correctly and ends up being 1 out (which is bad).
Other targets may survive if they don't have an offset between the stack and the pointers. So where do I look to find out why this is happening? Is something missing from the RTL, or is the other pass not checking? The RTL that gets eliminated is: ;; Start of basic block () -> 5 (code_label/s 13 12 14 5 4 [2 uses]) (note 14 13 15 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 15 14 16 5 built-in-setjmp.c:17 (use (reg/f:HI 28 r28)) -1 (nil)) (insn 16 15 17 5 built-in-setjmp.c:17 (clobber (reg:HI 2 r2)) -1 (nil)) (insn 17 16 18 5 built-in-setjmp.c:17 (set (reg/f:HI 37 virtual-stack-vars) (reg/f:HI 28 r28)) -1 (nil)) (insn 18 17 19 5 built-in-setjmp.c:17 (clobber (reg/f:HI 28 r28)) -1 (nil)) (insn 19 18 20 5 built-in-setjmp.c:17 (asm_input/v () 0) -1 (nil)) ;; End of basic block 5 -> ( 6) The second PROBLEM I noted is that gcc creates RTL for TWO receivers for a setjmp. One naturally comes from expand_builtin_setjmp_receiver, but there is then another one just after it, created by expand_nl_goto_receiver in stmt.c. What's all this about? Despite there being two, both get optimised out! Andy
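A host-side check in the spirit of the built-in-setjmp.c test might look like the sketch below. It assumes a GCC-compatible compiler with __builtin_setjmp/__builtin_longjmp. The buffer is sized generously because its layout is target-defined, and the function names are hypothetical. A frame-addressed local array is filled before the setjmp and checked after the longjmp. If the receiver that restores the frame pointer were deleted, as described above for AVR, those reads would use a frame pointer that is off by one and the check would fail.

```c
#include <assert.h>

/* Generously sized buffer; the real layout is target-defined.  */
static void *jmpbuf[20];

/* Kept out of line: __builtin_longjmp must not be called from the
   function that called __builtin_setjmp.  Its second argument must
   be the constant 1.  */
static void __attribute__ ((noinline)) jump_back (void)
{
  __builtin_longjmp (jmpbuf, 1);
}

/* Returns 1 if frame-based locals are intact after control comes back
   through the setjmp receiver, 0 otherwise.  */
int setjmp_frame_check (void)
{
  volatile char local[16];   /* stack slot addressed via the frame pointer */
  for (int i = 0; i < 16; i++)
    local[i] = (char) i;
  if (__builtin_setjmp (jmpbuf))
    {
      /* Arrived here from jump_back () via the receiver.  */
      for (int i = 0; i < 16; i++)
        if (local[i] != (char) i)
          return 0;
      return 1;
    }
  jump_back ();
  return 0;   /* not reached */
}
```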
Re: Excess registers pushed - regs_ever_live not right way?
I gave up on DF and instead went through the function tree's arguments to rediscover the argument registers. It was then a simple matter to exclude these from the prologue/epilogue register saves/restores. The patch is posted and seems quite portable to other targets: http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00115.html Best regards, and thanks for the help. Seongbae Park (박성배, 朴成培) wrote: 2008/3/1 Andrew Hutchinson [EMAIL PROTECTED]: I am still struggling with a good solution that avoids unneeded saves of parameter registers. To solve the problem, all I need to know are the registers actually used for parameters. Since the caller assumes all of these are clobbered by the callee, they should never need to be saved. I'm totally confused about what the problem is here. I thought you were seeing extra callee-save register save/restore in the prologue, but now it sounds like you're seeing extra caller-save register save/restore. Which one are you trying to solve, and what kind of target is this? DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter registers, not just the ones that are really used (since it uses the target's FUNCTION_ARG_REGNO_P to get parameter registers). You said you wanted to know if there's a def of a register within a function. For an incoming parameter, there will be one artificial def, and if there's no other def, it means there's no real def of the register within the function. So the DF artificial defs are useless in trying to find the real parameter registers. I don't understand what you mean by this. What do you mean by real parameter register? That seems to require going over all DF chains to work out which registers are externally defined. DF does not solve the problem for me. What do you mean by externally defined? DF may not solve the problem for you, but now I'm completely lost on what your problem is. There has got to be an easier way of finding the parameter registers used by a function.
If you want to find all the uses (use as in reading a register but not writing to it), you should look at the USE chain, not the DEF chain, naturally. Seongbae
Re: Excess registers pushed - regs_ever_live not right way?
I am still struggling with a good solution that avoids unneeded saves of parameter registers. To solve the problem, all I need to know are the registers actually used for parameters. Since the caller assumes all of these are clobbered by the callee, they should never need to be saved. DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter registers, not just the ones that are really used (since it uses the target's FUNCTION_ARG_REGNO_P to get parameter registers). So the DF artificial defs are useless in trying to find the real parameter registers. That seems to require going over all DF chains to work out which registers are externally defined. DF does not solve the problem for me. There has got to be an easier way of finding the parameter registers used by a function. Ideas? Seongbae Park (박성배, 朴成培) wrote: You can use DF_REG_DEF_COUNT(): if this is indeed a parameter register, there should be only one def (the artificial def) or no def at all. Or if you want to see all defs for the reg, follow DF_REG_DEF_CHAIN(). Seongbae On Wed, Feb 27, 2008 at 6:03 PM, Andrew Hutchinson [EMAIL PROTECTED] wrote: The register contains a parameter that is passed to the function. This register is not part of the call-used set. If this type of register were modified by the function, then it would be saved by the function. If this register is not modified by the function, it should not be saved. This is true even if the function is not a leaf function (as the same register would be preserved by deeper calls). Andy Seongbae Park (박성배, 朴成培) wrote: On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson [EMAIL PROTECTED] wrote: Register saves by the prologue (pushes) are typically made with reference to df_regs_ever_live_p() or regs_ever_live. If my understanding is correct, these calls reflect register USEs and not register DEFs. So if a register is used in a function, but not otherwise changed, it will be pushed onto the stack unnecessarily by the prologue.
This implies that the register is either a global register or a parameter register; in either case it won't be saved/restored as callee-save. What kind of register is it, and how come there's only a use of it in the function but it's not a global? Seongbae
Re: Excess registers pushed - regs_ever_live not right way?
Sorry, terminology is fighting language! By parameter I mean argument registers; also, a stray "use" may have crept in. Original problem: the prologue is saving live registers that are not call-used, following normal gcc methods. But on the AVR target this will include some argument registers, as not all argument registers are call-used. Function argument registers do not need to be saved (since callee assumes they are always clobbered). To solve the problem, all I need to know is which registers really do contain function arguments. Then I can omit these from the prologue saves and fix the bug. DF does not tell me which registers contain function arguments. It marks all possible argument registers with an artificial def (and those are known anyway). Unfortunately it is not as simple as counting defs, as I had hoped. So I would have to go through all the chains for possible arguments to see whether that external def is actually used inside the function. This cannot be shortcut by looking for just any use, or for multiple defs, as real argument registers can be reused inside the function. Is this conclusion correct? Andy Seongbae Park (박성배, 朴成培) wrote: 2008/3/1 Andrew Hutchinson [EMAIL PROTECTED]: I am still struggling with a good solution that avoids unneeded saves of parameter registers. To solve the problem, all I need to know are the registers actually used for parameters. Since the caller assumes all of these are clobbered by the callee, they should never need to be saved. I'm totally confused about what the problem is here. I thought you were seeing extra callee-save register save/restore in the prologue, but now it sounds like you're seeing extra caller-save register save/restore. Which one are you trying to solve, and what kind of target is this? DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter registers, not just the ones that are really used (since it uses the target's FUNCTION_ARG_REGNO_P to get parameter registers). You said you wanted to know if there's a def of a register within a function.
For an incoming parameter, there will be one artificial def, and if there's no other def, it means there's no real def of the register within the function. So the DF artificial defs are useless in trying to find real parameter registers. I don't understand what you mean by this. What do you mean by a real parameter register? That seems to require going over all DF chains to work out which registers are externally defined. DF does not solve the problem for me. What do you mean by externally defined? DF may not solve the problem for you, but now I'm completely lost on what your problem is. There has got to be an easier way of finding the parameter registers used by a function. If you want to find all the uses (use as in reading a register but not writing to it), you should look at the USE chain, not the DEF chain, naturally. Seongbae
Excess registers pushed - regs_ever_live not right way?
Register saves by the prologue (pushes) are typically made with reference to df_regs_ever_live_p() or regs_ever_live. If my understanding is correct, these calls reflect register USEs and not register DEFs. So if a register is used in a function but not otherwise changed, it will be pushed onto the stack unnecessarily by the prologue (as noted in this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32871). I checked a couple of other ports, but they all use df_regs_ever_live_p(). Indeed, this is the method documented in the manual. The question is: what DF routine or variable can be used to determine which registers are DEFs, and hence destructively used by a function? Maybe df_invalidated_by_call in conjunction with df_get_call_refs()? Andy
Re: Excess registers pushed - regs_ever_live not right way?
Thanks, I will check this. The DF dump in the RTL file does not list artificial defs, which is what I think I need. However, I do note that all potential parameter registers (including unused ones) are listed as invalidated by call, which means one (or more) defs. So, as you suggest, I just need to find the count. Andy Seongbae Park (박성배, 朴成培) wrote: You can use DF_REG_DEF_COUNT(): if this is indeed a parameter register, there should be only one def (the artificial def) or no def at all. Or, if you want to see all defs for the reg, follow DF_REG_DEF_CHAIN(). Seongbae On Wed, Feb 27, 2008 at 6:03 PM, Andrew Hutchinson [EMAIL PROTECTED] wrote: The register contains a parameter that is passed to the function. This register is not part of the call-used set. If this type of register were modified by the function, then it would be saved by the function. If this register is not modified by the function, it should not be saved. This is true even if the function is not a leaf function (as the same register would be preserved by deeper calls). Andy Seongbae Park (박성배, 朴成培) wrote: On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson [EMAIL PROTECTED] wrote: Register saves by the prologue (pushes) are typically made with reference to df_regs_ever_live_p() or regs_ever_live. If my understanding is correct, these calls reflect register USEs and not register DEFs. So if a register is used in a function but not otherwise changed, it will get pushed unnecessarily onto the stack by the prologue. This implies that the register is either a global register or a parameter register; in either case it won't be saved/restored as callee-save. What kind of register is it, and how come there's only a use of it in a function but it's not a global? Seongbae
Re: Finding out what backend instruction pattern matches instruction
Thank you greatly for the feedback. I took a look at mips.md, and we already use the conditional length for some instructions. However, some effective AVR instruction lengths are vastly complicated by operands, length and addressing modes (after all, we emulate 16- and 32-bit operations with only an 8-bit CPU). So it takes a relatively complex set of logic to get the true length correct. This logic is already present as C functions (moves and shifts being the more complex variety). We call these routines only for the final adjustment, but have to figure out which one to call based on the insn RTL. I fear that duplicating this logic in RTL patterns would be neither elegant nor less error-prone. As I say above, the logic already exists as C functions. If we could call these directly to determine the attribute value, it would be much easier! For now, matching the name seems optimal. Andy Ian Lance Taylor wrote: Andrew Hutchinson [EMAIL PROTECTED] writes: The alternative, perhaps, would be to set each length attribute dynamically in each pattern, if that was possible. But that looks like way more work. That is certainly the best way. Search for length in mips.md for one example of how it can be done. Ian
Re: Segmentation fault in df-scan.c
Alas, --enable-checking produced no different result or additional warnings or errors (though it might help me in the future!). I have a workaround but don't fully understand why a define_expand should have caused a segmentation fault. I believe the issue might be that gcse does not expect to see any POST_INC patterns in its first pass (the RTL dump files show that is where it died). A few are normally created by patterns, but perhaps almost all of them are restricted to the prologue/epilogue. In my case, I used a define_expand, so the POST_INC appears in the very earliest RTL, in a normal block. Most POST_INC/DEC etc. are created after the gcse pass (by the auto-inc-dec pass, of course). The expander used

rtx tmp_reg_rtx = copy_to_mode_reg (QImode, gen_rtx_MEM (QImode, gen_rtx_POST_INC (HImode, addr1)));

aka Rx = [Ry++], which fails. However, making this simpler works:

rtx tmp_reg_rtx = copy_to_mode_reg (QImode, gen_rtx_MEM (QImode, addr1));
emit_move_insn (addr1, gen_rtx_PLUS (Pmode, addr1, const1_rtx));

aka Rx = [Ry]; Ry = Ry + 1. For now I have gone back to the second case, though the code is not quite as good. Thanks again, Andy
Finding out what backend instruction pattern matches instruction
I am working on the AVR port and seek advice on the best way of working out which instruction patterns have been matched to RTL. This is needed to adjust instruction lengths to assist branching, once operands are finally known. Before this, worst-case lengths are used from the patterns' length attributes. At present, the ADJUST_INSN_LENGTH routine looks at the instruction RTL to figure out which pattern was matched, then calls the appropriate routine that can do the precise length calculation. The problem with this method is that this re-matching can easily be wrong. Great care must be taken when additional backend patterns are added or existing ones are rearranged; otherwise instruction lengths are calculated incorrectly. To get around this problem, I replaced this RTL checking with a simple lookup of the instruction name using name = get_insn_name (INSN_CODE (insn)); then a simple string compare can be used to determine precisely what has been matched. It works fine, but is this an acceptable method? The alternative, perhaps, would be to set each length attribute dynamically in each pattern, if that was possible. But that looks like way more work.
Segmentation fault in df-scan.c
While working on a Cygwin/AVR backend patch, I had a segmentation fault occur in df-scan.c, which appears unrelated to the target. I can't provide a testcase as the backend is modified, but the source was 2003-1.c. It all happens in df-scan.c (Rev 130805, 14 Dec 2007): df_ref_create_structure() tries to access the EMPTY collection_rec->def_vec, as type DF_REF_REG_DEF is being set by df_uses_record(), yet no space was allocated by df_notes_rescan(). This appears to be a bug, but I seek your combined wisdom before filing a report:
1) emit-rtl.c (line 4647) calls df_notes_rescan (insn);
2) df_notes_rescan (line 2043) creates struct df_collection_rec collection_rec but does not allocate any storage for member def_vec, then (line 2062) calls df_uses_record, related to the usage of REG_EQUIV and REG_EQUAL notes;
3) df_uses_record (line 2994) calls df_ref_record (related to recording a definition for PRE_DEC..POST_MODIFY) with type set as DF_REF_REG_DEF;
4) df_ref_record calls df_ref_create_structure, which fails.
Below is the stack dump, with a few variables and the RTX of the insn printed out:
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as i686-pc-cygwin...
(gdb) source ./gdbini.in
./gdbini.in: No such file or directory.
(gdb) source ./gdbinit.in
Breakpoint 1 at 0x6268d6: file ../../gcc/gcc/diagnostic.c, line 660.
Breakpoint 2 at 0x626863: file ../../gcc/gcc/diagnostic.c, line 604.
Breakpoint 3 at 0xa77a20
Breakpoint 4 at 0xa77a10
(gdb) run -mmcu=atmega128 -g -w -O3 -DSTACK_SIZE=400 -da -DNO_TRAMPOLINES -fno-show-column -DSIGNAL_SUPPRESS -std=gnu99 2003-1.c -o 2003-1.o
Starting program: /cygdrive/e/awhconf/gcc/cc1.exe -mmcu=atmega128 -g -w -O3 -DSTACK_SIZE=400 -da -DNO_TRAMPOLINES -fno-show-column -DSIGNAL_SUPPRESS -std=gnu99 2003-1.c -o 2003-1.o
Loaded symbols for /cygdrive/c/WINDOWS/system32/ntdll.dll
Loaded symbols for /cygdrive/c/WINDOWS/system32/kernel32.dll
Loaded symbols for /usr/bin/cygwin1.dll
Loaded symbols for /cygdrive/c/WINDOWS/system32/advapi32.dll
Loaded symbols for /cygdrive/c/WINDOWS/system32/rpcrt4.dll
Loaded symbols for /usr/bin/cygiconv-2.dll
 foo baz bar main
Analyzing compilation unit
Performing interprocedural optimizations
 visibility early_local_cleanups inline static-var pure-const
Assembling functions:
 bar foo baz main
Program received signal SIGSEGV, Segmentation fault.
0x007a03de in df_ref_create_structure (collection_rec=0x22c840, reg=0x124, loc=0x7ff31b04, bb=0x7fec3c00, insn=0x7ff778a0, ref_type=DF_REF_REG_DEF, ref_flags=292) at ../../gcc/gcc/df-scan.c:2611
2611      collection_rec->def_vec[collection_rec->next_def++] = this_ref;
(gdb) where
#0 0x007a03de in df_ref_create_structure (collection_rec=0x22c840, reg=0x124, loc=0x7ff31b04, bb=0x7fec3c00, insn=0x7ff778a0, ref_type=DF_REF_REG_DEF, ref_flags=292) at ../../gcc/gcc/df-scan.c:2611
#1 0x007a2d8a in df_uses_record (collection_rec=0x22c840, loc=0x0, ref_type=DF_REF_REG_MEM_LOAD, bb=0x7fec3c00, insn=0x7ff778a0, flags=DF_REF_IN_NOTE) at ../../gcc/gcc/df-scan.c:2994
#2 0x007a56db in df_notes_rescan (insn=0x7ff778a0) at ../../gcc/gcc/df-scan.c:2062
#3 0x004d3c91 in set_unique_reg_note (insn=0x7ff778a0, kind=REG_EQUAL, datum=0x7ff1e8f0) at ../../gcc/gcc/emit-rtl.c:4647
#4 0x005ce935 in try_replace_reg (from=0x7ff1d740, to=0x1bebbd8, insn=0x7ff778a0) at ../../gcc/gcc/gcse.c:2687
#5 0x005cef5d in constprop_register (insn=0x7ff778a0, from=0x7ff1d740, to=0x7ff319c8, alter_jumps=0 '\0') at ../../gcc/gcc/gcse.c:2904
#6 0x005cfdfc in one_cprop_pass (pass=1, cprop_jumps=0 '\0', bypass_jumps=0 '\0') at ../../gcc/gcc/gcse.c:2973
#7 0x005d5166 in rest_of_handle_gcse () at ../../gcc/gcc/gcse.c:722
#8 0x00621508 in execute_one_pass (pass=0xa79770) at ../../gcc/gcc/passes.c:1118
#9 0x006216ae in execute_pass_list (pass=0xa79350) at ../../gcc/gcc/passes.c:1171
#10 0x006216c1 in execute_pass_list (pass=0xa79630) at ../../gcc/gcc/passes.c:1172
#11 0x00848b4c in tree_rest_of_compilation (fndecl=0x7fdcf340) at ../../gcc/gcc/tree-optimize.c:404
#12 0x0062277b in cgraph_expand_function (node=0x7ff40480) at ../../gcc/gcc/cgraphunit.c:1151
#13 0x006243fe in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1214
#14 0x0041aff7 in c_write_global_declarations () at ../../gcc/gcc/c-decl.c:8074
#15 0x006295e6 in toplev_main (argc=14, argv=0x1b91d60) at ../../gcc/gcc/toplev.c:1055
#16 0x004938da in main (argc=14, argv=0x1b91d60) at ../../gcc/gcc/main.c:35
(gdb) pr
The history is empty.
(gdb) print insn
$1 = (rtx) 0x7ff778a0
(gdb) pr
(insn 10 84 11 3 2003-1.c:36 (set (reg:QI 50)
        (mem:QI (post_inc:HI (reg:HI 48)) [0 S1 A8])) 8 {*movqi} (expr_list:REG_EQUAL (mem:QI (post_inc:HI (reg:HI 48)) [0 S1