[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #10 from Oleg Endo oleg.e...@t-online.de 2011-12-30 02:14:00 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.
>
> For that function, -fdump-tree-all shows that the tree loop ivopts
> optimization does it. Try -fno-tree-forwprop -fno-ivopts.

This makes even worse code.
Anyway, I think this issue should be addressed by the auto_inc_dec pass.
If OK, I'd like to change it from target PR to middle-end PR.
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #11 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-30 03:24:01 UTC ---
(In reply to comment #10)
> If OK, I'd like to change it from target PR to middle-end PR.

Sure.
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #8 from Oleg Endo oleg.e...@t-online.de 2011-11-28 22:31:44 UTC ---
(In reply to comment #7)
> The problem is that SH target can't do those simple array accesses well
> at QI/HImode because of the lack of displacement addressing for those modes.

In these particular cases, this is true. Even if there was a working QI/HImode
displacement addressing, it would most likely result in worse code compared to
post-inc / pre-dec addressing opportunities, because of register pressure on R0.

Apart from that there is no displacement addressing for FPU loads/stores, which
also misses some opportunities:

float test_func_5 (float* p, int c)
{
  float r = 0;
  do
  {
    r += *p++;
    r += *p++;
    r += *p++;
  } while (--c);
  return r;
}

Compiled with -Os -m4-single:

        fldi0   fr0
.L11:
        mov     r4,r1
        fmov.s  @r1+,fr1
        dt      r5
        fadd    fr1,fr0
        fmov.s  @r1,fr1
        mov     r4,r1
        add     #8,r1
        add     #12,r4
        fadd    fr1,fr0
        fmov.s  @r1,fr1
        bf/s    .L11
        fadd    fr1,fr0
        rts
        nop

..could be:

        fldi0   fr0
.L11:
        fmov.s  @r4+,fr1
        dt      r5
        fadd    fr1,fr0
        fmov.s  @r4+,fr1
        fadd    fr1,fr0
        fmov.s  @r4+,fr1
        bf/s    .L11
        fadd    fr1,fr0
        rts
        nop

Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.
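For checking the intended semantics on a host machine, here is a minimal sketch in C. test_func_5 is the function from comment #8. test_func_5_indexed is a hypothetical hand-written variant, not a compiler dump: it mirrors what the first assembly listing appears to do, namely load from the base pointer at offsets 0, 4 and 8 bytes and then advance the base by 12 bytes once per iteration. The cross-compiler name in the leading comment is an assumption; the options are the ones mentioned in this thread.

```c
/* Assumed cross-compiler invocation (the binary name is a guess,
   substitute whatever SH toolchain is installed):
     sh-elf-gcc -Os -m4-single -S test.c
     sh-elf-gcc -Os -m4-single -fno-tree-forwprop -fno-ivopts -S test.c  */

/* Function from comment #8: three post-incremented loads per iteration.
   As in the original, c must be at least 1.  */
float test_func_5 (float* p, int c)
{
  float r = 0;
  do
  {
    r += *p++;
    r += *p++;
    r += *p++;
  } while (--c);
  return r;
}

/* Hypothetical equivalent (illustration only): the same sum written as
   base + constant offset accesses with a single pointer bump per
   iteration, which is the shape the unoptimized listing suggests.  */
float test_func_5_indexed (float* p, int c)
{
  float r = 0;
  do
  {
    r += p[0];
    r += p[1];
    r += p[2];
    p += 3;          /* corresponds to "add #12,r4" */
  } while (--c);
  return r;
}
```

Both functions return the same value for the same input, so any difference in the generated SH code comes from the addressing form chosen by the compiler and not from the source semantics.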
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #9 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-11-28 23:29:57 UTC ---
(In reply to comment #8)
> Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.

For that function, -fdump-tree-all shows that the tree loop ivopts
optimization does it. Try -fno-tree-forwprop -fno-ivopts.
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #5 from Oleg Endo oleg.e...@t-online.de 2011-10-30 12:36:51 UTC ---
(In reply to comment #4)
> I'm a bit curious to see what happens if they are changed to non-zero
> for SI/DImode.

..not much actually. I've tried defining both as TARGET_SH1 and compared
before/after CSiBE results. No code size changes at all.
Also the micro test cases do not seem to be affected by that.
Of course I might have missed some border case...
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #6 from Oleg Endo oleg.e...@t-online.de 2011-10-30 13:53:51 UTC ---
(In reply to comment #1)
> GCC makes usual mem accesses into those with post_inc/pre_dec at
> auto_inc_dec pass. I guess that auto_inc_dec pass can't find
> post_inc insns well in that case because other tree/rtl optimizers
> tweak the code already. If this is the case, the problem would be
> not target specific.

Looks like so. For the simple test case...

int test (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

...the load insns are initially expanded as:

(insn 2 5 3 2 (set (reg/v/f:SI 169 [ p ])
        (reg:SI 4 r4 [ p ])) (nil))

(insn 7 6 8 3 (set (reg:SI 171)
        (sign_extend:SI (mem:QI (reg/v/f:SI 169 [ p ]) [0 *p_2(D)+0 S1 A8]))) (nil))

(insn 8 7 9 3 (set (reg:SI 172)
        (reg/v/f:SI 169 [ p ])) (nil))

(insn 9 8 10 3 (set (reg/f:SI 173)
        (plus:SI (reg/v/f:SI 169 [ p ])
            (const_int 1 [0x1]))) (nil))

(insn 10 9 11 3 (set (reg:SI 174)
        (sign_extend:SI (mem:QI (reg/f:SI 173) [0 MEM[(char *)p_2(D) + 1B]+0 S1 A8]))) (nil))

(insn 13 12 14 3 (set (reg/f:SI 177)
        (plus:SI (reg/v/f:SI 169 [ p ])
            (const_int 2 [0x2]))) (nil))

(insn 14 13 15 3 (set (reg:SI 178)
        (sign_extend:SI (mem:QI (reg/f:SI 177) [0 MEM[(char *)p_2(D) + 2B]+0 S1 A8]))) (nil))

The auto_inc_dec pass then fails to realize that
(reg:SI 177) = (reg:SI 169) + 2 = (reg:SI 173) + 1.

I wonder whether there might be something in the target code that suggests the
early optimizers to do that? I've tried playing with the TARGET_ADDRESS_COST
hook but it didn't have any effect in this case.
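The address relationship that auto_inc_dec would need to see can be written out at the C level. This is a minimal sketch, not taken from any GCC dump: test is the function from comment #6, while test_base_plus_offset and test_chained are hypothetical hand-written variants. The first models the expanded RTL, where every address is computed from the original base (reg 169); the second models the chained form, where each address is the previous one plus 1, which is what post-increment addressing needs.

```c
/* Function from comment #6.  The parameter c is unused, as in the
   original test case.  */
int test (char* p, int c)
{
  int r = 0;
  (void) c;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

/* Hypothetical model of the expanded RTL: every address is derived
   from the original base pointer.  */
int test_base_plus_offset (char* p)
{
  char* a1 = p + 1;   /* like (reg 173) = (reg 169) + 1 */
  char* a2 = p + 2;   /* like (reg 177) = (reg 169) + 2 */
  return *p + *a1 + *a2;
}

/* Hypothetical model of the form that maps onto post-increment:
   each address is the previous address plus 1.  */
int test_chained (char* p)
{
  int r = *p;
  p = p + 1;          /* first increment */
  r += *p;
  p = p + 1;          /* second increment: same address as base + 2 */
  r += *p;
  return r;
}
```

All three functions read the same three bytes, so rewriting base + 2 as (base + 1) + 1 does not change the result; the open question in this report is only whether a pass can recognize that rewrite.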
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #7 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-30 23:36:27 UTC ---
(In reply to comment #6)
> I wonder whether there might be something in the target code that suggests the
> early optimizers to do that? I've tried playing with the TARGET_ADDRESS_COST
> hook but it didn't have any effect in this case.

-fdump-tree-all shows that forward propagation on SSA trees makes those
memory accesses into simple array accesses. You can try -fno-tree-forwprop
and see the effect of that option.
It seems that there are no special knobs to control forwprop from the target
side. The problem is that SH target can't do those simple array accesses well
at QI/HImode because of the lack of displacement addressing for those modes.
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #4 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-19 21:36:56 UTC ---
(In reply to comment #3)
USE_LOAD_POST_INCREMENT and USE_STORE_PRE_DECREMENT are used only in
move_by_pieces, which handles some block operations when MOVE_BY_PIECES_P
says OK. They don't disable post_inc/pre_dec addressing for SI/DImode in
general, I think. It seems that they are 0 for SI/DImode because we have
displacement addressing for a limited size of memory chunk in these modes,
though I may be wrong about it.
I'm a bit curious to see what happens if they are changed to non-zero
for SI/DImode.
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #3 from Oleg Endo oleg.e...@t-online.de 2011-10-19 00:00:01 UTC ---
Kaz, do you happen to know why the following is defined in sh.h?

#define USE_LOAD_POST_INCREMENT(mode)    ((mode == SImode || mode == DImode) \
                                           ? 0 : TARGET_SH1)

#define USE_STORE_PRE_DECREMENT(mode)    ((mode == SImode || mode == DImode) \
                                           ? 0 : TARGET_SH1)

Is there any (historical) reason to disable post-inc / pre-dec addressing for
SImode / DImode?

Thanks,
Oleg
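To make the effect of the two sh.h macros concrete, here is a minimal standalone sketch. The enum and the TARGET_SH1 definition are hypothetical stand-ins for GCC internals (in GCC they come from the generated machine mode definitions and the target flags), and the two wrapper functions are invented for illustration. Only the macro bodies follow the definitions quoted in comment #3. As noted in comment #4, GCC consults these macros only from move_by_pieces, so this models that one decision and nothing else.

```c
/* Hypothetical stand-ins for GCC internals, for illustration only.  */
enum machine_mode { QImode, HImode, SImode, DImode, SFmode };
#define TARGET_SH1 1   /* assumption: compiling for SH1 or later */

/* Same logic as the sh.h definitions quoted in comment #3.  */
#define USE_LOAD_POST_INCREMENT(mode) \
  (((mode) == SImode || (mode) == DImode) ? 0 : TARGET_SH1)

#define USE_STORE_PRE_DECREMENT(mode) \
  (((mode) == SImode || (mode) == DImode) ? 0 : TARGET_SH1)

/* Hypothetical wrappers so the macro results can be inspected.  */
int load_post_inc_allowed (enum machine_mode mode)
{
  return USE_LOAD_POST_INCREMENT (mode);
}

int store_pre_dec_allowed (enum machine_mode mode)
{
  return USE_STORE_PRE_DECREMENT (mode);
}
```

With these stand-ins, QImode, HImode and SFmode yield 1 and SImode and DImode yield 0 for both macros.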
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #1 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-16 23:33:40 UTC ---
GCC makes usual mem accesses into those with post_inc/pre_dec at
auto_inc_dec pass. I guess that auto_inc_dec pass can't find
post_inc insns well in that case because other tree/rtl optimizers
tweak the code already. If this is the case, the problem would be
not target specific.
[Bug target/50749] SH Target: Post-increment addressing used only for first memory access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #2 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-10-17 00:32:39 UTC ---
*** Bug 50750 has been marked as a duplicate of this bug. ***