Re: A strange code snippet: jump to a return instruction
On Thu, Jan 9, 2014 at 3:00 PM, xmeng wrote:
>
> Here is a strange code snippet in gcc.bin in version 4.7.0:
>
> 00402e20 <_ZL28if_exists_else_spec_functioniPPKc>:
>   402e20: 31 c0                 xor    %eax,%eax
>   402e22: 83 ff 02              cmp    $0x2,%edi
>   402e25: 75 11                 jne    402e38
>   402e27: 53                    push   %rbx
>   402e28: 48 8b 3e              mov    (%rsi),%rdi
>   402e2b: 48 89 f3              mov    %rsi,%rbx
>   402e2e: 80 3f 2f              cmpb   $0x2f,(%rdi)
>   402e31: 74 0d                 je     402e40
>   402e33: 48 8b 43 08           mov    0x8(%rbx),%rax
>   402e37: 5b                    pop    %rbx
>   402e38: f3 c3                 repz retq
>   402e3a: 66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
>   402e40: be 04 00 00 00        mov    $0x4,%esi
>   402e45: e8 3e fa ff ff        callq  402888
>   402e4a: 85 c0                 test   %eax,%eax
>   402e4c: 75 e5                 jne    402e33
>   402e4e: 48 8b 03              mov    (%rbx),%rax
>   402e51: 5b                    pop    %rbx
>   402e52: eb e4                 jmp    402e38
>   402e54: 66 66 66 2e 0f 1f 84  data32 data32 nopw %cs:0x0(%rax,%rax,1)
>   402e5b: 00 00 00 00 00
>
> The last instruction of this function is a two-byte jump, "jmp 402e38",
> which jumps to a two-byte return, "repz retq". Why not just emit the
> two-byte return at the end of the function instead of jumping to it?
>
> I find similar "jump to a return" snippets in every version from 4.7.0
> to 4.8.2, but none in 4.6 or earlier.
>
> Is there any reason for emitting such code?

I doubt it.  Please file a bug report with a test case as described at
http://gcc.gnu.org/bugs/ .  Thanks.

Ian
A strange code snippet: jump to a return instruction
Hi,

Here is a strange code snippet in gcc.bin in version 4.7.0:

00402e20 <_ZL28if_exists_else_spec_functioniPPKc>:
  402e20: 31 c0                 xor    %eax,%eax
  402e22: 83 ff 02              cmp    $0x2,%edi
  402e25: 75 11                 jne    402e38
  402e27: 53                    push   %rbx
  402e28: 48 8b 3e              mov    (%rsi),%rdi
  402e2b: 48 89 f3              mov    %rsi,%rbx
  402e2e: 80 3f 2f              cmpb   $0x2f,(%rdi)
  402e31: 74 0d                 je     402e40
  402e33: 48 8b 43 08           mov    0x8(%rbx),%rax
  402e37: 5b                    pop    %rbx
  402e38: f3 c3                 repz retq
  402e3a: 66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
  402e40: be 04 00 00 00        mov    $0x4,%esi
  402e45: e8 3e fa ff ff        callq  402888
  402e4a: 85 c0                 test   %eax,%eax
  402e4c: 75 e5                 jne    402e33
  402e4e: 48 8b 03              mov    (%rbx),%rax
  402e51: 5b                    pop    %rbx
  402e52: eb e4                 jmp    402e38
  402e54: 66 66 66 2e 0f 1f 84  data32 data32 nopw %cs:0x0(%rax,%rax,1)
  402e5b: 00 00 00 00 00

The last instruction of this function is a two-byte jump, "jmp 402e38",
which jumps to a two-byte return, "repz retq". Why not just emit the
two-byte return at the end of the function instead of jumping to it?

I find similar "jump to a return" snippets in every version from 4.7.0
to 4.8.2, but none in 4.6 or earlier.

Is there any reason for emitting such code?

Thanks
--Xiaozhu
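For readers following the disassembly, here is a hedged C reconstruction of what `_ZL28if_exists_else_spec_functioniPPKc` (demangled: `if_exists_else_spec_function(int, char const**)`, with internal linkage) appears to compute. It is inferred from the instructions, not copied from GCC's sources. Two assumptions: the call at 402e45 is `access (path, R_OK)` (R_OK is 4 on glibc, matching `mov $0x4,%esi`), and `cmpb $0x2f,(%rdi)` is an absolute-path test (0x2f is '/').

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hedged reconstruction of the function disassembled above.
   ASSUMPTION: callq 402888 is access (argv[0], R_OK).  */
static const char *
if_exists_else_spec_function (int argc, const char **argv)
{
  /* xor %eax,%eax; cmp $0x2,%edi; jne 402e38: any argc other than 2
     returns NULL.  */
  if (argc != 2)
    return NULL;

  /* cmpb $0x2f,(%rdi): absolute path?  If so, and it is readable,
     return argv[0] (mov (%rbx),%rax at 402e4e, then jmp 402e38).  */
  if (argv[0][0] == '/' && access (argv[0], R_OK) == 0)
    return argv[0];

  /* Otherwise return argv[1] (mov 0x8(%rbx),%rax at 402e33).  */
  return argv[1];
}
```

At the source level the two non-NULL results are simply two `return` statements; in the 4.7.0 output the `argv[0]` path is the block that ends in the `jmp 402e38` to the shared `repz retq`.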
gcc-4.8-20140109 is now available
Snapshot gcc-4.8-20140109 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140109/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 206499

You'll find:

 gcc-4.8-20140109.tar.bz2             Complete GCC

  MD5=54e5f3043dad049d00e560783e942c58
  SHA1=d8218d6660dd5ca69f841e7eb8452a64a4b9cad4

Diffs from 4.8-20140102 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
RE: Infinite number of iterations in loop [v850, mep]
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 08 January 2014 14:42
> To: Paulo Matos
> Cc: Andrew Haley; gcc@gcc.gnu.org; Jan Hubicka
> Subject: Re: Infinite number of iterations in loop [v850, mep]
>
> On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos wrote:
> >> -Original Message-
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: 08 January 2014 11:03
> >> To: Paulo Matos
> >> Cc: Andrew Haley; gcc@gcc.gnu.org
> >> Subject: Re: Infinite number of iterations in loop [v850, mep]
> >>
> >> That was referring to the case with extern b. For the above case the
> >> issue must be something else. Trying a cross to v850-elf to see if it
> >> reproduces for me (if 'b' is a stack or argument slot then we might
> >> bogusly think that *c++ = 0 may clobber it, otherwise RTL
> >> number-of-iterations analysis might just be confused).
> >>
> >> So for example (should be arch independent)
> >>
> >> struct X { int i; int j; int k; int l[24]; };
> >>
> >> int foo (struct X x, int *p)
> >> {
> >>   int z = x.j;
> >>   *p = 1;
> >>   return z;
> >> }
> >>
> >> see if there is an anti-dependence between x.j and *p on the RTL side
> >> (at least the code dispatching to the tree oracle using the MEM_EXPRs
> >> should save you from that).
> >>
> >> So - v850 at least doesn't pass b in memory and the doloop recognition
> >> works for me (on trunk).
> >>
> >
> > You are right, everything is fine with the above example regarding the
> > anti-dependence and with the loop as well. I got confused by mine not
> > generating a loop for
> >
> > void fn1 (unsigned int b)
> > {
> >   unsigned int a;
> >   for (a = 0; a < b; a++)
> >     *c++ = 0;
> > }
> >
> > but that is simply because in our case it is not profitable.
> >
> > However, for the case:
> >
> > void matrix_add_const(unsigned int N, short *A, short val) {
> >   unsigned int i,j;
> >   for (i=0; i<N; i++) {
> >     for (j=0; j<N; j++) {
> >       A[i*N+j] += val;
> >     }
> >   }
> > }
> >
> > GCC thinks, for v850 and for my port, that the inner loop might be
> > infinite. It looks like GCC is mangling the loop so much that the
> > obvious fact that the inner loop is finite is lost.
> >
> > This, however, turns out to degrade performance badly. Using
> > -fno-ivopts makes generation of loops work again both in my port and
> > on v850. Is there a way to fine-tune ivopts besides trying to tune the
> > costs, or do you reckon this is something iv-analysis should be
> > smarter about?
>
> Well.  We have
>
> Loop 2 is simple:
>   simple exit 5 -> 7
>   infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
>         (const_int 1 [0x1]))
>     (nil))
>   number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
>             (reg:SI 64 [ ivtmp___6 ]))
>         (const_int -2 [0xfffe]))
>     (const_int 1 [0x1]))
>   upper bound: 2147483646
>   realistic bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
>
> as we replaced the induction variable by a pointer induction with
> step 2.  So this might be a very common issue for RTL loop opts:
> the upper bound of the IV is 2 * N in this case, so 2 * N & 1
> should always be false and thus the "infinite" condition could be
> optimized away.
>
> (insn 34 33 36 3 (parallel [
>             (set (reg:SI 76)
>                 (plus:SI (reg/v:SI 71 [ N ])
>                     (reg/v:SI 71 [ N ])))
>             (clobber (reg:CC 32 psw))
>         ]) 21 {addsi3}
>      (expr_list:REG_UNUSED (reg:CC 32 psw)
>         (nil)))
>
> that doesn't look too difficult to do with the above definition.
> nonzero_bits might be of use here, not sure (not my area of
> expertise).
>

I would like some comments on the following patch, which seems to work
but which I think could be generalized.
The idea is that, for an infinite condition of the specific form
(and reg int), we can search for the definition of reg, compute its
nonzero_bits and check that none of them overlap the bits in int.

diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 4c34007..215fd22 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2064,6 +2064,50 @@ simplify_using_initial_values (struct loop *loop, enum rtx_code op, rtx *expr)
       e = single_pred_edge (e->src);
     }
 
+  /* For certain patterns we can do even better, like (and (reg) 1).  */
+  if (GET_CODE (*expr) == AND
+      && REG_P (XEXP (*expr, 0))
+      && CONST_INT_P (XEXP (*expr, 1)))
+    {
+      rtx reg = XEXP (*expr, 0);
+      unsigned HOST_WIDE_INT mask = INTVAL (XEXP (*expr, 1));
+      rtx insn_def = NULL_RTX;
+      basic_block bb = loop_preheader_edge (loop)->src;
+
+      while (1)
+       {
+         rtx insn;
+
+         if (bb == ENTRY_BLOCK_PTR)
+           break;
+
+         FOR_BB_INSNS_REVERSE (bb, insn)
+           {
+             if (!INSN_P (insn))
+               break;
+             if (df_reg_defined (insn, reg))
+               {
+                 insn_def = insn;
+                 break;
+               }
+           }
+
+         if (insn_def)
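To make the idea concrete, here is a toy model of the check the patch is aiming for. It does not use GCC's real `nonzero_bits` or any RTL data structures; `toy_def`, `toy_nonzero_bits` and `toy_infinite_condition_is_false` are hypothetical names for illustration only. It encodes one fact from the thread: insn 34 sets reg 76 to N + N, which equals N << 1 modulo 2^32, so bit 0 of reg 76 is always clear and "infinite if (and (reg 76) 1)" can never hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL toy model, not GCC's nonzero_bits (): a conservative mask
   of the bits that may be set in a 32-bit value.  Only the case needed
   for insn 34 is modelled; anything else yields "all bits may be set".  */
typedef enum { TOY_UNKNOWN, TOY_PLUS_SELF } toy_def_kind;

/* Description of the insn defining the register, e.g.
   insn 34: (set (reg 76) (plus (reg 71) (reg 71))).  */
typedef struct
{
  toy_def_kind kind;
  uint32_t operand_nonzero;  /* nonzero-bits mask of the operand (reg 71) */
} toy_def;

static uint32_t
toy_nonzero_bits (const toy_def *def)
{
  switch (def->kind)
    {
    case TOY_PLUS_SELF:
      /* x + x == x << 1 (mod 2^32), so bit 0 is always clear.  */
      return def->operand_nonzero << 1;
    default:
      return UINT32_MAX;
    }
}

/* The "infinite if (and reg mask)" assumption is provably false when no
   bit that may be set in reg overlaps mask.  */
static bool
toy_infinite_condition_is_false (const toy_def *def, uint32_t mask)
{
  return (toy_nonzero_bits (def) & mask) == 0;
}
```

For insn 34 nothing is known about N, so `operand_nonzero` is all ones; the resulting mask is 0xfffffffe, and the test against mask 1 succeeds, while a test against mask 2 (or an unknown definition) correctly does not.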
Generating minimum libstdc++ symbols for a new platform
Hi,

It was recently pointed out to me that our new powerpc64le-linux-gnu
target does not yet have a corresponding directory in
libstdc++-v3/config/abi/post/ to hold a baseline_symbols.txt for the
platform. I've been looking around and haven't found any documentation
on how the minimum baseline symbols file should be generated. Can
someone please enlighten me about the process?

Thanks,
Bill
Re: Still fails with strict-volatile-bitfields
On 09/01/14 08:26, Bernd Edlinger wrote:
> Hi,
>
> On Thu, 9 Jan 2014 15:01:54, Joey Ye wrote:
>>
>> Sandra, Bernd,
>>
>> Can you take a look at
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59734
>>
>> It seems a simple case still doesn't work as expected. Did I miss anything?
>>
>> Thanks,
>> Joey
>
> Yes,
>
> this is a major case where the C++ memory model is
> in conflict with AAPCS.
>

Does the compiler warn about this?  And if so, is the warning on by
default?  I think it's essential that we don't have quiet changes in
behaviour without such warnings.

R.

> You can get the expected code by changing the struct
> like this:
>
> struct str {
>   volatile unsigned f1: 8;
>   unsigned dummy:24;
> };
>
> If it is written this way, the C++ memory model allows
> QImode, HImode and SImode, and -fstrict-volatile-bitfields
> demands SImode, so the conflict is resolved. This dummy
> member only makes a difference at the C level and is
> completely invisible in the generated code.
>
> If -fstrict-volatile-bitfields is now given, we use SImode;
> if -fno-strict-volatile-bitfields is given, we give GCC the
> freedom to choose the access mode, maybe QImode if that is
> faster.
>
> In the _very_ difficult process of finding a solution
> that seems to be acceptable to all maintainers, we came to
> the conclusion that we need to adhere to the C++ memory
> model by default. And we may not change the default
> setting of -fstrict-volatile-bitfields at the same time!
>
> As a future extension we discussed the possibility
> of adding a new option, -fstrict-volatile-bitfields=aapcs,
> that explicitly allows us to break the C++ memory model.
>
> But I have not yet tried to implement this, as I feel that
> it would certainly not be accepted as we are in Phase3 now.
>
> As another future extension there was the discussion
> about the -Wportable-volatility warning, which I now see
> as a warning that analyzes the structure layout and
> warns about any structures that are not "well-formed",
> in the sense that a bit-field fails to define all
> bits of the container.
>
> Those people who use bit-fields to access device registers
> always define all bits, and of course in the same mode.
>
> It would be good to have a warning when some bits are missing.
> They currently have to take great care to check their structures
> manually.
>
> I had a proposal for that warning, but it concentrated
> only on the volatile attribute; I will have to rewrite
> it completely so that it can be done in stor-layout.c:
>
> It should warn independently of optimization levels or actual
> bit-field member references, and thus be implemented entirely at
> the time we lay out the structure. The well-formedness of
> a bit-field makes that possible.
>
> But that will come too late for Phase3 as well.
>
>
> Regards
> Bernd.
>
>
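The "well-formed" rule described for the proposed -Wportable-volatility warning boils down to simple arithmetic on bit-field widths. The sketch below is hypothetical: it is not the stor-layout.c implementation (which, per the message, has not been written yet), and `bitfield_run_is_well_formed` is an illustrative name. It treats a run of bit-fields sharing one container as well-formed when their widths, laid back to back, define every bit of the container.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL check in the spirit of the proposed warning.  It models
   only the width arithmetic and ignores alignment rules, zero-width
   bit-fields and fields that straddle containers.  */
static bool
bitfield_run_is_well_formed (const unsigned *widths, size_t n,
                             unsigned container_bits)
{
  unsigned used = 0;

  for (size_t i = 0; i < n; i++)
    {
      used += widths[i];
      if (used > container_bits)
        return false;   /* the run spills into the next container */
    }

  /* False when some bits of the container are left undefined.  */
  return used == container_bits;
}
```

With a 32-bit container, `{ 8 }` (f1 alone) is reported as not well-formed, while `{ 8, 24 }` (f1 plus dummy) is.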
Re: proposal to make SIZE_TYPE more flexible
On Wed, 8 Jan 2014, DJ Delorie wrote:

> So... OK if __int20 and __int128 keywords exist always (for ports that
> request them, which for __int128 would be all of them), but still be
> "unsupported" if for some reason the port doesn't support them because
> of command line options?

That seems reasonable.

--
Joseph S. Myers
jos...@codesourcery.com
RE: Infinite number of iterations in loop [v850, mep]
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 08 January 2014 14:42
> To: Paulo Matos
> Cc: Andrew Haley; gcc@gcc.gnu.org; Jan Hubicka
> Subject: Re: Infinite number of iterations in loop [v850, mep]
>
> Well.  We have
>
> Loop 2 is simple:
>   simple exit 5 -> 7
>   infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
>         (const_int 1 [0x1]))
>     (nil))
>   number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
>             (reg:SI 64 [ ivtmp___6 ]))
>         (const_int -2 [0xfffe]))
>     (const_int 1 [0x1]))
>   upper bound: 2147483646
>   realistic bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
>
> as we replaced the induction variable by a pointer induction with
> step 2.  So this might be a very common issue for RTL loop opts:
> the upper bound of the IV is 2 * N in this case, so 2 * N & 1
> should always be false and thus the "infinite" condition could be
> optimized away.
>
> (insn 34 33 36 3 (parallel [
>             (set (reg:SI 76)
>                 (plus:SI (reg/v:SI 71 [ N ])
>                     (reg/v:SI 71 [ N ])))
>             (clobber (reg:CC 32 psw))
>         ]) 21 {addsi3}
>      (expr_list:REG_UNUSED (reg:CC 32 psw)
>         (nil)))
>
> that doesn't look too difficult to do with the above definition.
> nonzero_bits might be of use here, not sure (not my area of
> expertise).
>

I am trying to do something that shouldn't be too hard with the current
df infrastructure, but I don't think I am doing it the right way.

Once I have the assumption (and:SI (reg:SI 76) (const_int 1 [0x1])), I
need to reach the definition of reg 76, which is insn 34. Generally, we
can only do this if there is no definition of reg 76 other than one in
a dominating basic block.

The CFG looks like:

      BB2
     /   \
   BB3    BB7
    |       \
   BB6      exit
     \
      BB4
     /   \
   BB5    BB9
   |  \     |
  BB8  BB7 (exit)  BB4 (loop)
   |
  BB6 (loop)

BB3 contains insn 34, and there's no other definition of reg 76 in the
loop BB6->BB4->BB5->BB8 or in the inner loop BB4->BB9.

Is there a way to do this search for the definition of reg 76
automatically?
I can see df_find_def; however, it requires me to already have insn 34,
and I need to search for it. Is there anything in GCC that does this
already? I am sure GCC must be doing it somewhere.

Cheers,

Paulo Matos

> Richard.
>
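Why the analysis records "infinite if (and (reg 76) 1)" at all can be seen in a small simulation. This is a hedged sketch, not GCC code, and it reflects our reading of the dump: it models an exit test `p != end` on an induction variable that steps by 2, in 8-bit arithmetic so the wrap-around is cheap to observe (`step2_loop_terminates` is a hypothetical helper). If the distance from start to end is odd, the IV keeps its parity forever and never reaches `end`; if the distance is 2 * N, as in this loop, it always terminates after N steps.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL toy model of "for (p = start; p != end; p += 2)" in 8-bit
   arithmetic.  Returns true if the loop terminates and stores the number
   of iterations in *iters; returns false if it would run forever.  */
static bool
step2_loop_terminates (uint8_t start, uint8_t end, unsigned *iters)
{
  uint8_t p = start;
  unsigned n = 0;

  /* With step 2 modulo 256 the IV returns to its start value after 128
     steps, so 256 steps is more than enough to detect a cycle.  */
  while (n < 256)
    {
      if (p == end)
        {
          *iters = n;
          return true;
        }
      p = (uint8_t) (p + 2);
      n++;
    }
  return false;
}
```

For example, start 0 and end 6 terminate after 3 iterations, while start 0 and end 7 never do; the parity of the distance is exactly what an `(and ... 1)` test captures.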
RE: Still fails with strict-volatile-bitfields
Hi,

On Thu, 9 Jan 2014 15:01:54, Joey Ye wrote:
>
> Sandra, Bernd,
>
> Can you take a look at
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59734
>
> It seems a simple case still doesn't work as expected. Did I miss anything?
>
> Thanks,
> Joey

Yes,

this is a major case where the C++ memory model is
in conflict with AAPCS.

You can get the expected code by changing the struct
like this:

struct str {
  volatile unsigned f1: 8;
  unsigned dummy:24;
};

If it is written this way, the C++ memory model allows
QImode, HImode and SImode, and -fstrict-volatile-bitfields
demands SImode, so the conflict is resolved. This dummy
member only makes a difference at the C level and is
completely invisible in the generated code.

If -fstrict-volatile-bitfields is now given, we use SImode;
if -fno-strict-volatile-bitfields is given, we give GCC the
freedom to choose the access mode, maybe QImode if that is
faster.

In the _very_ difficult process of finding a solution
that seems to be acceptable to all maintainers, we came to
the conclusion that we need to adhere to the C++ memory
model by default. And we may not change the default
setting of -fstrict-volatile-bitfields at the same time!

As a future extension we discussed the possibility
of adding a new option, -fstrict-volatile-bitfields=aapcs,
that explicitly allows us to break the C++ memory model.

But I have not yet tried to implement this, as I feel that
it would certainly not be accepted as we are in Phase3 now.

As another future extension there was the discussion
about the -Wportable-volatility warning, which I now see
as a warning that analyzes the structure layout and
warns about any structures that are not "well-formed",
in the sense that a bit-field fails to define all
bits of the container.

Those people who use bit-fields to access device registers
always define all bits, and of course in the same mode.

It would be good to have a warning when some bits are missing.
They currently have to take great care to check their structures
manually.

I had a proposal for that warning, but it concentrated
only on the volatile attribute; I will have to rewrite
it completely so that it can be done in stor-layout.c:

It should warn independently of optimization levels or actual
bit-field member references, and thus be implemented entirely at
the time we lay out the structure. The well-formedness of
a bit-field makes that possible.

But that will come too late for Phase3 as well.

Regards
Bernd.
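The padded layout suggested above can be exercised in plain C. The sketch below, with a hypothetical `set_f1` helper, only checks what is observable at run time: a store to `f1` leaves `dummy` untouched, and on a typical ABI with a 32-bit `unsigned` the struct occupies exactly one `unsigned` container (bit-field layout is implementation-defined, so that part is an assumption). The access width actually chosen (QImode versus SImode) is visible only in the generated assembly, for example by compiling with `-O2 -S -fstrict-volatile-bitfields` for an ARM target.

```c
/* Layout from the message above: without dummy, f1 would leave 24 bits
   of its 32-bit container undefined; with it, every bit is defined, so
   the C++ memory model also permits a full-width (SImode) access.  */
struct str
{
  volatile unsigned f1 : 8;
  unsigned dummy : 24;
};

/* HYPOTHETICAL helper: store to the volatile bit-field.  Whatever access
   width the compiler picks, the neighbouring dummy bits must survive.  */
static void
set_f1 (struct str *s, unsigned v)
{
  s->f1 = v;  /* v is reduced modulo 2^8, as for any unsigned bit-field */
}
```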