Re: A strange code snippet: jump to a return instruction

2014-01-09 Thread Ian Lance Taylor
On Thu, Jan 9, 2014 at 3:00 PM, xmeng  wrote:
>
> Here is a strange code snippet in gcc.bin in version 4.7.0:
>
> 00402e20 <_ZL28if_exists_else_spec_functioniPPKc>:
>   402e20:   31 c0                   xor    %eax,%eax
>   402e22:   83 ff 02                cmp    $0x2,%edi
>   402e25:   75 11                   jne    402e38
>   402e27:   53                      push   %rbx
>   402e28:   48 8b 3e                mov    (%rsi),%rdi
>   402e2b:   48 89 f3                mov    %rsi,%rbx
>   402e2e:   80 3f 2f                cmpb   $0x2f,(%rdi)
>   402e31:   74 0d                   je     402e40
>   402e33:   48 8b 43 08             mov    0x8(%rbx),%rax
>   402e37:   5b                      pop    %rbx
>   402e38:   f3 c3                   repz retq
>   402e3a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   402e40:   be 04 00 00 00          mov    $0x4,%esi
>   402e45:   e8 3e fa ff ff          callq  402888
>   402e4a:   85 c0                   test   %eax,%eax
>   402e4c:   75 e5                   jne    402e33
>   402e4e:   48 8b 03                mov    (%rbx),%rax
>   402e51:   5b                      pop    %rbx
>   402e52:   eb e4                   jmp    402e38
>   402e54:   66 66 66 2e 0f 1f 84    data32 data32 nopw %cs:0x0(%rax,%rax,1)
>   402e5b:   00 00 00 00 00
>
> The last instruction of this function is a two-byte jump, "jmp 402e38",
> which jumps to a two-byte return, "repz retq". Why not just emit a two-byte
> return at the end of the function instead of jumping to the return?
>
> I actually find similar "jump to a return" snippets in every version from
> 4.7.0 to 4.8.2, but I don't find any such case for 4.6 or prior.
>
> Is there any reason for emitting such code snippets?

I doubt it.  Please file a bug report with a test case as described at
http://gcc.gnu.org/bugs/ .  Thanks.

Ian


A strange code snippet: jump to a return instruction

2014-01-09 Thread xmeng

Hi,

Here is a strange code snippet in gcc.bin in version 4.7.0:

00402e20 <_ZL28if_exists_else_spec_functioniPPKc>:
  402e20:   31 c0                   xor    %eax,%eax
  402e22:   83 ff 02                cmp    $0x2,%edi
  402e25:   75 11                   jne    402e38
  402e27:   53                      push   %rbx
  402e28:   48 8b 3e                mov    (%rsi),%rdi
  402e2b:   48 89 f3                mov    %rsi,%rbx
  402e2e:   80 3f 2f                cmpb   $0x2f,(%rdi)
  402e31:   74 0d                   je     402e40
  402e33:   48 8b 43 08             mov    0x8(%rbx),%rax
  402e37:   5b                      pop    %rbx
  402e38:   f3 c3                   repz retq
  402e3a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  402e40:   be 04 00 00 00          mov    $0x4,%esi
  402e45:   e8 3e fa ff ff          callq  402888
  402e4a:   85 c0                   test   %eax,%eax
  402e4c:   75 e5                   jne    402e33
  402e4e:   48 8b 03                mov    (%rbx),%rax
  402e51:   5b                      pop    %rbx
  402e52:   eb e4                   jmp    402e38
  402e54:   66 66 66 2e 0f 1f 84    data32 data32 nopw %cs:0x0(%rax,%rax,1)
  402e5b:   00 00 00 00 00

The last instruction of this function is a two-byte jump, "jmp 402e38",
which jumps to a two-byte return, "repz retq". Why not just emit a
two-byte return at the end of the function instead of jumping to the return?
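
For readers following the listing, a rough C reading of the function may help. This is only a sketch reconstructed from the disassembly, not the gcc.c source. It assumes that the stripped call target at 402e45 is access() and that the immediate 4 is R_OK; the name if_exists_else_sketch and the argv precondition are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical C reading of the listing above; not copied from gcc.c.  */
const char *
if_exists_else_sketch (int argc, const char **argv)
{
  /* Added precondition for this sketch only; it is not in the listing.  */
  assert (argv != NULL);

  /* 402e20-402e25: %eax = 0, then return it unless argc == 2.  */
  if (argc != 2)
    return NULL;

  /* 402e2e: the first byte of argv[0] must be '/' (0x2f).
     402e40-402e4c: assumed to be access (argv[0], R_OK); R_OK is 4 on
     Linux, which matches the mov $0x4,%esi.  */
  if (argv[0][0] == '/' && access (argv[0], R_OK) == 0)
    return argv[0];   /* 402e4e-402e52: the path that ends in jmp 402e38.  */

  return argv[1];     /* 402e33-402e38: falls through into repz retq.  */
}
```

In this reading the function has three return paths: two of them reach the "repz retq" at 402e38 directly, and only the argv[0] path gets there through the extra jump.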


I actually find similar "jump to a return" snippets in every version 
from 4.7.0 to 4.8.2, but I don't find any such case for 4.6 or prior.


Is there any reason for emitting such code snippets?

Thanks

--Xiaozhu


gcc-4.8-20140109 is now available

2014-01-09 Thread gccadmin
Snapshot gcc-4.8-20140109 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140109/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 206499

You'll find:

 gcc-4.8-20140109.tar.bz2 Complete GCC

  MD5=54e5f3043dad049d00e560783e942c58
  SHA1=d8218d6660dd5ca69f841e7eb8452a64a4b9cad4

Diffs from 4.8-20140102 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


RE: Infinite number of iterations in loop [v850, mep]

2014-01-09 Thread Paulo Matos
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 08 January 2014 14:42
> To: Paulo Matos
> Cc: Andrew Haley; gcc@gcc.gnu.org; Jan Hubicka
> Subject: Re: Infinite number of iterations in loop [v850, mep]
> 
> On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos  wrote:
> >> -Original Message-
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: 08 January 2014 11:03
> >> To: Paulo Matos
> >> Cc: Andrew Haley; gcc@gcc.gnu.org
> >> Subject: Re: Infinite number of iterations in loop [v850, mep]
> >>
> >> That was referring to the case with extern b.  For the above case the
> >> issue must be something else.  Trying a cross to v850-elf to see if it
> >> reproduces for me (if 'b' is a stack or argument slot then we might
> >> bogusly think that *c++ = 0 may clobber it, otherwise RTL
> >> number of iteration analysis might just be confused).
> >>
> >> So for example (should be arch independent)
> >>
> >> struct X { int i; int j; int k; int l[24]; };
> >>
> >> int foo (struct X x, int *p)
> >> {
> >>   int z = x.j;
> >>   *p = 1;
> >>   return z;
> >> }
> >>
> >> see if there is an anti-dependence between x.j and *p on the RTL side
> >> (at least the code dispatching to the tree oracle using the MEM_EXPRs
> >> should save you from that).
> >>
> >> So - v850 at least doesn't pass b in memory and the doloop recognition
> >> works for me (on trunk).
> >>
> >
> > You are right, everything is fine with the above example regarding the
> > anti-dependence and with the loop as well. I got confused with mine not
> > generating a loop for
> > void fn1 (unsigned int b)
> > {
> >   unsigned int a;
> >   for (a = 0; a < b; a++)
> >     *c++ = 0;
> > }
> >
> > but that is simply because in our case it is not profitable.
> >
> > However, for the case:
> > void matrix_add_const(unsigned int N, short *A, short val) {
> >  unsigned int i,j;
> >  for (i=0; i<N; i++) {
> >   for (j=0; j<N; j++) {
> >    A[i*N+j] += val;
> >   }
> >  }
> > }
> >
> > GCC thinks for v850 and my port that the inner loop might be infinite.
> > It looks like GCC is mangling the loop so much that the obviousness that
> > the inner loop is finite is lost.
> >
> > This however turns out to be very performance degrading. Using -fno-ivopts
> > makes generation of loops work again both in my port and v850.
> > Is there a way to fine-tune ivopts besides trying to tune the costs or do
> > you reckon this is something iv-analysis should be smarter about?
> 
> Well.  We have
> 
> Loop 2 is simple:
>   simple exit 5 -> 7
>   infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
> (const_int 1 [0x1]))
> (nil))
>   number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
> (reg:SI 64 [ ivtmp___6 ]))
> (const_int -2 [0xfffe]))
> (const_int 1 [0x1]))
>   upper bound: 2147483646
>   realistic bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
> 
> as we replaced the induction variable by a pointer induction with
> step 2.  So this might be a very common issue for RTL loop opts,
> the upper bound of the IV is 2 * N in this case, so 2 * N & 1
> should be always false and thus "infinite" be optimized.
> 
> (insn 34 33 36 3 (parallel [
> (set (reg:SI 76)
> (plus:SI (reg/v:SI 71 [ N ])
> (reg/v:SI 71 [ N ])))
> (clobber (reg:CC 32 psw))
> ]) 21 {addsi3}
>  (expr_list:REG_UNUSED (reg:CC 32 psw)
> (nil)))
> 
> that doesn't look too difficult to do with the above definition.
> nonzero_bits might be of use here, not sure (not my area of
> expertise).
> 

I would like some comments on the following patch, which seems to work,
although I think it could be generalized.
The idea is that for an infinite condition of the form (and reg int), we can
search for the definition of reg, compute its nonzero_bits, and check that
none of them overlap the bits set in int.
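
The check described here can be illustrated outside GCC with plain integers. The following is a minimal sketch, assuming 32-bit registers and a definition of the form reg = x + x as in insn 34; the function names are hypothetical and this is not GCC's nonzero_bits implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that may be nonzero in x + x, given the bits that may be nonzero
   in x.  Doubling is a left shift by one, so bit 0 is always clear.  */
uint32_t
nonzero_bits_of_double (uint32_t nonzero_x)
{
  return nonzero_x << 1;
}

/* Return 1 when (reg & mask) is provably zero, that is, when no bit that
   may be set in reg is also set in mask.  */
int
and_is_always_zero (uint32_t nonzero_reg, uint32_t mask)
{
  return (nonzero_reg & mask) == 0;
}

/* The case from the thread: nothing is known about N, reg 76 = N + N,
   and the condition under test is (and reg76 1).  */
void
check_doubling_example (void)
{
  assert (and_is_always_zero (nonzero_bits_of_double (UINT32_MAX), 1));
  assert (!and_is_always_zero (UINT32_MAX, 1));
}
```

With nothing known about N, the mask for N + N is 0xfffffffe, so the "infinite if" condition (and reg76 1) can never be true.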

diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 4c34007..215fd22 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2064,6 +2064,50 @@ simplify_using_initial_values (struct loop *loop, enum rtx_code op, rtx *expr)
       e = single_pred_edge (e->src);
     }
 
+  /* For certain patterns we can do even better, like (and (reg) 1).  */
+  if (GET_CODE (*expr) == AND
+      && REG_P (XEXP (*expr, 0))
+      && CONST_INT_P (XEXP (*expr, 1)))
+    {
+      rtx reg = XEXP (*expr, 0);
+      unsigned HOST_WIDE_INT mask = INTVAL (XEXP (*expr, 1));
+      rtx insn_def = NULL_RTX;
+      basic_block bb = loop_preheader_edge (loop)->src;
+
+      while (1)
+        {
+          rtx insn;
+
+          if (bb == ENTRY_BLOCK_PTR)
+            break;
+
+          FOR_BB_INSNS_REVERSE (bb, insn)
+            {
+              if (!INSN_P (insn))
+                break;
+              if (df_reg_defined (insn, reg))
+                {
+                  insn_def = insn;
+                  break;
+                }
+            }
+
+          if (insn_def)


Generating minimum libstdc++ symbols for a new platform

2014-01-09 Thread Bill Schmidt
Hi,

It was recently pointed out to me that our new powerpc64le-linux-gnu
target does not yet have a corresponding directory in
libstdc++-v3/config/abi/post/ to hold a baseline_symbols.txt for the platform.
I've been looking around and haven't found any documentation for how the
minimum baseline symbols file should be generated.  Can someone please
enlighten me about the process?

Thanks,
Bill



Re: Still fails with strict-volatile-bitfields

2014-01-09 Thread Richard Earnshaw
On 09/01/14 08:26, Bernd Edlinger wrote:
> Hi,
> 
> On Thu, 9 Jan 2014 15:01:54, Joey Ye wrote:
>>
>> Sandra, Bernd,
>>
>> Can you take a look at
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59734
>>
>> It seems a simple case still doesn't work as expected. Did I miss anything?
>>
>> Thanks,
>> Joey
> 
> Yes,
> 
> this is a major case where the C++ memory model is
> in conflict with AAPCS.
> 

Does the compiler warn about this?  And if so, is the warning on by
default?  I think it's essential that we don't have quiet changes in
behaviour without such warnings.

R.

> You can get the expected code, by changing the struct
> like this:
> 
> struct str {
>   volatile unsigned f1: 8;
>   unsigned dummy:24;
> };
> 
> If it is written this way the C++ memory model allows
> QImode, HImode, SImode. And -fstrict-volatile-bitfields
> demands SImode, so the conflict is resolved. This dummy
> member makes only a difference on the C level, and is
> completely invisible in the generated code.
> 
> If -fstrict-volatile-bitfields is now given, we use SImode,
> if -fno-strict-volatile-bitfields is given, we give GCC the
> freedom to choose the access mode, maybe QImode if that is
> faster.
> 
> In the _very_ difficult process to find a solution
> that seems to be acceptable to all maintainers, we came to
> the solution, that we need to adhere to the C++ memory
> model by default. And we may not change the default
> setting of -fstrict-volatile-bitfields at the same time!
> 
> As a future extension we discussed the possibility
> to add a new option -fstrict-volatile-bitfields=aapcs
> that explicitly allows us to break the C++ memory model.
> 
> But I did not yet try to implement this, as I feel that
> would certainly not be accepted as we are in Phase3 now.
> 
> As another future extension there was the discussion
> about the -Wportable-volatility warning, which I see now
> as a warning that analyzes the structure layout and
> warns about any structures that are not "well-formed",
> in the sense, that a bit-field fails to define all
> bits of the container.
> 
> Those people that do use bit-fields to access device-registers
> do always define all bits, and of course in the same mode.
> 
> It would be good to have a warning, when some bits are missing.
> They currently have to use great care to check their structures
> manually.
> 
> I had a proposal for that warning but that concentrated
> only on the volatile attribute, but I will have to re-write
> that completely so that it can be done in stor-layout.c:
> 
> It should warn independent of optimization levels or actual
> bitfield member references, thus, be implemented entirely at
> the time we lay out the structure. The well-formed-ness of
> a bit-field makes that possible.
> 
> But that will come too late for Phase3 as well.
> 
> 
> Regards
> Bernd.  
> 
> 




Re: proposal to make SIZE_TYPE more flexible

2014-01-09 Thread Joseph S. Myers
On Wed, 8 Jan 2014, DJ Delorie wrote:

> So... OK if __int20 and __int128 keywords exist always (for ports that
> request them, which for __int128 would be all of them), but still be
> "unsupported" if for some reason the port doesn't support them because
> of command line options?

That seems reasonable.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: Infinite number of iterations in loop [v850, mep]

2014-01-09 Thread Paulo Matos
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 08 January 2014 14:42
> To: Paulo Matos
> Cc: Andrew Haley; gcc@gcc.gnu.org; Jan Hubicka
> Subject: Re: Infinite number of iterations in loop [v850, mep]
> 
> Well.  We have
> 
> Loop 2 is simple:
>   simple exit 5 -> 7
>   infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
> (const_int 1 [0x1]))
> (nil))
>   number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
> (reg:SI 64 [ ivtmp___6 ]))
> (const_int -2 [0xfffe]))
> (const_int 1 [0x1]))
>   upper bound: 2147483646
>   realistic bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
> 
> as we replaced the induction variable by a pointer induction with
> step 2.  So this might be a very common issue for RTL loop opts,
> the upper bound of the IV is 2 * N in this case, so 2 * N & 1
> should be always false and thus "infinite" be optimized.
> 
> (insn 34 33 36 3 (parallel [
> (set (reg:SI 76)
> (plus:SI (reg/v:SI 71 [ N ])
> (reg/v:SI 71 [ N ])))
> (clobber (reg:CC 32 psw))
> ]) 21 {addsi3}
>  (expr_list:REG_UNUSED (reg:CC 32 psw)
> (nil)))
> 
> that doesn't look too difficult to do with the above definition.
> nonzero_bits might be of use here, not sure (not my area of
> expertise).
> 
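
The two expressions in the quoted dump can be evaluated by hand for a concrete N. The sketch below transcribes them into C with 32-bit arithmetic. It assumes that reg 68 holds the end address, reg 64 the start address, and reg 76 the byte length N + N; that reading and the function names are assumptions, not something the dump states.

```c
#include <assert.h>
#include <stdint.h>

/* (lshiftrt (plus (minus reg68 reg64) (const_int -2)) (const_int 1))  */
uint32_t
dump_niter (uint32_t reg68, uint32_t reg64)
{
  return (reg68 - reg64 + (uint32_t) -2) >> 1;
}

/* infinite if: (and reg76 (const_int 1))  */
int
dump_may_be_infinite (uint32_t reg76)
{
  return (reg76 & 1u) != 0;
}

/* Worked case: N = 5 shorts starting at an arbitrary address.  */
void
check_dump_example (void)
{
  uint32_t n = 5, start = 0x1000, len = n + n;

  assert (dump_niter (start + len, start) == n - 1);
  assert (!dump_may_be_infinite (len));
}
```

For a row of N shorts the byte distance is 2 * N, so the expression evaluates to N - 1, and the low bit tested by the "infinite if" condition is clear for every N, including values where N + N wraps around.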

I am trying to do something that shouldn't be too hard with the current df
infrastructure, but I don't think I am doing it the right way.
Once I have the assumption (and:SI (reg:SI 76) (const_int 1 [0x1])),
I need to reach the definition of reg 76, which is insn 34. Generally we can
only do this if there is no other definition of reg 76 except one in a
dominator basic block.
The CFG looks like:
  BB2
 /   \
   BB3   BB7
| \
  BB6exit 
\
BB4 -
/\
 BB5 BB9
 |  \ |
BB8 BB7 (exit)BB4 (loop)
 |
BB6 (loop)

BB3 contains insn 34 and there's no other definition of reg 76 in loop
BB6->BB4->BB5->BB8 or the inner BB4->BB9.
Is there a way to do this search for the definition of reg 76 automatically?
I can see a df_find_def, however this requires me to have insn 34 already.
I need to search for it. Is there anything in GCC to do this already?

I am sure GCC must be doing this already somewhere.
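
To make the condition above concrete, here is a toy model of the search in standalone C. It is a sketch under stated assumptions and does not use GCC's df API: struct toy_block, its fields, and find_dominating_single_def are all hypothetical names, and each block simply lists the registers it defines and points to its immediate dominator.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical.  */
struct toy_block
{
  int index;                     /* block number, e.g. 3 for BB3 */
  const struct toy_block *idom;  /* immediate dominator, NULL at entry */
  const int *defs;               /* registers defined in this block */
  size_t n_defs;
};

/* Return the block holding the only definition of REGNO if that block is
   START or dominates START; return NULL otherwise.  ALL lists every block
   in the function.  */
const struct toy_block *
find_dominating_single_def (const struct toy_block *const *all, size_t n_all,
                            const struct toy_block *start, int regno)
{
  const struct toy_block *def_bb = NULL;
  size_t count = 0;

  assert (all != NULL && start != NULL);

  /* Count definitions over the whole function.  */
  for (size_t i = 0; i < n_all; i++)
    for (size_t j = 0; j < all[i]->n_defs; j++)
      if (all[i]->defs[j] == regno)
        {
          count++;
          def_bb = all[i];
        }

  if (count != 1)
    return NULL;

  /* Walk up the dominator chain from START looking for that block.  */
  for (const struct toy_block *bb = start; bb != NULL; bb = bb->idom)
    if (bb == def_bb)
      return def_bb;

  return NULL;
}
```

In the situation described, reg 76 has one definition in BB3, so a query starting from a block that BB3 dominates would return BB3, while a second definition anywhere would make the query fail.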

Cheers,

Paulo Matos



> Richard.
> 



RE: Still fails with strict-volatile-bitfields

2014-01-09 Thread Bernd Edlinger
Hi,

On Thu, 9 Jan 2014 15:01:54, Joey Ye wrote:
>
> Sandra, Bernd,
>
> Can you take a look at
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59734
>
> It seems a simple case still doesn't work as expected. Did I miss anything?
>
> Thanks,
> Joey

Yes,

this is a major case where the C++ memory model is
in conflict with AAPCS.

You can get the expected code, by changing the struct
like this:

struct str {
  volatile unsigned f1: 8;
  unsigned dummy:24;
};

If it is written this way the C++ memory model allows
QImode, HImode, SImode. And -fstrict-volatile-bitfields
demands SImode, so the conflict is resolved. This dummy
member makes only a difference on the C level, and is
completely invisible in the generated code.
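
As an illustration of the workaround, the two layouts can be put side by side. This is a sketch that assumes the problematic struct declared only the 8-bit field and that unsigned is 32 bits wide, as on the targets discussed; str_partial, str_full, STR_FULL_BITS and read_f1 are names made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed shape of the problematic case: only 8 of the container's
   32 bits are declared.  */
struct str_partial
{
  volatile unsigned f1 : 8;
};

/* The workaround from this message: a filler declares the other 24 bits.  */
struct str_full
{
  volatile unsigned f1 : 8;
  unsigned dummy : 24;
};

/* Declared widths of str_full; they add up to the width of unsigned.  */
enum { STR_FULL_BITS = 8 + 24 };

/* Read the volatile field.  Which access width the compiler uses depends
   on the options discussed in this thread.  */
unsigned
read_f1 (const struct str_full *p)
{
  assert (p != NULL);
  return p->f1;
}
```

Compiling read_f1 with and without -fstrict-volatile-bitfields and comparing the generated load is one way to see the difference described above.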

If -fstrict-volatile-bitfields is now given, we use SImode,
if -fno-strict-volatile-bitfields is given, we give GCC the
freedom to choose the access mode, maybe QImode if that is
faster.

In the _very_ difficult process to find a solution
that seems to be acceptable to all maintainers, we came to
the solution, that we need to adhere to the C++ memory
model by default. And we may not change the default
setting of -fstrict-volatile-bitfields at the same time!

As a future extension we discussed the possibility
to add a new option -fstrict-volatile-bitfields=aapcs
that explicitly allows us to break the C++ memory model.

But I did not yet try to implement this, as I feel that
would certainly not be accepted as we are in Phase3 now.

As another future extension there was the discussion
about the -Wportable-volatility warning, which I see now
as a warning that analyzes the structure layout and
warns about any structures that are not "well-formed",
in the sense, that a bit-field fails to define all
bits of the container.

Those people that do use bit-fields to access device-registers
do always define all bits, and of course in the same mode.

It would be good to have a warning, when some bits are missing.
They currently have to use great care to check their structures
manually.

I had a proposal for that warning but that concentrated
only on the volatile attribute, but I will have to re-write
that completely so that it can be done in stor-layout.c:

It should warn independent of optimization levels or actual
bitfield member references, thus, be implemented entirely at
the time we lay out the structure. The well-formed-ness of
a bit-field makes that possible.
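
The "well-formed" test sketched in words above only needs the declared widths and the container size, which is why it can run at layout time. A minimal standalone version follows, assuming all fields share one container; bitfields_fill_container is a hypothetical helper, not code from stor-layout.c.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 when the N bit-field widths in WIDTHS exactly fill a container
   of CONTAINER_BITS bits, 0 when bits are missing or the fields do not fit.
   Hypothetical helper for illustration only.  */
int
bitfields_fill_container (const unsigned *widths, size_t n,
                          unsigned container_bits)
{
  unsigned total = 0;

  assert (widths != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      /* Reject fields that would spill past the container.  */
      if (widths[i] > container_bits - total)
        return 0;
      total += widths[i];
    }

  return total == container_bits;
}
```

With a 32-bit container, the widths {8} would trigger the proposed warning, while {8, 24} would not.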

But that will come too late for Phase3 as well.


Regards
Bernd.