Re: load reverse

2013-08-12 Thread Andrew Haley
On 08/12/2013 05:22 AM, sravan megan wrote:
 Anyone please help me to get out of this issue

It's hard for anyone to do that because we don't have your code.
Did you step through insn-output.c with GDB when compiling your
test case?  What happened?

Andrew.



Re: How to specify multiple OSDIRNAME suffixes for multilib (Multilib usage with MPX)?

2013-08-12 Thread Ilya Enkovich
Hi Terry,

Thanks a lot for your reply! I suppose I have to introduce some new
option like MULTILIB_COMPATIBLE to produce additional search locations
for libraries. Does it sound reasonable? Any advice on implementation?

Thanks,
Ilya

2013/8/12 Terry Guo terry@arm.com:


 -----Original Message-----
 From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
 Ilya Enkovich
 Sent: Friday, August 09, 2013 8:37 PM
 To: GCC Development
 Subject: How to specify multiple OSDIRNAME suffixes for multilib (Multilib
 usage with MPX)?

 Hi,

 I'm currently trying to create multilib libraries compiled with MPX.
 The main difference from the existing multilib variants on the i386 target is
 that the new targets (32/mpx, 64/mpx) are compatible with the old variants
 (32, 64). Also, we should not prevent the user from using MPX if he does
 not have MPX variants of some libraries - legacy versions should be
 used instead. Thus we need to check several suffixes instead of one.
 E.g. for a 64-bit MPX binary we should first check ../lib64/mpx, then
 check ../lib64, and finally the default one.

 I looked at MULTILIB_REUSE and thought it might solve my problem
 according to the documentation: "And for some targets it is better to
 reuse an existing multilib than to fall back to default multilib when
 there is no corresponding multilib." [1] So I tried the following
 declarations:

 MULTILIB_OSDIRNAMES+= m64/fmpx=../lib64/mpx
 MULTILIB_REUSE = m64=m64/fmpx

 But it appeared that only the first entry for a given option set counts
 when multilibs are parsed in gcc.c, and my reuse entry is just ignored.

 Is this a wrong implementation of MULTILIB_REUSE, or my wrong
 understanding of this option? Is there a way to implement MPX
 multilibs while still allowing legacy ones when some MPX libs are missing?

 [1] http://gcc.gnu.org/onlinedocs/gccint/Target-Fragment.html#Target-Fragment

 Thanks,
 Ilya

 Hi Ilya,

 Sorry for the late response. I am the author of MULTILIB_REUSE. At present
 this feature is not flexible enough to meet your requirement. It can't
 dynamically decide to choose m64/fmpx if such libraries are there, and
 otherwise choose m64 if the m64/fmpx libraries don't exist. This feature only
 makes a static decision. The following statement:
  MULTILIB_REUSE = m64=m64/fmpx
 means that when options m64 and fmpx are given, we should always reuse the
 libraries for m64. For this purpose we also need:
 MULTILIB_EXCEPTIONS = m64/fmpx to make sure libraries for m64/fmpx won't
 be built.

 If m64/fmpx isn't excluded, MULTILIB_REUSE will assume the required
 libraries are there and that there is no need to reuse.

 IMHO, the way gcc selects a multilib is based on string matching rather
 than on detecting the existence of libraries. So the flexible behaviour
 you want isn't supported yet.

 BR,
 Terry
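Terry's description of the current, purely static behaviour can be modelled in a few lines. The following C sketch is hypothetical and is not the code in gcc.c: it assumes a made-up table of the multilibs that were actually built (what remains after MULTILIB_EXCEPTIONS) and a made-up reuse table standing in for "MULTILIB_REUSE = m64=m64/fmpx". Selection is done purely by comparing option strings; at no point does it check whether a library file exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of multilibs that were actually built, i.e. what is
   left after MULTILIB_EXCEPTIONS has removed entries such as "m64/fmpx". */
static const char *const built_multilibs[] = { ".", "m32", "m64" };

/* Hypothetical reuse table modelled on "MULTILIB_REUSE = m64=m64/fmpx":
   a request for FOR_OPTS is served by the REUSE multilib. */
struct reuse_rule { const char *reuse; const char *for_opts; };
static const struct reuse_rule reuse_rules[] = { { "m64", "m64/fmpx" } };

static int is_built(const char *opts)
{
    size_t i;
    for (i = 0; i < sizeof built_multilibs / sizeof built_multilibs[0]; i++)
        if (strcmp(built_multilibs[i], opts) == 0)
            return 1;
    return 0;
}

/* Static selection: an exact string match wins; otherwise a reuse rule
   applies; otherwise fall back to the default multilib ".".
   Nothing here looks at which library files are installed. */
static const char *select_multilib(const char *opts)
{
    size_t i;
    assert(opts != NULL);
    if (is_built(opts))
        return opts;
    for (i = 0; i < sizeof reuse_rules / sizeof reuse_rules[0]; i++)
        if (strcmp(reuse_rules[i].for_opts, opts) == 0)
            return reuse_rules[i].reuse;
    return ".";
}
```

With this table, select_multilib("m64/fmpx") yields "m64". If "m64/fmpx" were added to built_multilibs (i.e. not excluded), the same call would return "m64/fmpx" whether or not the MPX libraries are installed, which is the limitation Terry points out.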






RE: How to specify multiple OSDIRNAME suffixes for multilib (Multilib usage with MPX)?

2013-08-12 Thread Terry Guo


 -----Original Message-----
 From: Ilya Enkovich [mailto:enkovich@gmail.com]
 Sent: Monday, August 12, 2013 4:37 PM
 To: Terry Guo
 Cc: GCC Development
 Subject: Re: How to specify multiple OSDIRNAME suffixes for multilib
 (Multilib usage with MPX)?
 
 Hi Terry,
 
 Thanks a lot for your reply! I suppose I have to introduce some new option
 like MULTILIB_COMPATIBLE to produce additional search locations for
 libraries. Does it sound reasonable? Any advice on implementation?
 
 Thanks,
 Ilya
 

Makes sense to me. And I think the feature you mentioned can cover
MULTILIB_REUSE, so to keep things simple, I would prefer unifying them
into one term, either MULTILIB_COMPATIBLE or MULTILIB_REUSE. I am OK with
both names.

In terms of implementation, I think gcc as a driver program only decides the
paths to libraries based on command-line options and the multilib
configuration; the linker then searches for the libraries and links them
together. When MULTILIB_COMPATIBLE is provided, gcc can select more than one
path and pass them all to the linker. When there is only one compatible
library, the linker can find it by searching all paths, so the whole thing
works. But when there is more than one compatible library spread across
different paths, I am not sure it works. You can try it out.

BR,
Terry


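The approach Terry outlines (the driver passes several candidate directories and the linker takes the first match) can be sketched as follows. GNU ld searches -L directories in the order they appear on the command line, so the order of the list encodes the preference. Everything here is hypothetical: the directory table stands in for the file system, the library names are invented, and the final fallback to the default directory from Ilya's description is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical installed files, standing in for the file system. */
struct installed { const char *dir; const char *lib; };
static const struct installed files[] = {
    { "../lib64/mpx", "libfoo.so" },   /* MPX build of libfoo */
    { "../lib64",     "libfoo.so" },   /* legacy build of libfoo */
    { "../lib64",     "libbar.so" },   /* libbar has a legacy build only */
};

static int dir_has(const char *dir, const char *lib)
{
    size_t i;
    for (i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strcmp(files[i].dir, dir) == 0 && strcmp(files[i].lib, lib) == 0)
            return 1;
    return 0;
}

/* Walk the candidate directories in order and return the first one that
   contains LIB, or NULL if none does.  The order encodes the preference:
   the MPX variant first, then the compatible legacy variant. */
static const char *find_library(const char *const *dirs, size_t ndirs,
                                const char *lib)
{
    size_t i;
    assert(dirs != NULL && lib != NULL);
    for (i = 0; i < ndirs; i++)
        if (dir_has(dirs[i], lib))
            return dirs[i];
    return NULL;
}

/* Hypothetical search list a driver could pass (as several -L options)
   for a 64-bit MPX link. */
static const char *const mpx64_dirs[] = { "../lib64/mpx", "../lib64" };
```

For a library with an MPX build, the MPX directory wins; for one that only has a legacy build, the search falls through to ../lib64. When two candidate directories both contain the library, the earlier one is chosen, which is the case Terry is unsure about.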




Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread Peter Zijlstra
On Mon, Aug 05, 2013 at 12:55:15PM -0400, Steven Rostedt wrote:
 [ sent to both Linux kernel mailing list and to gcc list ]
 

Let me hijack this thread for something related...

I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
if-forest functions like perf_prepare_sample() and perf_output_sample().

They are of the form:

void func(obj, args..)
{
	unsigned long f = ...;

	if (f & F1)
		do_f1();

	if (f & F2)
		do_f2();

	...

	if (f & FN)
		do_fn();
}

Where f is constant for the entire lifetime of the particular object.

So I was thinking of having these functions use static_key/asm-goto;
then write the proper static key values unsafe so as to avoid all
trickery (as these functions would never actually be used) and copy the
end result into object private memory. The object will then use indirect
calls into these functions.

The advantage of using something like this is that it would work for all
architectures that now support the asm-goto feature. For arch/gcc
combinations that do not we'd simply revert to the current state of
affairs.

I suppose the question is, do people strenuously object to creativity
like that, and/or is there something GCC can do to make this
easier/better still?
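For readers unfamiliar with the pattern, here is a runnable C version of the if-forest together with one plain-C way of hoisting the constant f out of the hot path. To be clear, the second half is not the static_key/asm-goto scheme Peter describes: it resolves the flags once per object into an array of function pointers, which trades the per-sample flag tests for indirect calls that may well be no cheaper in the kernel. The flag values and do_f* bodies are made up so the result can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for F1..FN. */
enum { F1 = 1u << 0, F2 = 1u << 1, F3 = 1u << 2 };

/* Hypothetical per-flag work: each step appends its digit to *trace. */
static void do_f1(unsigned *trace) { *trace = *trace * 10 + 1; }
static void do_f2(unsigned *trace) { *trace = *trace * 10 + 2; }
static void do_f3(unsigned *trace) { *trace = *trace * 10 + 3; }

/* The if-forest as written: every call re-tests every flag, even though
   f never changes for a given object. */
static void sample_if_forest(unsigned long f, unsigned *trace)
{
    if (f & F1)
        do_f1(trace);
    if (f & F2)
        do_f2(trace);
    if (f & F3)
        do_f3(trace);
}

/* Plain-C alternative (NOT the static_key/asm-goto scheme): resolve the
   flags once when the object is set up and keep only the enabled steps. */
typedef void (*step_fn)(unsigned *);

struct sample_obj {
    step_fn steps[3];
    size_t nsteps;
};

static void sample_obj_init(struct sample_obj *o, unsigned long f)
{
    static const struct { unsigned long bit; step_fn fn; } table[] = {
        { F1, do_f1 }, { F2, do_f2 }, { F3, do_f3 },
    };
    size_t i;

    o->nsteps = 0;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (f & table[i].bit) {
            assert(o->nsteps < sizeof o->steps / sizeof o->steps[0]);
            o->steps[o->nsteps++] = table[i].fn;
        }
    }
}

/* Hot path: no flag tests, one indirect call per enabled step. */
static void sample_obj_run(const struct sample_obj *o, unsigned *trace)
{
    size_t i;
    for (i = 0; i < o->nsteps; i++)
        o->steps[i](trace);
}
```

Both versions run the same steps in the same order for a given f; the difference is only where the flag tests happen.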



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-08-12 Thread Jan Beulich
 On 09.08.13 at 19:03, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 9, 2013 at 12:08 AM, Jan Beulich jbeul...@suse.com wrote:
 On 08.08.13 at 18:01, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich jbeul...@suse.com wrote:
 On 08.08.13 at 02:33, H.J. Lu hjl.to...@gmail.com wrote:
 We use the .gnu_attribute directive to record an object attribute:

 enum
 {
   Tag_GNU_X86_EXTERN_BRANCH = 4,
 };

 for the types of external branch instructions in relocatable files.

 enum
 {
   /* All external branch instructions are legacy.  */
   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
   /* There is at least one external branch instruction with BND prefix.  */
   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
 };

 An x86 feature note section, .note.x86-feature, is used to indicate
 features in executables and shared libraries. The contents of this note
 section are:

  .section .note.x86-feature
  .align  4
  .long   .L1 - .L0
  .long   .L3 - .L2
  .long   1
 .L0:
  .asciz  "x86 feature"
 .L1:
  .align  4
 .L2:
  .long   FeatureFlag     /* Feature flag */
 .L3:

 The current valid bits in FeatureFlag are

 #define NT_X86_FEATURE_PLT_BND (0x1 << 0)

 It should be set if the PLT entry has a BND prefix to preserve bound registers.

 The remaining bits in FeatureFlag are reserved.

 When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
 file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
 the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
 Val_GNU_X86_EXTERN_BRANCH_BND.

 When generating an executable or shared library, if a PLT is needed and the
 Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
 the 32-byte PLT entry should be used, the feature note section should
 be generated with the NT_X86_FEATURE_PLT_BND bit set to 1, and the feature
 note section should be included in the PT_NOTE segment. The benefit of the
 note section is that it is backward compatible with existing run-time and
 tools.

 While I can see the purpose of the attribute section, I don't see
 what the note section is for: you don't mention at all what it's
 consumed for, and I also can't see how it validly could be for
 anything. That's because, iirc, note section contents, if not
 understood by the consumer, are required to not have any effect
 on the correctness of the program. Hence if loaded on a system
 that is MPX capable, has an MPX-aware kernel, but no MPX-aware
 user space (apart from this one executable or shared library, or
 a set thereof), it ought to still work correctly. Which - afaict - it
 won't if the dynamic loader itself isn't MPX aware.


 The note section isn't required for correctness.  But it can be used
 by ld.so to select an alternate MPX aware shared library in a different
 directory, instead of a legacy one.

 Okay, that clarifies your intentions with the note section. However,
 then you need something else to make sure an MPX aware app can't
 load on an MPX enabled kernel without MPX-enabled ld.so.
 
 The MPX-enabled app will still run correctly.  ld.so will clear the bound
 registers (which makes the bounds unlimited) for the first call with lazy binding.

Only if those registers are used for their primary purpose. The
documentation specifically says that this isn't a requirement.
But anyway, I see we're once again not going to get anywhere
with this...

Jan
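The two linker-side rules in H.J.'s proposal quoted above (merge the attribute across inputs, then pick the PLT entry size and the FeatureFlag bit) are simple enough to sketch. This is a hypothetical model, not binutils code: the function names are invented, the constants are copied from the proposal, and returning 0 with the usual 16-byte x86-64 PLT entry in the non-BND case is an assumption, since the proposal only spells out the BND case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the proposal. */
enum {
    Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
    Val_GNU_X86_EXTERN_BRANCH_BND = 1
};
#define NT_X86_FEATURE_PLT_BND (0x1 << 0)

/* Merge rule: the output is BND if any input relocatable file is BND. */
static int merge_extern_branch(const int *inputs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (inputs[i] == Val_GNU_X86_EXTERN_BRANCH_BND)
            return Val_GNU_X86_EXTERN_BRANCH_BND;
    return Val_GNU_X86_EXTERN_BRANCH_LEGACY;
}

/* Output rule: with a PLT and a merged BND value, use the 32-byte PLT
   entry and set NT_X86_FEATURE_PLT_BND in FeatureFlag.  Otherwise this
   sketch assumes the ordinary 16-byte entry and returns 0. */
static uint32_t feature_flag_for_output(int merged, int need_plt,
                                        unsigned *plt_entry_size)
{
    assert(plt_entry_size != NULL);
    if (need_plt && merged == Val_GNU_X86_EXTERN_BRANCH_BND) {
        *plt_entry_size = 32;
        return NT_X86_FEATURE_PLT_BND;
    }
    *plt_entry_size = 16;  /* assumption: standard x86-64 PLT entry size */
    return 0;
}
```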



Re: How to specify multiple OSDIRNAME suffixes for multilib (Multilib usage with MPX)?

2013-08-12 Thread Ilya Enkovich
2013/8/12 Terry Guo terry@arm.com:


 -----Original Message-----
 From: Ilya Enkovich [mailto:enkovich@gmail.com]
 Sent: Monday, August 12, 2013 4:37 PM
 To: Terry Guo
 Cc: GCC Development
 Subject: Re: How to specify multiple OSDIRNAME suffixes for multilib
 (Multilib usage with MPX)?

 Hi Terry,

 Thanks a lot for your reply! I suppose I have to introduce some new option
 like MULTILIB_COMPATIBLE to produce additional search locations for
 libraries. Does it sound reasonable? Any advice on implementation?

 Thanks,
 Ilya


 Makes sense to me. And I think the feature you mentioned can cover
 MULTILIB_REUSE, so to keep things simple, I would prefer unifying them
 into one term, either MULTILIB_COMPATIBLE or MULTILIB_REUSE. I am OK with
 both names.

 In terms of implementation, I think gcc as a driver program only decides the
 paths to libraries based on command-line options and the multilib
 configuration; the linker then searches for the libraries and links them
 together. When MULTILIB_COMPATIBLE is provided, gcc can select more than one
 path and pass them all to the linker. When there is only one compatible
 library, the linker can find it by searching all paths, so the whole thing
 works. But when there is more than one compatible library spread across
 different paths, I am not sure it works. You can try it out.

Thanks for the tips! I do not want to change the semantics of the existing
option, so I will try to implement a new one. I hope it will work fine with
multiple compatible libraries available. At least a simple test providing
two paths with the same library worked fine for me: the linker just chooses
the first path.

Thanks,
Ilya


 BR,
 Terry






Re: [RFC] vector subscripts/BIT_FIELD_REF in Big Endian.

2013-08-12 Thread Tejas Belagod
What's interesting to me here is the bitpos - does this not need
BYTES_BIG_ENDIAN correction? This seems to be inconsistent with what happens
with reduction operations in the autovectorizer, where the scalar result in the
reduction epilogue gets extracted with a BIT_FIELD_REF but the bitpos there is
corrected for BIG_ENDIAN.


a[0] is at the left end of the array in BIG_ENDIAN, and big-endian
machines number bits from the left, so bit position 0 is correct.



...
   vect_sum_9.17_74 = [reduc_plus_expr] vect_sum_9.15_73;
   stmp_sum_9.16_75 = BIT_FIELD_REF <vect_sum_9.17_74, 32, 96>;
   sum_76 = stmp_sum_9.16_75 + sum_47;

the BIT_FIELD_REF here seems to have been corrected for BYTES_BIG_ENDIAN


Yes, because something else is going on here.  This is a reduction
operation where the sum ends up in the rightmost element of a vector
register that contains four 32-bit integers.  This is at position 96
from the left end of the register according to big-endian numbering.



Thanks for your reply.

Sorry, I'm still a bit confused here. The reduc_splus_ documentation says:

"Compute the sum of the signed elements of a vector. The vector is operand 1,
and the scalar result is stored in the least significant bits of operand 0
(also a vector)."

Shouldn't this mean the scalar result should be in bitpos 0 which is the left 
end of the register in BIG ENDIAN?


Thanks,
Tejas

If vec_extract is defined in the back-end, how does one figure out if the 
BIT_FIELD_REF is a product of the gimplifier's indirect ref folding or the 
vectorizer's bit-field extraction and apply the appropriate correction in 
vec_extract's expansion? Or am I missing something that corrects BIT_FIELD_REFs 
between the gimplifier and the RTL expander?


There is no inconsistency here.

Hope this helps!
Bill


Thanks,
Tejas.









Re: [RFC] vector subscripts/BIT_FIELD_REF in Big Endian.

2013-08-12 Thread Bill Schmidt
On Mon, 2013-08-12 at 11:54 +0100, Tejas Belagod wrote:
  What's interesting to me here is the bitpos - does this not need
  BYTES_BIG_ENDIAN correction? This seems to be inconsistent with what
  happens with reduction operations in the autovectorizer, where the scalar
  result in the reduction epilogue gets extracted with a BIT_FIELD_REF but
  the bitpos there is corrected for BIG_ENDIAN.
  
  a[0] is at the left end of the array in BIG_ENDIAN, and big-endian
  machines number bits from the left, so bit position 0 is correct.
  
 
  ...
 vect_sum_9.17_74 = [reduc_plus_expr] vect_sum_9.15_73;
 stmp_sum_9.16_75 = BIT_FIELD_REF <vect_sum_9.17_74, 32, 96>;
 sum_76 = stmp_sum_9.16_75 + sum_47;
 
  the BIT_FIELD_REF here seems to have been corrected for BYTES_BIG_ENDIAN
  
  Yes, because something else is going on here.  This is a reduction
  operation where the sum ends up in the rightmost element of a vector
  register that contains four 32-bit integers.  This is at position 96
  from the left end of the register according to big-endian numbering.
  
 
 Thanks for your reply.
 
 Sorry, I'm still a bit confused here. The reduc_splus_ documentation says:
 
 "Compute the sum of the signed elements of a vector. The vector is operand 1,
 and the scalar result is stored in the least significant bits of operand 0
 (also a vector)."
 
 Shouldn't this mean the scalar result should be in bitpos 0 which is the left 
 end of the register in BIG ENDIAN?

No.  The least significant bits of any register are the rightmost bits,
and big-endian numbering begins at the left.  (I don't really like the
commentary, since "least significant bits" isn't a very good term to use
with vectors.)  Analogously, a 64-bit integer is numbered with bit 0 on the
left being the most significant bit, and bit 63 on the right being the least
significant bit.

Thanks,
Bill

 
 Thanks,
 Tejas
 
  If vec_extract is defined in the back-end, how does one figure out if the
  BIT_FIELD_REF is a product of the gimplifier's indirect ref folding or the
  vectorizer's bit-field extraction and apply the appropriate correction in
  vec_extract's expansion? Or am I missing something that corrects
  BIT_FIELD_REFs between the gimplifier and the RTL expander?
  
  There is no inconsistency here.
  
  Hope this helps!
  Bill
  
  Thanks,
  Tejas.
 
  
  
 
 

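Bill's two answers come down to a small piece of arithmetic, sketched below with made-up function names for a 128-bit vector of four 32-bit elements. Element i in memory order sits at bit position i * 32 whichever way bits are numbered, so a[0] is at 0. The lane holding the "least significant bits" of the register, where the reduction result lands, is at 0 with little-endian numbering but at 128 - 32 = 96 with big-endian numbering, which matches the BIT_FIELD_REF in the dump. This models only the numbering convention, not what GCC's expander does.

```c
#include <assert.h>

/* Bit position of the lane holding the least significant (rightmost)
   bits of a VEC_BITS-wide register of ELT_BITS-wide elements.
   Little-endian numbering starts at the least significant end, so that
   lane is at bit 0.  Big-endian numbering starts at the most significant
   (leftmost) end, so the rightmost lane is the last one. */
static unsigned lsb_lane_bitpos(unsigned vec_bits, unsigned elt_bits,
                                int big_endian)
{
    assert(elt_bits != 0 && vec_bits % elt_bits == 0);
    return big_endian ? vec_bits - elt_bits : 0;
}

/* Bit position of element I in memory (array) order.  On big-endian
   targets a[0] is at the left end, i.e. bit 0; on little-endian targets
   it is in the low bits, also bit 0. */
static unsigned array_elt_bitpos(unsigned i, unsigned elt_bits)
{
    return i * elt_bits;
}
```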


Re: i686 elf return values

2013-08-12 Thread Maciej W. Rozycki
On Tue, 6 Aug 2013, Gabriel Dos Reis wrote:

 On Tue, Aug 6, 2013 at 1:46 PM, Nathan Sidwell nat...@acm.org wrote:
  Hi,
  i386elf.h defines:
 
  /* The ELF ABI for the i386 says that records and unions are returned
 in memory.  */
 
  #define SUBTARGET_RETURN_IN_MEMORY(TYPE, FNTYPE) \
  (TYPE_MODE (TYPE) == BLKmode \
   || (VECTOR_MODE_P (TYPE_MODE (TYPE)) && int_size_in_bytes (TYPE) == 8))
 
  and as such differs from the regular i386 return mechanism.  Notice that the
  comment doesn't match the code:
  *) some structs/unions are non BLKmode
  *) some vectors can be BLKmode, some might not be -- the vector mode check
  appears to be an attempt to catch DImode vectors.
 
  Basing your ABI on the internal modes used by the compiler is not, IMHO, a
  sensible design choice.
 
  This code doesn't appear at first glance to cope with transparent_union.  In
  fact it looks pretty bitrotted.
 
  Is it best just to junk the different behaviour at this point?
 
 Yes and yes :-)

 This piece was introduced along with i386elf.h itself at r28057 (1999-07-11)
as:

#define RETURN_IN_MEMORY(TYPE) \
  (TYPE_MODE (TYPE) == BLKmode)

while the corresponding i386.h piece was:

#define RETURN_IN_MEMORY(TYPE) \
  ((TYPE_MODE (TYPE) == BLKmode) || int_size_in_bytes (TYPE) > 12)

AFAICT at that time for the i386 target GCC only supported integer and x87 
data types.  The largest native (hardware) data type therefore was the x87 
80-bit extended represented in C as a 12-byte `long double' type.  For 
this and narrower types the two macros produce the same result.  Also at 
that time IIUC all structs/unions were BLKmode.

 So the difference between the two macros only applied to complex types.
The former would only put `float complex' data in registers (EDX:EAX) -- 
because that type fits in 8 bytes -- and any other results in memory.  The 
latter however wanted to put all complex data in registers, including 
`double complex' (16 bytes) and `long double complex' (24 bytes).

 As observed by Nathan in:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00373.html

this can't possibly work, as i386 only makes 3 GPRs possibly available 
(defined as call-clobbered) for results: EAX, EDX and ECX.

 Jeff, given the above -- do you happen to remember what made you make the
i386/ELF target different from the base i386 target?  Did I miss anything
in the consideration above?  The decision looks like it was deliberate:
the i386.h default would normally apply (as i386elf.h included it at the
beginning), but the macro was explicitly undefined and then defined as
above.

 Of course even the decision made by the base i386 target to put `float 
complex' data in GPRs seems odd to me -- that's awkward and costly to 
handle as there's no way to pass such data between FPRs and GPRs without
going through memory (so why not just leave it there?).
And then this data is obviously useless in GPRs as it has to be put back 
to FPRs by the caller by the same long route if any further arithmetic is 
required.  If in registers at all, I would expect complex results to be 
returned in ST(1):ST(0) -- that would make them accessible right away and 
the ABI consistent with real results returned in ST(0).

 Then, as further evolution, with the addition of MMX and SSE support at 
r34721 (2000-06-26) both macros were changed, to:

#define RETURN_IN_MEMORY(TYPE) \
  (TYPE_MODE (TYPE) == BLKmode \
   || (VECTOR_MODE_P (TYPE_MODE (TYPE)) && int_size_in_bytes (TYPE) == 8))

and:

#define RETURN_IN_MEMORY(TYPE) \
  ((TYPE_MODE (TYPE) == BLKmode) \
   || (VECTOR_MODE_P (TYPE_MODE (TYPE)) && int_size_in_bytes (TYPE) == 8) \
   || (int_size_in_bytes (TYPE) > 12 && TYPE_MODE (TYPE) != TImode \
       && ! VECTOR_MODE_P (TYPE_MODE (TYPE))))

respectively, which IIUC made MMX data returned in memory and SSE data in 
registers -- in both cases.

 Bernd, if you still remember: am I missing anything here?  Especially the 
TImode piece is unobvious to me -- why would it matter for MMX/SSE?  
Neither supports 128-bit integers.

 From this point onwards no further changes were made to the version of 
RETURN_IN_MEMORY in i386elf.h.

 The version in i386.h was updated with r39693 (2001-02-14):

#define RETURN_IN_MEMORY(TYPE) \
  ((TYPE_MODE (TYPE) == BLKmode) \
   || (VECTOR_MODE_P (TYPE_MODE (TYPE)) && int_size_in_bytes (TYPE) == 8) \
   || (int_size_in_bytes (TYPE) > 12 && TYPE_MODE (TYPE) != TImode \
       && TYPE_MODE (TYPE) != TFmode && ! VECTOR_MODE_P (TYPE_MODE (TYPE))))

and then in r45726 (2001-09-21) factored out to ix86_return_in_memory:

#define RETURN_IN_MEMORY(TYPE)  \
  ix86_return_in_memory (TYPE)

-- which eventually evolved to its current form.

 My conclusion therefore is i386/ELF was not maintained, as far as the 

Re: i686 elf return values

2013-08-12 Thread Nathan Sidwell

On 08/12/13 08:07, Maciej W. Rozycki wrote:


  My conclusion therefore is i386/ELF was not maintained, as far as the
return convention is concerned, beyond r34721 and it looks to me like it
should have been converted with r45726 to make use of
ix86_return_in_memory just like generic i386, perhaps with a special
exception for complex types (although as I noted above, this exception was
probably a mistake from the beginning).

  Any thoughts?


Thanks for digging into this.  It does look like the ABI is accidental, and
i386elf.h should not define SUBTARGET_RETURN_IN_MEMORY.


nathan



Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread H. Peter Anvin
On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
 
 I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
 if-forest functions like perf_prepare_sample() and perf_output_sample().
 
 They are of the form:
 
 void func(obj, args..)
 {
 	unsigned long f = ...;
 
 	if (f & F1)
 		do_f1();
 
 	if (f & F2)
 		do_f2();
 
 	...
 
 	if (f & FN)
 		do_fn();
 }
 

Am I reading this right that f can be a combination of any of these?

 Where f is constant for the entire lifetime of the particular object.
 
 So I was thinking of having these functions use static_key/asm-goto;
 then write the proper static key values unsafe so as to avoid all
 trickery (as these functions would never actually be used) and copy the
 end result into object private memory. The object will then use indirect
 calls into these functions.

I'm really not following what you are proposing here, especially not
"copy the end result into object private memory".

With asm goto you end up with at minimum a jump or NOP for each of these
function entries, whereas an actual JIT can elide that as well.

On the majority of architectures, including x86, you cannot simply copy
a piece of code elsewhere and have it still work.  You end up doing a
bunch of the work that a JIT would do anyway, and would end up with
considerably higher complexity and worse results than a true JIT.  You
also say "the object will then use indirect calls into these
functions"... do you mean the JIT or pseudo-JIT generated functions, or the
calls inside them?

 I suppose the question is, do people strenuously object to creativity
 like that, and/or is there something GCC can do to make this
 easier/better still?

I think it would be much easier to just write a minimal JIT for this,
even though it is per architecture.  However, I would really like to
understand what the value is.

-hpa





Re: HAVE_ATTR_enabled mishandling?

2013-08-12 Thread Chung-Ju Wu
On 7/10/13 5:51 AM, David Given wrote:
 I think I have found a bug. This is in stock gcc 4.8.1...
 
 My backend does not use the 'enabled' attribute; therefore the following
 code in insn-attr.h kicks in:
 
   #ifndef HAVE_ATTR_enabled
   #define HAVE_ATTR_enabled 0
   #endif
 
 Therefore the following code in gcc/lra-constraints.c is enabled:
 
   #ifdef HAVE_ATTR_enabled
   if (curr_id->alternative_enabled_p != NULL
       && ! curr_id->alternative_enabled_p[nalt])
     continue;
   #endif
 
 ->alternative_enabled_p is bogus; therefore segfault.
 
 Elsewhere I see structures of the form:
 
   #if HAVE_ATTR_enabled
   ...
   #endif
 
 So I think that #ifdef above is a straight typo. Certainly, changing it
 to a #if makes the crash go away...
 

Hi, Vladimir,

Apparently the issue that David mentioned has already been fixed earlier:
  http://gcc.gnu.org/r198344

2013-04-26  Vladimir Makarov  vmaka...@redhat.com

...
* lra-constraints.c (curr_insn_set): New.
...
(process_alt_operands): Use it.  Use #if HAVE_ATTR_enabled instead
of #ifdef.  Add code to remove cycling.
...

However, this change was only applied on trunk, not on the 4.8 branch.
Since the 4.8 branch is still open and this issue seems to be a bug,
perhaps it is a good idea to backport this part.

What do you think? :)


Best regards,
jasonwucj
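The root cause David found is easy to reproduce in isolation. In this minimal sketch, only the macro name comes from the thread; the two functions are made up. With HAVE_ATTR_enabled defined to 0, as the generated fallback does, #ifdef still takes the guarded branch because it only tests whether the macro is defined, while #if evaluates its value and skips the branch.

```c
#include <assert.h>

/* Mimic the insn-attr.h fallback for a backend that does not use the
   "enabled" attribute. */
#ifndef HAVE_ATTR_enabled
#define HAVE_ATTR_enabled 0
#endif

/* #ifdef only asks whether the macro is defined, so this is true even
   though the macro's value is 0: the guarded code is compiled in. */
static int guarded_by_ifdef(void)
{
#ifdef HAVE_ATTR_enabled
    return 1;
#else
    return 0;
#endif
}

/* #if evaluates the macro's value, so the guarded code is compiled out.
   This is the form r198344 switched lra-constraints.c to. */
static int guarded_by_if(void)
{
#if HAVE_ATTR_enabled
    return 1;
#else
    return 0;
#endif
}
```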



Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread Andi Kleen
H. Peter Anvin h...@linux.intel.com writes:

 However, I would really like to
 understand what the value is.

Probably very little. When I last looked at it, the main overhead in
perf currently seems to be backtraces and the ring buffer, not this
code.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread Peter Zijlstra
On Mon, Aug 12, 2013 at 07:56:10AM -0700, H. Peter Anvin wrote:
 On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
  
  I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
  if-forest functions like perf_prepare_sample() and perf_output_sample().
  
  They are of the form:
  
  void func(obj, args..)
  {
  	unsigned long f = ...;
  
  	if (f & F1)
  		do_f1();
  
  	if (f & F2)
  		do_f2();
  
  	...
  
  	if (f & FN)
  		do_fn();
  }
  
 
 Am I reading this right that f can be a combination of any of these?

Correct.

  Where f is constant for the entire lifetime of the particular object.
  
  So I was thinking of having these functions use static_key/asm-goto;
  then write the proper static key values unsafe so as to avoid all
  trickery (as these functions would never actually be used) and copy the
  end result into object private memory. The object will then use indirect
  calls into these functions.
 
 I'm really not following what you are proposing here, especially not
 "copy the end result into object private memory".
 
 With asm goto you end up with at minimum a jump or NOP for each of these
 function entries, whereas an actual JIT can elide that as well.
 
 On the majority of architectures, including x86, you cannot simply copy
 a piece of code elsewhere and have it still work.

I thought we used -fPIC which would allow just that.

 You end up doing a
 bunch of the work that a JIT would do anyway, and would end up with
 considerably higher complexity and worse results than a true JIT.  

Well, less complexity but worse result, yes. We'd only poke the specific
static_branch sites with either NOPs or the (relative) jump target for
each of these branches. Then copy the result.

 You
 also say "the object will then use indirect calls into these
 functions"... you mean the JIT or pseudo-JIT generated functions, or the
 calls inside them?

The calls to these pseudo-JIT generated functions.

  I suppose the question is, do people strenuously object to creativity
  like that and or is there something GCC can do to make this
  easier/better still?
 
 I think it would be much easier to just write a minimal JIT for this,
 even though it is per architecture.  However, I would really like to
 understand what the value is.

Removing a lot of the conditionals from the sample path. Depending on
the configuration these can be quite expensive.
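For comparison, here is a plain C sketch of the if-forest above next to one portable, non-JIT alternative: test the flags once when the object is set up and record the handlers that apply, so the per-sample path makes only indirect calls. This is a swapped-in technique, not the static_key/asm-goto patching proposed in the thread, and it trades the branches for indirect calls, which have their own cost. `F1`..`F3`, `struct obj`, `obj_prepare()`, `func_table()` and the `trace` bookkeeping are hypothetical names for illustration.

```c
/* Hypothetical flag bits; f is fixed for the object's lifetime. */
enum { F1 = 1u << 0, F2 = 1u << 1, F3 = 1u << 2 };

/* Records which handlers ran, so the two paths can be compared. */
static int trace[8];
static int ntrace;

static void do_f1(void) { trace[ntrace++] = 1; }
static void do_f2(void) { trace[ntrace++] = 2; }
static void do_f3(void) { trace[ntrace++] = 3; }

/* The if-forest: every call re-tests every flag. */
static void func_ifs(unsigned long f)
{
    if (f & F1) do_f1();
    if (f & F2) do_f2();
    if (f & F3) do_f3();
}

/* Alternative: resolve the flags once, when the object is created. */
struct obj {
    void (*handlers[3])(void);
    int nhandlers;
};

static void obj_prepare(struct obj *o, unsigned long f)
{
    o->nhandlers = 0;
    if (f & F1) o->handlers[o->nhandlers++] = do_f1;
    if (f & F2) o->handlers[o->nhandlers++] = do_f2;
    if (f & F3) o->handlers[o->nhandlers++] = do_f3;
}

/* Per-sample path: no flag tests, only the applicable handlers. */
static void func_table(const struct obj *o)
{
    for (int i = 0; i < o->nhandlers; i++)
        o->handlers[i]();
}
```

Usage: call `obj_prepare(&o, F1 | F3)` once, then `func_table(&o)` on every sample; it runs `do_f1()` then `do_f3()`, the same sequence `func_ifs(F1 | F3)` produces.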


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-12 Thread H. Peter Anvin
On 08/12/2013 09:09 AM, Peter Zijlstra wrote:

 On the majority of architectures, including x86, you cannot simply copy
 a piece of code elsewhere and have it still work.
 
 I thought we used -fPIC which would allow just that.
 

Doubly wrong.  The kernel is not compiled with -fPIC, nor does -fPIC
allow this kind of movement for code that contains intramodule
references (that is *all* references in the kernel).  Since we really
don't want to burden the kernel with a GOT and a PLT, that is life.

 You end up doing a
 bunch of the work that a JIT would do anyway, and would end up with
 considerably higher complexity and worse results than a true JIT.  
 
 Well, less complexity but worse result, yes. We'd only poke the specific
 static_branch sites with either NOPs or the (relative) jump target for
 each of these branches. Then copy the result.

Once again, you can't copy the result.  You end up with a full
disassembler.

-hpa



Combine pass with reused sources

2013-08-12 Thread Lu, John
Hi,

I'm working on a compiler for an architecture with a multiply instruction that
takes two 32-bit factors, sign-extends both factors to 64 bits and then does a 
64-bit multiplication and stores the result to a destination register.  The 
combine pass successfully generates the pattern (mulhizi3) for this instruction 
twice for the following function.


long long res0;
long long res1;

long f1(long a, long b, long c, long d) {
  res0=((long long) a)*((long long) b);
  res1=((long long) c)*((long long) d);
}

The generated RTL from combine looks like:

(insn 10 9 11 2 g.c:5 (set (reg:ZI 176)
        (mult:ZI (sign_extend:ZI (reg:HI 9 r6 [ b ]))
            (sign_extend:ZI (reg:HI 6 r4 [ a ])))) 262 {*mulhizi3} (nil))

However, if I modify the function so that one of the factors is reused,

long f1(long a, long b, long c) {
  res0=((long long) a)*((long long) b);
  res1=((long long) c)*((long long) b);
}

combine will not fuse the reused sign-extension result to generate the
mulhizi3 pattern.  

I am wondering if anyone else has hit this issue or if I have done something 
wrong in my port.  Any help would be greatly appreciated.
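As a reference point, a C sketch of what the instruction computes, assuming (as the question states) 32-bit factors and a 64-bit product; `mul_sx32`, `f_distinct` and `f_reused` are hypothetical names, and fixed-width types stand in for the port's `long`/`long long`. It does not explain why combine gives up on the reused case; it only shows that both source functions reduce to two instances of the same widening multiply, so one would expect two `mulhizi3` matches in either case.

```c
#include <stdint.h>

/* Model of the instruction: sign-extend two 32-bit factors to 64 bits,
   then do a full 64-bit multiply (the product cannot overflow). */
static int64_t mul_sx32(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

int64_t res0, res1;

/* Four distinct factors: combine matches the pattern twice. */
void f_distinct(int32_t a, int32_t b, int32_t c, int32_t d)
{
    res0 = (int64_t)a * (int64_t)b;
    res1 = (int64_t)c * (int64_t)d;
}

/* Reused factor: the sign-extension of b feeds both products. */
void f_reused(int32_t a, int32_t b, int32_t c)
{
    res0 = (int64_t)a * (int64_t)b;
    res1 = (int64_t)c * (int64_t)b;
}
```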

Thanks,
John Lu




[Bug fortran/56666] Suppression flag for DO loop at (1) will be executed zero times

2013-08-12 Thread tkoenig at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #3 from Thomas Koenig tkoenig at gcc dot gnu.org ---
Files modified in the GCC repository. Log entry:

2013-08-12  Thomas Koenig  tkoe...@gcc.gnu.org

* gcc-4.9/changes.html:  Document -Wzerotrip.


[Bug middle-end/58134] New: -ftree-vectorizer-verbose=n shows vectorized loops only for N==1 and N>2 but not for N==2

2013-08-12 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58134

Bug ID: 58134
   Summary: -ftree-vectorizer-verbose=n shows vectorized loops
only for N==1 and N>2 but not for N==2
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org

... -ftree-vectorizer-verbose=1 test.cc
test.cc:8: note: Vectorized loop

But no result for
-ftree-vectorizer-verbose=2 test.cc 2>&1 | grep 'Vectorized loop'

Again with n = 3:
-ftree-vectorizer-verbose=3 test.cc 2>&1 | grep 'Vectorized loop'
test.cc:8: note: Vectorized loop


#include <algorithm>
typedef int myint;

void max(__restrict myint *data, myint val, int n) {
  //__assume_aligned(data,64);
  data = (myint*) __builtin_assume_aligned(data, 64);
  for (int i = 0; i < n; i++)
data[i] = std::max(data[i], val);
}


[Bug middle-end/58096] [4.9 Regression] gcc.dg/tree-ssa/attr-alias.c fails with r201439

2013-08-12 Thread yufeng at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58096

Yufeng Zhang yufeng at gcc dot gnu.org changed:

   What|Removed |Added

 CC||yufeng at gcc dot gnu.org

--- Comment #3 from Yufeng Zhang yufeng at gcc dot gnu.org ---
The test case also failed on ARM and AArch64


[Bug middle-end/58125] [4.9 Regression] ICE: in operator[], at vec.h:827 with -fno-inline-small-functions

2013-08-12 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58125

Marek Polacek mpolacek at gcc dot gnu.org changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #2 from Marek Polacek mpolacek at gcc dot gnu.org ---
Started with r201439.


[Bug middle-end/58125] [4.9 Regression] ICE: in operator[], at vec.h:827 with -fno-inline-small-functions

2013-08-12 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58125

--- Comment #3 from Marek Polacek mpolacek at gcc dot gnu.org ---
Seems like we're trying to access (*inline_summary_vec)[node->uid], where
node->uid is 8 but inline_summary_vec's length is 8.


[Bug fortran/52153] REAL128 gives extended precision, not quad precision

2013-08-12 Thread latlon90180+gcc_bugzilla at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52153

A. Kasahara latlon90180+gcc_bugzilla at gmail dot com changed:

   What|Removed |Added

 CC||latlon90180+gcc_bugzilla@gm
   ||ail.com

--- Comment #8 from A. Kasahara latlon90180+gcc_bugzilla at gmail dot com ---
Is there any progress on this?
REAL128 of gfortran4.8 is still 10.


[Bug c++/58129] [C++11] Lack of access control checking using auto type deduction

2013-08-12 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58129

Jonathan Wakely redi at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Jonathan Wakely redi at gcc dot gnu.org ---
Access control applies to names, and you don't use the private name Private
so there's no error.


[Bug gcov-profile/58127] [4.9 Regression] 37 failures in gcc.dg/tree-prof/ for x86_64-apple-darwin10

2013-08-12 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58127

--- Comment #1 from Dominique d'Humieres dominiq at lps dot ens.fr ---
Revision 201632 is OK, r201634 is not.


[Bug tree-optimization/57980] [4.7/4.8/4.9 Regression] gcc 4.8.1 -foptimize-sibling-calls -O1 ICE in build_int_cst_wide, at tree.c:1210

2013-08-12 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57980

--- Comment #3 from Marek Polacek mpolacek at gcc dot gnu.org ---
Author: mpolacek
Date: Mon Aug 12 08:46:41 2013
New Revision: 201660

URL: http://gcc.gnu.org/viewcvs?rev=201660&root=gcc&view=rev
Log:
PR tree-optimization/57980

Added:
trunk/gcc/testsuite/gcc.dg/pr57980.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-tailcall.c


[Bug tree-optimization/57980] [4.7/4.8/4.9 Regression] gcc 4.8.1 -foptimize-sibling-calls -O1 ICE in build_int_cst_wide, at tree.c:1210

2013-08-12 Thread mpolacek at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57980

--- Comment #4 from Marek Polacek mpolacek at gcc dot gnu.org ---
Fixed on trunk.


[Bug tree-optimization/58006] [4.8/4.9 Regression] ICE compiling VegaStrike with -ffast-math -ftree-parallelize-loops=2

2013-08-12 Thread vincent.legoll at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58006

vincent.legoll at gmail dot com changed:

   What|Removed |Added

 CC||vincent.legoll at gmail dot com

--- Comment #7 from vincent.legoll at gmail dot com ---
Hello, I got the same under Debian Jessie

$ gcc-4.8 -v
Using built-in specs.
COLLECT_GCC=gcc-4.8
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.8.1-2'
--with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.8 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin
--with-system-zlib --disable-browser-plugin --enable-java-awt=gtk
--enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre
--enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.1 (Debian 4.8.1-2)


[Bug tree-optimization/58039] -ftree-vectorizer makes a loop crash on a non-aligned memory

2013-08-12 Thread bar at mariadb dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58039

--- Comment #4 from Alexander Barkov bar at mariadb dot org ---
Mikael, thanks for your comment on this.

(In reply to Mikael Pettersson from comment #3)
 Your code performs mis-aligned uint16_t stores, which x86 allows.

Right, this is done for performance purposes.


 The
 vectorizer turns those into larger and still mis-aligned `movdqa' stores,
 which x86 does not allow, hence the SEGV.

Can you please clarify: is it a bug in the recent gcc versions?

Note, we've used such performance improvement tricks for years.
It worked perfectly fine until now.
Has anything changed in how the gcc vectorizer works recently?


 
 Replace the non-portable mis-aligned stores with portable code like
 
 #define int2store_little_endian(s,A) memcpy((s), &(A), 2)
 
 or gcc-specific code like
 
 struct __attribute__((__packed__)) packed_uint16 {
 uint16_t u16;
 };
 #define int2store_little_endian(s,A) ((struct packed_uint16*)(s))->u16 = (A)
 
 and then the vectorizer generates large `movdqu' stores, which is pretty
 much the best you can hope for unless you rewrite the code to avoid
 mis-aligned stores.


Unfortunately it's not possible to avoid mis-aligned stores due to the
project architecture.


I've read somewhere that gcc vectorizer generates two code branches,
for aligned memory and for non-aligned memory (but can't find
the reference now). Can you please confirm this?

Thanks.


[Bug tree-optimization/58135] New: [x86] Missed opportunities for partial SLP

2013-08-12 Thread ysrumyan at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58135

Bug ID: 58135
   Summary: [x86] Missed opportunities for partial SLP
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ysrumyan at gmail dot com

If we consider the following simple test-case 

int a[100];
void foo()
{
  a[0] = a[1] = a[2] = a[3] = 0;
}
SLP vectorization of basic block takes place:

gcc -S -O3 -m32  t.c -ftree-vectorizer-verbose=1  

t.c:4: note: Vectorized basic-block

but if we add at least one more assignment it won't be vectorized:

a[0] = a[1] = a[2] = a[3] = a[4] = 0;

t11.c:4: note: Build SLP failed: unrolling required in basic block SLP

Clearly gcc could do partial BB vectorization here, i.e. vectorize only the
first 4 assignments.


[Bug tree-optimization/58039] -ftree-vectorizer makes a loop crash on a non-aligned memory

2013-08-12 Thread mikpe at it dot uu.se
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58039

--- Comment #5 from Mikael Pettersson mikpe at it dot uu.se ---
(In reply to Alexander Barkov from comment #4)
  The
  vectorizer turns those into larger and still mis-aligned `movdqa' stores,
  which x86 does not allow, hence the SEGV.
 
 Can you please clarify: is it a bug in the recent gcc versions?
 
 Note, we've used such performance improvement tricks for years.
 It worked perfectly fine until now.
 Has anything changed in how the gcc vectorizer works recently?

I know next to nothing about the vectorizer, so I cannot comment on this.

 Unfortunately it's not possible to avoid mis-aligned stores due to the
 project architecture.

Mis-aligned accesses are Ok, as long as they are expressed using the proper
mechanisms (memcpy, attribute packed, or pragma packed).
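A minimal sketch of the two mechanisms Mikael mentions, written as functions instead of the thread's `int2store_little_endian` macro (the names `store_u16_memcpy` and `store_u16_packed` are hypothetical). Both tell the compiler that the destination may be misaligned, so per comment #3 GCC is expected to use unaligned accesses (`movdqu` rather than `movdqa`) if it vectorizes them. Note that `memcpy` stores in host byte order, which matches the little-endian macro name only on little-endian hosts such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Portable: copy the value's bytes into possibly-unaligned storage.
   The compiler may still emit a single unaligned store where allowed. */
static inline void store_u16_memcpy(unsigned char *s, uint16_t v)
{
    memcpy(s, &v, sizeof v);   /* host byte order */
}

/* GCC-specific: a packed struct tells the compiler the member may be
   misaligned, so it must not assume 2-byte (or vector) alignment. */
struct __attribute__((__packed__)) packed_uint16 {
    uint16_t u16;
};

static inline void store_u16_packed(unsigned char *s, uint16_t v)
{
    ((struct packed_uint16 *)s)->u16 = v;
}
```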

 I've read somewhere that gcc vectorizer generates two code branches,
 for aligned memory and for non-aligned memory (but can't find
 the reference now). Can you please confirm this?

I don't know, see above.


[Bug regression/58084] FAIL: gcc.dg/torture/pr8081.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)

2013-08-12 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58084

Jan Hubicka hubicka at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||hubicka at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot 
gnu.org

--- Comment #4 from Jan Hubicka hubicka at gcc dot gnu.org ---
Mine...


[Bug lto/58108] [4.9 regression] 32-bit g++.dg/torture/covariant-1.C -O2 -flto FAILs

2013-08-12 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58108

Jan Hubicka hubicka at gcc dot gnu.org changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #1 from Jan Hubicka hubicka at gcc dot gnu.org ---
Does this bug still reproduce? (I fixed a problem related to x86 local calls that
may fix this too.)


[Bug libgomp/38724] Segfault caused by derived-type with allocatable component in private clause

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38724

--- Comment #7 from janus at gcc dot gnu.org ---
see also
https://groups.google.com/forum/?fromgroups#!topic/comp.lang.fortran/vPs4MJamnCM


[Bug regression/58084] FAIL: gcc.dg/torture/pr8081.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)

2013-08-12 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58084

--- Comment #5 from Jan Hubicka hubicka at gcc dot gnu.org ---
OK, the problem is that the return type of the nested function is a variable-sized
type of the outer function.  These types go to function sections and are not
merged.
We used to avoid the ICE only by luck: the RESULT_DECL went to the global section,
which created yet another unmerged version of the type that got into RESULT_DECL.

This is not the only problem of this kind, and I am not quite sure what to do here:
either we need to invent a way to refer to items in another function's section,
or we need to put all abstract origins into the global stream completely.
The second would be very expensive...

In this particular case we can probably just teach tree-inline to insert a
VIEW_CONVERT_EXPR when needed?


[Bug fortran/56655] [F03] ASSOCIATE construct with OpenMP triggers ICE

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56655

--- Comment #4 from janus at gcc dot gnu.org ---
The final specification of OpenMP 4.0 has been published by now and apparently
supports the ASSOCIATE construct.


[Bug c/58136] New: Initialized static global variables cause segfault on AIX with debugging symbols

2013-08-12 Thread gcc at rkeene dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58136

Bug ID: 58136
   Summary: Initialized static global variables cause segfault on
AIX with debugging symbols
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc at rkeene dot org

Tested with gcc 4.6.3 and gcc 4.8.1 with binutils 2.22

Program Listing #1 (test-1.c):
static unsigned int test = 3;
int main(int argc, char **argv) {
test = 4;
return(0);
}

Program Listing #2 (test-2.c):
static unsigned int test;
int main(int argc, char **argv) {
test = 4;
return(0);
}

Compiling program listing #1 (above) with the -gxcoff argument causes a
segfault.  Leaving off the -gxcoff argument, or not initializing the static
global (program listing #2, above) causes the program to not segfault.

$ powerpc-ibm-aix5.3.0.0-gcc -gxcoff -o test-1_g test-1.c
$ ./test-1_g
Segmentation fault (core dumped)

$ powerpc-ibm-aix5.3.0.0-gcc -o test-1 test-1.c
$ ./test-1

$ powerpc-ibm-aix5.3.0.0-gcc -gxcoff -o test-2_g test-2.c
$ ./test-2_g

$ powerpc-ibm-aix5.3.0.0-gcc -o test-2 test-2.c
$ ./test-2


[Bug fortran/38724] Segfault caused by derived-type with allocatable component in private clause

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38724

janus at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||accepts-invalid
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-12
  Component|libgomp |fortran
 Ever confirmed|0   |1

--- Comment #8 from janus at gcc dot gnu.org ---
(In reply to Steve Kargl from comment #6)
 I agree gfortran should reject the program until we have some idea of
 the behavior with regards to OpenMP 4.0.

It seems that the final OpenMP 4.0 specification does not support allocatable
components. In particular it lists Allocatable enhancement as unsupported,
which supposedly refers to TR 15581 and therefore includes alloc. comp., see

http://openmp.org/wp/openmp-specifications/

So the test case should probably be rejected by the front end (alternatively:
support it as a GNU extension).


[Bug tree-optimization/58137] New: [trunk, ICE] full unroll + AVX2 vectorization

2013-08-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

Bug ID: 58137
   Summary: [trunk, ICE] full unroll + AVX2 vectorization
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kirill.yukhin at intel dot com

Created attachment 30635
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30635&action=edit
Reproducer

Hello, the attached test produces an ICE when compiled as
$ gcc -S -O3 1.c -mavx2

It seems that full unroll or copyprop (or whatever) introduces something wrong.

1.c: In function 'more_xrv':
1.c:23:1: error: type mismatch in pointer plus expression
 more_xrv(void)
 ^
struct XRV *

struct XRV *

struct XRV *

vect_vec_iv_.15_88 = vect_cst_.13_60 + { 64B, 64B, 64B, 64B };
1.c:23:1: error: type mismatch in pointer plus expression
struct XRV *

struct XRV *

struct XRV *

...


[Bug tree-optimization/58137] [trunk, ICE] full unroll + AVX2 vectorization

2013-08-12 Thread kirill.yukhin at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

--- Comment #1 from Yukhin Kirill kirill.yukhin at intel dot com ---
Actually, this case came up while debugging Spec2000's perl workload on AVX-512
changes (with a bigger trip count).


[Bug fortran/46271] [F03] OpenMP default(none) and procedure pointers

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46271

janus at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||janus at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |janus at gcc dot gnu.org
Summary|OpenMP default(none) and|[F03] OpenMP default(none)
   |procedure pointers  |and procedure pointers

--- Comment #2 from janus at gcc dot gnu.org ---
Here is a simple patch to accept version A:


Index: gcc/fortran/openmp.c
===
--- gcc/fortran/openmp.c(revision 201653)
+++ gcc/fortran/openmp.c(working copy)
@@ -847,7 +847,7 @@ resolve_omp_clauses (gfc_code *code)
 for (n = omp_clauses->lists[list]; n; n = n->next)
   {
 n->sym->mark = 0;
-if (n->sym->attr.flavor == FL_VARIABLE)
+if (n->sym->attr.flavor == FL_VARIABLE || n->sym->attr.proc_pointer)
   continue;
 if (n->sym->attr.flavor == FL_PROCEDURE
  && n->sym->result == n->sym
@@ -876,8 +876,6 @@ resolve_omp_clauses (gfc_code *code)
 if (el)
   continue;
   }
-if (n->sym->attr.proc_pointer)
-  continue;
   }
 gfc_error ("Object '%s' is not a variable at %L", n->sym->name,
 &code->loc);


[Bug fortran/46271] [F03] OpenMP default(none) and procedure pointers

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46271

--- Comment #3 from janus at gcc dot gnu.org ---
(In reply to mrestelli from comment #0)
 With version B:
 gfortran -fopenmp omp_test.f90 -o omp_test
 omp_test.f90: In function ‘test’:
 omp_test.f90:25:0: error: ‘pf’ not specified in enclosing parallel
 omp_test.f90:23:0: error: enclosing parallel

What is actually the problem here? That error message looks correct to me,
doesn't it?

[Bug target/57717] error: unrecognizable insn compiling ./strtod_l.c from glibc on powerpc-gnuspe

2013-08-12 Thread jules at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57717

jules at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jules at gcc dot gnu.org

--- Comment #7 from jules at gcc dot gnu.org ---
Here's another candidate patch:

http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00668.html


[Bug fortran/52153] REAL128 gives extended precision, not quad precision

2013-08-12 Thread sgk at troutmask dot apl.washington.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52153

--- Comment #9 from Steve Kargl sgk at troutmask dot apl.washington.edu ---
On Mon, Aug 12, 2013 at 08:08:18AM +, latlon90180+gcc_bugzilla at gmail dot
com wrote:
 Is there any progress on this?
 REAL128 of gfortran4.8 is still 10.
 

Need a short example.  gfortran has supported a 128-bit real type
for quite some time (since 4.6).

real(4) a
real(8) b
real(10) c
real(16) d
print '(4(I0,1X))', digits(a), digits(b), digits(c), digits(d)
end

% gfortran46 -o z a.f90 && ./z
24 53 53 113

PS: yes, the output is correct for real(10).  FreeBSD-i386's long double
only has 53-bits of precision.
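For readers checking the same thing from C, a minimal sketch that reads the significand widths from `<float.h>`; `struct mant_digits` and `query_mant_digits()` are hypothetical names. On x86 Linux, `long double` is usually the 80-bit x87 extended format (64 significand bits), which is what gfortran's `real(10)` corresponds to, while `real(16)` is the 113-bit IEEE quad format; this is the extended-versus-quad distinction in the bug title. Other targets and operating systems may report different values.

```c
#include <float.h>

/* C analogue of the Fortran DIGITS() query: significand bits of each
   C floating type, as reported at compile time by <float.h>. */
struct mant_digits {
    int flt;   /* float: 24 on IEEE hosts */
    int dbl;   /* double: 53 on IEEE hosts */
    int ldbl;  /* long double: target-dependent (64 for x87 extended) */
};

static struct mant_digits query_mant_digits(void)
{
    struct mant_digits d = { FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG };
    return d;
}
```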


[Bug fortran/52153] REAL128 gives extended precision, not quad precision

2013-08-12 Thread kargl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52153

--- Comment #10 from kargl at gcc dot gnu.org ---
(In reply to Steve Kargl from comment #9)
 On Mon, Aug 12, 2013 at 08:08:18AM +, latlon90180+gcc_bugzilla at gmail
 dot com wrote:
  Is there any progress on this?
  REAL128 of gfortran4.8 is still 10.
  
 
 Need a short example.  gfortran has supported a 128-bit real type
 for quite some time (since 4.6).
 
 real(4) a
 real(8) b
 real(10) c
 real(16) d
 print '(4(I0,1X))', digits(a), digits(b), digits(c), digits(d)
 end
 
 % gfortran46 -o z a.f90 && ./z
 24 53 53 113
 
 PS: yes, the output is correct for real(10).  FreeBSD-i386's long double
 only has 53-bits of precision.

Ignore.  I should have read the audit trail first.


[Bug fortran/46271] [F03] OpenMP default(none) and procedure pointers

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46271

--- Comment #4 from janus at gcc dot gnu.org ---
(In reply to janus from comment #2)
 Here is a simple patch to accept version A:

... which regtests cleanly!


[Bug lto/58108] [4.9 regression] 32-bit g++.dg/torture/covariant-1.C -O2 -flto FAILs

2013-08-12 Thread ro at CeBiTec dot Uni-Bielefeld.DE
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58108

--- Comment #2 from ro at CeBiTec dot Uni-Bielefeld.DE ro at CeBiTec dot 
Uni-Bielefeld.DE ---
 --- Comment #1 from Jan Hubicka hubicka at gcc dot gnu.org ---
 Does this bug still reproduce (I fixed problem related to x86 local calls that
 may fix this too)

The failure still exists in a i386-pc-solaris2.10 bootstrap as of r201663.

Rainer


[Bug rtl-optimization/57451] Incorrect debug ranges emitted for -freorder-blocks-and-partition -g

2013-08-12 Thread ccoutant at google dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57451

--- Comment #9 from ccoutant at google dot com ---
 +  if (!active_insn_p (insn))
 +continue;

 I'm not clear on why this is needed. Is it because after the
 change_scope, insn will now be a NOTE? If that's it, just put the
 continue in the previous if clause.

 Because the notes were being skipped by the iteration over
 instructions, which previously only walked active instructions (notes
 are not active instructions). So to see the switch section note I had
 to walk all instructions, and just skip non-active instructions after
 I am done checking for the note of interest.

Oh, right. I didn't notice the change in the for loop.

-cary


[Bug c++/58138] New: #include <random> gives warning: macro __code_model_small__ is not used

2013-08-12 Thread sbergman at redhat dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58138

Bug ID: 58138
   Summary: #include <random> gives warning: macro
__code_model_small__ is not used
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sbergman at redhat dot com

At least with a trunk revision 201654 (aka LATEST-4.9) build (on Fedora 18
x86_64):

 $ cat test1.cc
 #include <random>
 namespace {}

 $ ~/gcc/LATEST-4.9/inst/bin/g++ -std=gnu++11 -Wunused-macros -c test1.cc
 test1.cc:2:12: warning: macro __code_model_small__ is not used 
 [-Wunused-macros]
  namespace {}
 ^

I was able to strip that down to the following excerpt of
~/gcc/LATEST-4.9/inst/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/include/ia32intrin.h:

 $ cat test2.cc
 #include "test2.h"
 namespace {}

 $ cat test2.h
 #pragma GCC system_header
 #pragma GCC push_options
 #pragma GCC target("sse4.2")
 #pragma GCC pop_options

 $ ~/gcc/LATEST-4.9/inst/bin/g++ -Wunused-macros -c test2.cc
 test2.cc:2:12: warning: macro __code_model_small__ is not used [-Wunused-macros]
  namespace {}
 ^

With a build of tags/gcc_4_8_1_release, compiling test1.cc does not give a
warning while test2.cc does.  And with a random old build of
branches/gcc-4_6-branch, compiling neither test1.cc nor test2.cc gives a
warning (replacing -std=gnu++11 with -std=gnu++0x when compiling test1.cc).


[Bug fortran/46271] [F03] OpenMP default(none) and procedure pointers

2013-08-12 Thread mrestelli at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46271

--- Comment #5 from mrestelli mrestelli at gmail dot com ---
(In reply to janus from comment #3)
 (In reply to mrestelli from comment #0)
  With version B:
  gfortran -fopenmp omp_test.f90 -o omp_test
  omp_test.f90: In function ‘test’:
  omp_test.f90:25:0: error: ‘pf’ not specified in enclosing parallel
  omp_test.f90:23:0: error: enclosing parallel
 
 What is actually the problem here? That error message looks correct to me,
 doesn't it?

Janus, you are probably right that version B should not compile. I
guess when I posted the bug report I was not sure which was the
correct version according to the OpenMP specifications, since fp is a
variable (requiring an OpenMP attribute), but it behaves like a
subroutine (so, no OpenMP attribute). Clearly, however, at least one of
the two versions should work, hence my pointing out that neither
alternative works. Well, at least this is my recollection,
since it was quite a while ago.

As a note, I mention that ifort (version 13.1) accepts both versions,
but maybe this is an issue with ifort itself.

Regards,
   Marco Restelli

[Bug fortran/46271] [F03] OpenMP default(none) and procedure pointers

2013-08-12 Thread janus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46271

--- Comment #6 from janus at gcc dot gnu.org ---
Hi Marco,

 Janus, you are probably right that version B should not compile. I
 guess when I posted the bug report I was not sure which was the
 correct version according to the OpenMP specifications, since fp is a
 variable (requiring an OpenMP attribute), but it behaves like a
 subroutine (so, no OpenMP attribute).

well, since a procedure pointer can be assigned and change its value, I would
say it counts as a variable and one should make up one's mind whether it is
supposed to be shared or private in an OpenMP loop (as for any other variable,
this can clearly make a difference). Hence my interpretation that the error
message is correct.

However, I should note that I'm not much of an OpenMP expert and haven't
checked whether the OpenMP specification makes any definitive statement about
this. It's merely my 'gut feeling'.


 As a note, I mention that ifort (version 13.1) accepts both versions,
 but maybe this is an issue with ifort itself.

ifort is not exactly known for its strictness on invalid programs, and of
course it may have bugs. I don't know if this is allowed on purpose or if the
missing error is an oversight.

If ifort accepts the program, it would be interesting whether it treats the
procptr as private or shared with default(none), and whether this behavior is
documented somewhere (either in the OpenMP spec or the ifort docs).

Some people claim that documentation is the only thing that distinguishes a
feature from a bug ;)

Cheers,
Janus


[Bug target/58139] New: PowerPC volatile VSX register live across call

2013-08-12 Thread dje at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58139

Bug ID: 58139
   Summary: PowerPC volatile VSX register live across call
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dje at gcc dot gnu.org

#include <altivec.h>
#include <math.h>

void tightness3_intrinsics2(double* A, double* B, int N) {
 __vector double * vA = (__vector double*)A;
 __vector double * vB = (__vector double*)B;
 __vector double va0, va1;
 double b0, b1, b2, b3;
 va0 = vA[0];
 va1 = vA[1];
 b0 = log(vec_extract(va0, 0));
 b1 = log(vec_extract(va0, 1));
 b2 = log(vec_extract(va1, 0));
 b3 = log(vec_extract(va1, 1));
 __vector double vb0 = {b0, b1};
 __vector double vb1 = {b2, b3};
 vB[0] = vb0;
 vB[1] = vb1;
}

 xxpermdi 1,63,63,2
 xxpermdi 30,30,29,0
 bl log
 nop
 addi 1,1,192
 li 0,-80
 stxvd2x 30,0,30

GCC should not expect VSX 30 to be preserved across the call to log().


[Bug target/58139] PowerPC volatile VSX register live across call

2013-08-12 Thread dje at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58139

David Edelsohn dje at gcc dot gnu.org changed:

   What|Removed |Added

 Target||powerpc*-*-*
 Status|UNCONFIRMED |NEW
   Keywords||wrong-code
   Last reconfirmed||2013-08-12
 CC||bergner at gcc dot gnu.org
   Host||powerpc*-*-*
 Ever confirmed|0   |1
  Known to fail||4.6.3, 4.7.3, 4.8.1
  Build||powerpc*-*-*

--- Comment #1 from David Edelsohn dje at gcc dot gnu.org ---
Confirmed.


[Bug c++/58140] New: -Wnon-virtual-dtor shouldn't fire for classes declared final

2013-08-12 Thread tudorb at fb dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58140

Bug ID: 58140
   Summary: -Wnon-virtual-dtor shouldn't fire for classes declared
final
   Product: gcc
   Version: 4.7.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tudorb at fb dot com

Created attachment 30636
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30636&action=edit
Test case

In C++11, we can declare a class as final to indicate that it can't be
derived from. In that case, having a public non-virtual destructor is fine,
even if the class has virtual methods (no derived classes exist, so deleting an
instance via a pointer is always safe).

In the attached example, the warning should fire for NonFinalDerived, but not
for FinalDerived.


[Bug c++/58140] -Wnon-virtual-dtor shouldn't fire for classes declared final

2013-08-12 Thread tudorb at fb dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58140

--- Comment #1 from Tudor Bosman tudorb at fb dot com ---
(Tested with gcc 4.7.1, compiled with -std=c++11 -Wnon-virtual-dtor.)


[Bug middle-end/58134] -ftree-vectorizer-verbose=n shows vectorized loops only for N==1 and N>2 but not for N==2

2013-08-12 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58134

--- Comment #1 from Tobias Burnus burnus at gcc dot gnu.org ---
The reason is the following:
  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
                   "Vectorized loop\n");

And in opts-global.c's dump_remap_tree_vectorizer_verbose:
  switch (value)
    {
    case 0:
      break;
    case 1:
      remapped_opt_info = "optimized";
      break;
    case 2:
      remapped_opt_info = "missed";
      break;
    default:
      remapped_opt_info = "all";
      break;
    }

And dumpfile.h:
#define MSG_OPTIMIZED_LOCATIONS  (1 << 26)  /* -fopt-info optimized sources */
#define MSG_MISSED_OPTIMIZATION  (1 << 27)  /* missed opportunities */
#define MSG_NOTE                 (1 << 28)  /* general optimization info */
#define MSG_ALL         (MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION \
                         | MSG_NOTE)
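The interaction can be modelled in a few lines of C++. This is a simplified sketch, not GCC code: it assumes that the "optimized", "missed" and "all" strings select MSG_OPTIMIZED_LOCATIONS, MSG_MISSED_OPTIMIZATION and MSG_ALL respectively, and the helper names mask_for_verbosity and prints_vectorized_loop are hypothetical. Because the "Vectorized loop" message is tagged MSG_OPTIMIZED_LOCATIONS, it passes the mask for N == 1 and N > 2 but not for N == 2:

```cpp
#include <cassert>

// Bit values copied from the dumpfile.h excerpt above.
constexpr unsigned MSG_OPTIMIZED_LOCATIONS = 1u << 26;
constexpr unsigned MSG_MISSED_OPTIMIZATION = 1u << 27;
constexpr unsigned MSG_NOTE = 1u << 28;
constexpr unsigned MSG_ALL =
    MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE;

// Hypothetical model of the remapping: verbosity N -> message mask.
unsigned mask_for_verbosity(int n) {
  switch (n) {
    case 0:
      return 0;                        // no diagnostics
    case 1:
      return MSG_OPTIMIZED_LOCATIONS;  // "optimized"
    case 2:
      return MSG_MISSED_OPTIMIZATION;  // "missed"
    default:
      return MSG_ALL;                  // "all"
  }
}

// A message is printed only if its tag bit is in the active mask.
bool prints_vectorized_loop(int n) {
  return (mask_for_verbosity(n) & MSG_OPTIMIZED_LOCATIONS) != 0;
}
```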


[Bug middle-end/58134] [4.8/4.9 Regression] -ftree-vectorizer-verbose=n shows vectorized loops only for N==1 and N>2 but not for N==2

2013-08-12 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58134

Tobias Burnus burnus at gcc dot gnu.org changed:

   What|Removed |Added

 CC||singhai at gcc dot gnu.org
Summary|-ftree-vectorizer-verbose= |[4.8/4.9 Regression]
   |n shows vectorized loops   |-ftree-vectorizer-verbose=
   |only for N==1 and N>2 but  |n shows vectorized loops
   |not for N==2               |only for N==1 and N>2 but
   |                           |not for N==2

--- Comment #2 from Tobias Burnus burnus at gcc dot gnu.org ---
With g++-4.7 -O3 -ftree-vectorizer-verbose=2 it works; one gets:
 7: LOOP VECTORIZED.

Seemingly caused by r193061


[Bug middle-end/58134] [4.8/4.9 Regression] -ftree-vectorizer-verbose=n shows vectorized loops only for N==1 and N>2 but not for N==2

2013-08-12 Thread singhai at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58134

--- Comment #3 from Sharad Singhai singhai at gcc dot gnu.org ---
I think this is the intended behavior. While working on the new dump
infrastructure, I modified the behavior of -ftree-vectorizer-verbose.

Thus right now
-ftree-vectorizer-verbose=1 : dump info about optimized loops
...=2 : dump info about missed loops
...>2 : dump info about optimized _and_ missed loops

Thus at 3 and greater, you are again seeing info available at 1. But really,
only 1 and 2 are meaningful. Anything higher is a combination of these two
kinds of information. This was a way to preserve compatibility with old
scripts, while deprecating this flag. I didn't see any tests relying on the old
behavior.

Here is the current documentation about this flag in gcc.info:

`-ftree-vectorizer-verbose=N'
 This option is deprecated and is implemented in terms of
 `-fopt-info'. Please use `-fopt-info-KIND' form instead, where
 KIND is one of the valid opt-info options. It prints additional
 optimization information.  For N=0 no diagnostic information is
 reported.  If N=1 the vectorizer reports each loop that got
 vectorized, and the total number of loops that got vectorized.  If
 N=2 the vectorizer reports locations which could not be vectorized
 and the reasons for those. For any higher verbosity levels all the
 analysis and transformation information from the vectorizer is
 reported.


[Bug c++/58140] -Wnon-virtual-dtor shouldn't fire for classes declared final

2013-08-12 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58140

Jonathan Wakely redi at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-08-12
 Ever confirmed|0   |1

--- Comment #2 from Jonathan Wakely redi at gcc dot gnu.org ---
This should be pretty simple to fix, but why use -Wnon-virtual-dtor anyway,
when -Wdelete-non-virtual-dtor is more accurate and more useful?


[Bug c++/58140] -Wnon-virtual-dtor shouldn't fire for classes declared final

2013-08-12 Thread redi at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58140

--- Comment #3 from Jonathan Wakely redi at gcc dot gnu.org ---
(In reply to Tudor Bosman from comment #0)
 In C++11, we can declare a class as final to indicate that it can't be
 derived from. In that case, having a public non-virtual destructor is fine,
 even if the class has virtual methods (no derived classes exist, so deleting
 an instance via a pointer is always safe).

N.B. this is only true if there's no base class with a public destructor, which
is true for your example, but not in general.
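To illustrate the caveat, here is a hypothetical C++11 sketch (the class and function names are invented for the example): Leaf is final, but its base has a public non-virtual destructor, so deleting a Leaf through a PublicBase* is still undefined behavior. That deletion site is the kind of code -Wdelete-non-virtual-dtor is meant to flag, whereas deleting through Leaf* is fine:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical polymorphic base with a PUBLIC non-virtual destructor.
struct PublicBase {
  virtual int id() const { return 0; }
  ~PublicBase() {}
};

// Final, so nothing derives from Leaf, but that does not make deletion
// through PublicBase* safe.
struct Leaf final : PublicBase {
  int id() const override { return 1; }
};

// Fine: the static type of *p is the most-derived (final) type.
void destroy_leaf(Leaf* p) { delete p; }

// Undefined behavior if p points to a Leaf, even though Leaf is final,
// because ~PublicBase is not virtual. Defined here but never called.
void destroy_base(PublicBase* p) { delete p; }
```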


[Bug c/58141] New: [bfin]: ICE: Segmentation fault

2013-08-12 Thread canyon at recursivebliss dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58141

Bug ID: 58141
   Summary: [bfin]: ICE: Segmentation fault
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: canyon at recursivebliss dot com

Created attachment 30637
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30637&action=edit
Preprocessed source

Description of problem:
When compiling Das U-Boot for a Blackfin target, there is an internal compiler
error: Segmentation fault.

Version-Release number of selected component (if applicable):
bfin-uclinux-gcc 4.8.1

How reproducible:
Every time.

Steps to Reproduce:
1. git clone git://git.denx.de/u-boot.git
2. cd u-boot
3. make bf518f-ezbrd

Actual results:
main.c: In function 'builtin_run_command':
main.c:1434:1: internal compiler error: Segmentation fault
 }

Expected results:
Successful build.

Additional info:
If you comment out the call of process_macros in builtin_run_command the build
is successful.


[Bug tree-optimization/58137] [trunk, ICE] full unroll + AVX2 vectorization

2013-08-12 Thread bernd.edlinger at hotmail dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

Bernd Edlinger bernd.edlinger at hotmail dot de changed:

   What|Removed |Added

 CC||bernd.edlinger at hotmail dot de

--- Comment #2 from Bernd Edlinger bernd.edlinger at hotmail dot de ---
reproduced also with arm-none-eabi:

../arm-eabi/bin/arm-eabi-gcc -O3 -mfpu=neon -mfloat-abi=softfp 1.c


[Bug tree-optimization/58121] [4.9 regression] FAIL: cc1224a

2013-08-12 Thread ebotcazou at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58121

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2013-08-12
 CC||ebotcazou at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
I cannot reproduce:

=== acats tests ===

=== acats Summary ===
# of expected passes2320
# of unexpected failures0
Native configuration is ia64-unknown-linux-gnu

=== gnat tests ===


Running target unix

=== gnat Summary ===

# of expected passes1168
# of expected failures  18
# of unsupported tests  10


[Bug other/58133] GCC should emit arm assembly following the unified syntax

2013-08-12 Thread sven.koehler at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58133

--- Comment #1 from Sven sven.koehler at gmail dot com ---
It seems that for targets like -mcpu=cortex-m4, GCC does generate unified
syntax. So is the unified syntax only used for newer targets that use the
thumb2 instruction set, whereas the divided syntax is used for older thumb1
targets?


[Bug c++/58142] New: _pthread_tsd_cleanup called before destructors are called

2013-08-12 Thread soonhok at cs dot cmu.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58142

Bug ID: 58142
   Summary: _pthread_tsd_cleanup called before destructors are
called
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: soonhok at cs dot cmu.edu

Created attachment 30638
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30638&action=edit
preprocessed input file

It seems that when a thread finishes, its thread cleanup routine is called before
the destructors of TLS (thread-local storage) variables are called, which can
cause segmentation faults. I provided a simplified small program that
reproduces the behavior. Even if it doesn't generate a segmentation fault,
running valgrind over it shows the same problem occurring at run time.

This problem happens only on OSX. When I tried the same C++ code on Ubuntu 12.04
with g++-4.8.1, there was no problem. I also tried clang++-3.3 on OSX, and
there was no problem either.


1. The exact version of GCC:

   gcc-4.8.1

2. the system type: 

   OSX 10.8.4, Darwin air 12.4.0 Darwin Kernel Version 12.4.0

3. the options given when GCC was configured/built:

   g++-4.8 -std=c++11 thread.cpp -o thread

4. the complete command line that triggers the bug;

   valgrind thread

5. the compiler output (error messages, warnings, etc.); and
the preprocessed file (*.i*) that triggers the bug, generated by adding
-save-temps to the complete compilation command, or, in the case of a bug
report for the GNAT front end, a complete set of source files (see below).

Preprocessed file is attached. Here is the original source code (much shorter):
==
#include <thread>
#include <iostream>
#include <mutex>
#include <vector>
#include <cstdlib>

static void foo() {
    static thread_local std::vector<int> v(1024);
    if (v.size() != 1024) {
        std::cerr << "Error\n";
        exit(1);
    }
}

static void tst1() {
    unsigned n = 5;
    for (unsigned i = 0; i < n; i++) {
        std::thread t([](){ foo(); });
        t.join();
    }
}

int main() {
    tst1();
}
==

The following is the output of valgrind:

...

==18408== Invalid read of size 8
==18408==    at 0x121B3: std::_Vector_base<int, std::allocator<int> >::~_Vector_base() (in ./a.out)
==18408==    by 0x12054: std::vector<int, std::allocator<int> >::~vector() (in ./a.out)
==18408==    by 0xB9F5: (anonymous namespace)::run(void*) (in /usr/local/lib/libstdc++.6.dylib)
==18408==    by 0x29CA01: _pthread_exit (in /usr/lib/system/libsystem_c.dylib)
==18408==    by 0x29C7AC: _pthread_start (in /usr/lib/system/libsystem_c.dylib)
==18408==    by 0x2891E0: thread_start (in /usr/lib/system/libsystem_c.dylib)
==18408==  Address 0x1000257a8 is 8 bytes inside a block of size 32 free'd
==18408==    at 0x5632: free (in /usr/local/Cellar/valgrind/3.8.1/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==18408==    by 0x1CAA12: emutls_destroy (in /usr/local/lib/libgcc_s.1.dylib)
==18408==    by 0x101: ???
==18408==    by 0xB0080E9F: ???
==18408==    by 0xB008186F: ???
==18408==    by 0x2A34DF: _pthread_tsd_cleanup (in /usr/lib/system/libsystem_c.dylib)
==18408==    by 0xB008105F: ???

...
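For reference, a small C++11 sketch of the behavior the reporter expects: each thread_local object's destructor runs as part of thread exit, before join() returns. The Tracker type and the run_threads helper are hypothetical; on a conforming implementation the counter equals the number of threads. The sketch cannot show the emutls ordering problem itself, which needs a tool such as valgrind to observe:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Counts how many thread_local Tracker objects have been destroyed.
std::atomic<int> destroyed{0};

// Hypothetical object with some storage, standing in for the vector<int>.
struct Tracker {
  int payload[16] = {};
  ~Tracker() { ++destroyed; }
};

// The first call in each thread constructs that thread's Tracker.
void touch_tls() {
  static thread_local Tracker t;
  t.payload[0] = 1;
}

// Runs n threads one after another; each thread's thread_local destructor
// has completed by the time join() returns.
int run_threads(int n) {
  for (int i = 0; i < n; ++i) {
    std::thread th(touch_tls);
    th.join();
  }
  return destroyed.load();
}
```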


[Bug target/58139] PowerPC volatile VSX register live across call

2013-08-12 Thread bergner at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58139

--- Comment #2 from Peter Bergner bergner at gcc dot gnu.org ---
This looks like a scheduling bug.  Just before sched2, we have:

(call_insn 29 28 31 2 (parallel [
(set (reg:DF 33 1)
(call (mem:SI (symbol_ref:DI (log) [flags 0x41] 
function_decl 0xfff92c41200 log) [0 __builtin_log S4 A8])
(const_int 64 [0x40])))
(use (const_int 0 [0]))
(clobber (reg:SI 65 lr))
]) bug.c:17 509 {*call_value_nonlocal_aix64}
 (expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(expr_list:REG_NON_LOCAL_GOTO (use (reg:DF 33 1))
(nil)))
(insn 31 29 34 2 (set (reg:V2DF 62 30 [orig:140 vb0 ] [140])
(unspec:V2DF [
(reg/v:DF 62 30 [orig:123 b0 ] [123])
(reg/v:DF 61 29 [orig:125 b1 ] [125])
] UNSPEC_VSX_CONCAT)) bug.c:18 920 {vsx_concat_v2df}
 (expr_list:REG_DEAD (reg/v:DF 61 29 [orig:125 b1 ] [125])
(expr_list:REG_EQUIV (mem:V2DF (reg/v/f:DI 30 30 [orig:133 B ] [133])
[2 MEM[(__vector double *)B_2(D)]+0 S16 A128])
(nil

Here, insn 31 sets VSX reg 62 (ie, fpr30,vsr30).  In DFmode, reg 62 is a
non-volatile register, but in V2DFmode, it is volatile.  After sched2, we have:

insn:TI 31 28 29 2 (set (reg:V2DF 62 30 [orig:140 vb0 ] [140])
(unspec:V2DF [
(reg/v:DF 62 30 [orig:123 b0 ] [123])
(reg/v:DF 61 29 [orig:125 b1 ] [125])
] UNSPEC_VSX_CONCAT)) bug.c:18 920 {vsx_concat_v2df}
 (expr_list:REG_DEAD (reg/v:DF 61 29 [orig:125 b1 ] [125])
(expr_list:REG_EQUIV (mem:V2DF (reg/v/f:DI 30 30 [orig:133 B ] [133])
[2 MEM[(__vector double *)B_2(D)]+0 S16 A128])
(nil
(call_insn 29 31 72 2 (parallel [
(set (reg:DF 33 1)
(call (mem:SI (symbol_ref:DI (log) [flags 0x41] 
function_decl 0xfff92c41200 log) [0 __builtin_log S4 A8])
(const_int 64 [0x40])))
(use (const_int 0 [0]))
(clobber (reg:SI 65 lr))
]) bug.c:17 509 {*call_value_nonlocal_aix64}
 (expr_list:REG_EH_REGION (const_int 0 [0])
(nil))
(expr_list:REG_NON_LOCAL_GOTO (use (reg:DF 33 1))
(nil)))

So it looks like the scheduler somehow thinks that reg 62 is non-volatile,
when it is really volatile in V2DFmode, and moves insn 31 before the call,
which ends up clobbering it.

Still digging.
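A simplified, hypothetical C++ model of the ABI rule behind this analysis (it is not GCC's implementation, and the function name is invented): FPRs f14 to f31 are callee-saved in the 64-bit PowerPC ABI, but only their 64-bit FPR half, while the other doubleword of the overlapping VSX register is volatile. A DFmode value in reg 62 (fpr30) therefore survives the call to log(), but a V2DFmode value in the same register does not:

```cpp
#include <cassert>

// Hypothetical model: does a value of VALUE_BYTES held in FPR/VSR number
// FPR (0..31) get clobbered by a call?
bool value_clobbered_by_call(int fpr, unsigned value_bytes) {
  const bool callee_saved_fpr = fpr >= 14 && fpr <= 31;
  if (!callee_saved_fpr)
    return true;           // f0..f13 are fully volatile
  return value_bytes > 8;  // a 16-byte V2DF value reaches the volatile half
}
```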


[Bug c++/57416] internal compiler error: in gimple_expand_cfg, at cfgexpand.c:4575

2013-08-12 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57416

--- Comment #8 from Paolo Carlini paolo.carlini at oracle dot com ---
The ICE is indeed fixed in mainline. I'm going to commit a (reduced) testcase
and close the issue.


[Bug c++/57416] internal compiler error: in gimple_expand_cfg, at cfgexpand.c:4575

2013-08-12 Thread paolo.carlini at oracle dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57416

Paolo Carlini paolo.carlini at oracle dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |4.9.0

--- Comment #9 from Paolo Carlini paolo.carlini at oracle dot com ---
Done.


[Bug go/58075] Unable to build go on ia64-hp-hpux11.31

2013-08-12 Thread pda at freeshell dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58075

--- Comment #2 from Paul Ackersviller pda at freeshell dot org ---
Thanks, I have sent this on to HP.

Should I report back a patch number, or whatever they end up responding with?


[Bug tree-optimization/58137] [trunk, ICE] full unroll + AVX2 vectorization

2013-08-12 Thread bernd.edlinger at hotmail dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58137

--- Comment #3 from Bernd Edlinger bernd.edlinger at hotmail dot de ---
Created attachment 30639
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30639&action=edit
possible fix

This seems to be a bug in the constant folding of constant
vector values at forwprop4.

Could someone check whether the generated code is now correct?

Thanks.


[Bug go/58075] Unable to build go on ia64-hp-hpux11.31

2013-08-12 Thread ian at airs dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58075

--- Comment #3 from Ian Lance Taylor ian at airs dot com ---
Yes, please.  Thanks.


[Bug middle-end/58143] New: wrong code at -O3 on x86_64-linux-gnu

2013-08-12 Thread su at cs dot ucdavis.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58143

Bug ID: 58143
   Summary: wrong code at -O3 on x86_64-linux-gnu
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: su at cs dot ucdavis.edu

The current gcc trunk and gcc 4.8 produce wrong code for the following testcase
on x86_64-linux when compiled at -O3 (in both 32-bit and 64-bit modes). This is
a regression from 4.7.x.


$ gcc-trunk -v
gcc version 4.9.0 20130812 (experimental) [trunk revision 201658] (GCC) 
$ gcc-trunk -O2 small.c
$ a.out
0
$ gcc-4.7 -O3 small.c
$ a.out
0
$ gcc-trunk -O3 small.c
$ a.out
-1
$ gcc-4.8 -O3 small.c
$ a.out
-1
$ 


--


int printf (const char *, ...);

int a, b, c, d, e, f, g, h = 1, i;

int foo (int p)
{
  return p < 0 && a < -2147483647 - 1 - p ? 0 : 1;
}

int *bar ()
{
  int j; 
  i = h ? 0 : 1 % h;
  for (j = 0; j < 1; j++)
for (d = 0; d; d++)
  for (e = 1; e;)
return 0;
  return 0;
}

int baz ()
{
  for (; b >= 0; b--)
for (c = 1; c >= 0; c--)
  {
int *k = &c;
for (;;)
  {
for (f = 0; f < 1; f++)
  {
g = foo (*k);
bar ();
  }
if (*k)
  break;
return 0;
  }
  }
  return 0;
}

int main ()
{
  baz ();
  printf ("%d\n", b);
  return 0;
}


[Bug middle-end/58143] wrong code at -O3 on x86_64-linux-gnu

2013-08-12 Thread pinskia at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58143

--- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org ---
-2147483647 - 1 - p

Hmm, this overflows for p > 1.


[Bug middle-end/58143] wrong code at -O3 on x86_64-linux-gnu

2013-08-12 Thread su at cs dot ucdavis.edu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58143

--- Comment #2 from Zhendong Su su at cs dot ucdavis.edu ---
Andrew, because of short-circuiting, when p >= 0 the expression -2147483647 -
1 - p isn't actually evaluated. 

Thanks for looking into this so quickly! 

Zhendong
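Zhendong's point can be checked with a small C++ sketch, assuming a 32-bit int so that -2147483647 - 1 equals INT_MIN. The helper names are hypothetical, and the overflow test is done in long long so the check itself cannot overflow. The subtraction does overflow for positive p, but with the short-circuit guard it is only evaluated for negative p, where it cannot overflow:

```cpp
#include <cassert>
#include <climits>

// Would INT_MIN - p overflow int?  Computed in long long (at least 64 bits),
// so this check is itself free of overflow.
bool int_min_minus_overflows(int p) {
  const long long r = static_cast<long long>(INT_MIN) - p;
  return r < INT_MIN || r > INT_MAX;
}

// Mirrors the short-circuit guard: the subtraction is only evaluated when
// p < 0, and for negative p the result INT_MIN - p stays within int range.
bool guarded_subtraction_overflows(int p) {
  return p < 0 && int_min_minus_overflows(p);
}
```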


[Bug c++/58144] New: Receive virtual memory exhausted: Cannot allocate memory while compiling

2013-08-12 Thread amit.chitnis at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58144

Bug ID: 58144
   Summary: Receive virtual memory exhausted: Cannot allocate
memory while compiling
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amit.chitnis at gmail dot com

g++ (GCC) 4.4.6 20130305 (Red Hat 4.4.6-4).

Steps to reproduce

1. Create a small hello world program that includes iostream, stdio.h and
stdlib.h and contains the line using namespace std;

2. Create a large file (say 900 MB) named new in a directory that is part of
your include path.

3. Compile the hello world program; it should fail with the above error. This
seems to be caused by the size and name of the file created in step 2.

 g++ -g -pthread -D_THREAD_SAFE -D_REENTRANT -I/opt/performance -o helloworld.o
-c helloworld.cpp

The file new was created in /opt/performance/.


[Bug middle-end/58145] New: [Regression]: volatileness of write is discarded, perhaps in lim1 related to loop optimizations

2013-08-12 Thread hp at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

Bug ID: 58145
   Summary: [Regression]: volatileness of write is discarded,
perhaps in lim1 related to loop optimizations
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hp at gcc dot gnu.org
Target: cris-*-*, crisv32-*-*

Created attachment 30640
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30640&action=edit
Preprocessed code; compile at -O2, e.g. cc1 -O2 y.i -o y.s

The exact version in which the bug appeared is not yet triaged: it's present on
r201675 of trunk, r201652 of the 4.8 branch, r190527 of the 4.7 branch (!) but
appears to not be present in r135713 of the 4.3 branch (!).

The bug is that the volatileness of the dereference of the write (the
assignment through a pointer to a volatile structure) in function pb_out is
discarded, leaving a single write after the loop.  Note also that, together with
the discarded-volatileness bug, there seems to be a missed-optimization bug in
that the loop is redundant: the loop awkwardly iterates over 0..31 and
computes 1<<i, but the intermediate values aren't used; then the last
value is written after the loop. Manually inlining pb_out
makes no difference to the bug.

The wrong code is evident already in the .expand dump on trunk (according to
-da).  It is not present (according to -fdump-tree-all-all) in
y.i.096t.loopinit but appears present in y.i.097t.lim1.

Until someone (including myself) has repeated the observation for another
target, I'll set the target-specifier to cris*-* but it seems obviously
generic, affecting all targets.
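The pattern at issue can be sketched in C++ as follows. The attachment is not reproduced here, so the port structure and the write_all_bits function are hypothetical; only the name pb_out and the loop over 0..31 computing 1<<i come from the report. Each store through a pointer to volatile is an observable side effect, so 32 stores are expected, not the single store after the loop that the buggy output contains:

```cpp
#include <cassert>

// Hypothetical stand-in for the memory-mapped structure in the report.
struct port {
  unsigned data;
};

// The store through a pointer to volatile must be emitted every time.
void pb_out(volatile port* p, unsigned value) { p->data = value; }

// 32 separate volatile stores are required, even though only the last
// value remains visible afterwards (assumes a 32-bit unsigned).
void write_all_bits(volatile port* p) {
  for (int i = 0; i < 32; ++i)
    pb_out(p, 1u << i);
}
```

A unit test can only check the final value; whether all 32 stores are emitted has to be checked in the generated assembly.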


Re: [patch, fortran] RFD: PR 56666 Allow suppression of zero-trip DO loop warning

2013-08-12 Thread Thomas Koenig
Hi Janus,

 OK for trunk?
 Looks good to m

Committed as rev. 201658; also committed a
snippet to the documentation.

Thanks for the review!

Thomas


[PATCH] TREE-SSA remove redundant condition checks in get_default_value

2013-08-12 Thread Zhouyi Zhou

In function get_default_value of tree-ssa-ccp.c,
  else if (is_gimple_assign (stmt)
           /* Value-returning GIMPLE_CALL statements assign to
              a variable, and are treated similarly to GIMPLE_ASSIGN.  */
           || (is_gimple_call (stmt)
               && gimple_call_lhs (stmt) != NULL_TREE)
           || gimple_code (stmt) == GIMPLE_PHI)
    {
      tree cst;
      if (gimple_assign_single_p (stmt)
          && DECL_P (gimple_assign_rhs1 (stmt))
          && (cst = get_symbol_constant_value (gimple_assign_rhs1 (stmt))))
        {
          val.lattice_val = CONSTANT;
          val.value = cst;
        }
      else
        /* Any other variable defined by an assignment or a PHI node
           is considered UNDEFINED.  */
        val.lattice_val = UNDEFINED;
If the stmt is a GIMPLE call or a GIMPLE PHI node, it can never satisfy
the condition gimple_assign_single_p (stmt), so there are redundant condition
checks. The attached patch tries to remove them.
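The argument can be checked on a simplified, hypothetical model of the two branch structures (enum values stand in for GIMPLE statement kinds; the names are invented and this is not the GCC code). Since the assignment-only test can never succeed for a call or a PHI, moving those cases into their own branch does not change the resulting lattice value:

```cpp
#include <cassert>

enum class Kind { Assign, Call, Phi, Other };
enum class Lattice { Constant, Undefined, Varying };

struct Stmt {
  Kind kind;
  bool call_has_lhs;       // only meaningful for calls
  bool constant_decl_rhs;  // single DECL rhs with a known constant value
};

// Before the patch: calls and PHIs share the assignment branch and then
// fail the assignment-only test.
Lattice before(const Stmt& s) {
  if (s.kind == Kind::Assign || (s.kind == Kind::Call && s.call_has_lhs) ||
      s.kind == Kind::Phi) {
    if (s.kind == Kind::Assign && s.constant_decl_rhs)
      return Lattice::Constant;
    return Lattice::Undefined;
  }
  return Lattice::Varying;
}

// After the patch: assignments and call/PHI definitions are split.
Lattice after(const Stmt& s) {
  if (s.kind == Kind::Assign)
    return s.constant_decl_rhs ? Lattice::Constant : Lattice::Undefined;
  if ((s.kind == Kind::Call && s.call_has_lhs) || s.kind == Kind::Phi)
    return Lattice::Undefined;
  return Lattice::Varying;
}

// Exhaustively compares both versions over every input of the model.
bool refactor_equivalent() {
  const Kind kinds[] = {Kind::Assign, Kind::Call, Kind::Phi, Kind::Other};
  for (Kind k : kinds)
    for (int lhs = 0; lhs < 2; ++lhs)
      for (int cst = 0; cst < 2; ++cst) {
        const Stmt s{k, lhs != 0, cst != 0};
        if (before(s) != after(s))
          return false;
      }
  return true;
}
```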


Bootstrap passed. Regression tested on x86_64-unknown-linux-gnu (pc).

ChangeLog:
2013-08-13  Zhouyi Zhou  yizhouz...@ict.ac.cn
	* tree-ssa-ccp.c (get_default_value): Remove redundant condition checks.


-- 
Zhouyi Zhou yizhouz...@ict.ac.cn
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 6472f48..7fbb687 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -258,12 +258,7 @@ get_default_value (tree var)
  val.mask = double_int_minus_one;
}
 }
-  else if (is_gimple_assign (stmt)
-  /* Value-returning GIMPLE_CALL statements assign to
- a variable, and are treated similarly to GIMPLE_ASSIGN.  */
-  || (is_gimple_call (stmt)
-   && gimple_call_lhs (stmt) != NULL_TREE)
-  || gimple_code (stmt) == GIMPLE_PHI)
+  else if (is_gimple_assign (stmt))
 {
   tree cst;
   if (gimple_assign_single_p (stmt)
@@ -274,10 +269,18 @@ get_default_value (tree var)
  val.value = cst;
}
   else
-   /* Any other variable defined by an assignment or a PHI node
+   /* Any other variable defined by an assignment
   is considered UNDEFINED.  */
val.lattice_val = UNDEFINED;
 }
+  else if ((is_gimple_call (stmt)
+   && gimple_call_lhs (stmt) != NULL_TREE)
+  || gimple_code (stmt) == GIMPLE_PHI)
+{
+  /*Variable defined by a call or a PHI node
+   is considered UNDEFINED. */
+  val.lattice_val = UNDEFINED;
+}
   else
 {
   /* Otherwise, VAR will never take on a constant value.  */


Re: [PATCH] x86-64 gcc generate wrong assembly instruction movabs for intel syntax

2013-08-12 Thread Uros Bizjak
Hello!

 movabs is incorrectly translated into mov [rax], -1, and causes
 compile error Error: ambiguous operand size for `mov' .
 It should be mov QWORD PTR [rax], -1

 Bootstrap passed. Regression tested on x86_64-unknown-linux-gnu (pc).

 2013-08-10  Perez Read netfirew...@gmail.com

 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.

You should mention PR number in the ChangeLog.

Looks OK, but I think that for consistency this decoration should also
be added to *movabs<mode>_2 pattern.

Uros.


Re: [PATCH v2 00/18] resurrect automatic dependencies

2013-08-12 Thread Tom Tromey
 Tom == Tom Tromey tro...@redhat.com writes:

Tom This is a refresh of my series to resurrect automatic dependency
Tom tracking.

Ping.

Tom


Re: [PATCH] Fix PR57980

2013-08-12 Thread Marek Polacek
On Fri, Aug 09, 2013 at 08:40:00PM +0200, Richard Biener wrote:
 Marek Polacek pola...@redhat.com wrote:
 In this PR the problem was that when dealing with the gimple assign in
 the tailcall optimization, we, when the rhs operand is of a vector
 type, need to create -1 also of a vector type, but build_int_cst
 doesn't create vectors (ICEs).  Instead, we should use
 build_minus_one_cst
 because that can create even the VECTOR_TYPE constant (and, it can
 create even REAL_TYPE/COMPLEX_TYPE), as suggested by Marc.
 
 Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8?
 
 Ok. Double-check that this function exists on the branch please.

It does not :(.  So not backporting to 4.8...

Marek


Re: [wwwdocs] Add link to @gnutools on Twitter

2013-08-12 Thread James Greenhalgh
On Mon, Aug 12, 2013 at 01:20:03AM +0100, Gerald Pfeifer wrote:
 David suggested adding this link, and I think it fits nicely.

Does this also deserve a news post? I certainly found it to be
interesting news!

James



Re: [PATCH] x86-64 gcc generate wrong assembly instruction movabs for intel syntax

2013-08-12 Thread Perez Read
On Mon, Aug 12, 2013 at 2:52 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 movabs is incorrectly translated into mov [rax], -1, and causes
 compile error Error: ambiguous operand size for `mov' .
 It should be mov QWORD PTR [rax], -1

 Bootstrap passed. Regression tested on x86_64-unknown-linux-gnu (pc).

 2013-08-10  Perez Read netfirew...@gmail.com

 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.

 You should mention PR number in the ChangeLog.

 Looks OK, but I think that for consistency this decoration should also
 be added to *movabs<mode>_2 pattern.

 Uros.

Hello,

After testing, I think we can skip this pattern:
because operand 0 must be a register, the assembler can
determine the size automatically.

Perez

Fixed ChangeLog
2013-08-10  Perez Read netfirew...@gmail.com

 PR target/58132
 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.


Re: [PATCH] x86-64 gcc generate wrong assembly instruction movabs for intel syntax

2013-08-12 Thread Uros Bizjak
On Mon, Aug 12, 2013 at 11:24 AM, Perez Read netfirew...@gmail.com wrote:

 movabs is incorrectly translated into mov [rax], -1, and causes
 compile error Error: ambiguous operand size for `mov' .
 It should be mov QWORD PTR [rax], -1

 Bootstrap passed. Regression tested on x86_64-unknown-linux-gnu (pc).

 2013-08-10  Perez Read netfirew...@gmail.com

 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.

 You should mention PR number in the ChangeLog.

 Looks OK, but I think that for consistency this decoration should also
 be added to *movabs<mode>_2 pattern.

 Uros.

 Hello,

 After testing, I think we can skip this pattern:
 because operand 0 must be a register, the assembler can
 determine the size automatically.

As said, I don't want two similar patterns with a different asm
template in i386.md. So, if decorating movabs<mode>_2 works OK, I
propose to change both patterns with your change.

Uros.
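The rule both messages rely on can be shown with a hypothetical C++ helper (not GCC code; the function names are invented). In Intel syntax, a store of an immediate to memory gives the assembler no register from which to infer the operand width, so a size keyword such as QWORD PTR is required, while a register operand implies the width:

```cpp
#include <cassert>
#include <string>

// Intel-syntax size keyword for an operand width in bytes.
std::string intel_size_keyword(int bytes) {
  switch (bytes) {
    case 1: return "BYTE";
    case 2: return "WORD";
    case 4: return "DWORD";
    case 8: return "QWORD";
    default: return "";
  }
}

// Memory destination, immediate source: the width must be spelled out.
std::string intel_store_imm(const std::string& mem, long long imm, int bytes) {
  return "mov " + intel_size_keyword(bytes) + " PTR " + mem + ", " +
         std::to_string(imm);
}

// Register source: the register name already determines the width.
std::string intel_store_reg(const std::string& mem, const std::string& reg) {
  return "mov " + mem + ", " + reg;
}
```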


Backport from trunk:

2013-08-12 Thread Andrew Haley
I think this one is obvious/trivial, but I'll ask anyway.

OK?

Andrew.


2013-08-12  Andrew Haley  a...@redhat.com

Backport from mainline:
* 2013-07-11  Andreas Schwab  sch...@suse.de

* config/aarch64/aarch64-linux.h (CPP_SPEC): Define.

Index: gcc/config/aarch64/aarch64-linux.h
===
--- gcc/config/aarch64/aarch64-linux.h  (revision 201661)
+++ gcc/config/aarch64/aarch64-linux.h  (working copy)
@@ -23,6 +23,8 @@

 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64.so.1"

+#define CPP_SPEC "%{pthread:-D_REENTRANT}"
+
 #define LINUX_TARGET_LINK_SPEC  "%{h*} \
%{static:-Bstatic}  \
%{shared:-shared}   \



RFA: AVR: Support building AVR Linux targets

2013-08-12 Thread Nick Clifton
Hi Dennis, Hi Anatoly, Hi Eric,

  I have run into a small problem building GCC for an AVR Linux target -
  glibc-c.o is not being built.  It turns out that the section handling
  avr-*-* in the config.gcc file is redefining tmake_file without
  allowing for the fact that t-glibc has already been added to it.

  The patch below is the obvious fix for this problem, but I have not
  committed it because it occurred to me that there might be some AVR
  specific reason for not including t-glibc.  So - is the patch OK, or
  is there some other way of fixing the problem ?

Cheers
  Nick

gcc/ChangeLog
2013-08-12  Nick Clifton  ni...@redhat.com

* config.gcc (avr-*-*): Allow for tmake_file not being empty.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 201658)
+++ gcc/config.gcc  (working copy)
@@ -1001,7 +1001,7 @@
tm_file="${tm_file} ${cpu_type}/avrlibc.h"
tm_defines="${tm_defines} WITH_AVRLIBC"
fi
-   tmake_file="avr/t-avr avr/t-multilib"
+   tmake_file="${tmake_file} avr/t-avr avr/t-multilib"
use_gcc_stdint=wrap
extra_gcc_objs="driver-avr.o avr-devices.o"
extra_objs="avr-devices.o avr-log.o"

  


[RFC] Bare bones of virtual call tracking

2013-08-12 Thread Jan Hubicka
Hi,
this patch is the bare bones of what I hope will give me the possible targets
of a virtual call.

I basically added a One Definition Rule (ODR) based hash that unifies all types
that are the same in the C++ sense (with LTO many of those are still not merged;
I hope that with a few dumps I can improve the merging, too). So every type used
in a virtual method declaration gets assigned an odr_type entry.

Then I use BINFO_BASE_BINFOS to walk direct bases and produce a type inheritance
graph linking each type with its bases and also with its derived types.

So I get:

jan@linux-9ure:~/trunk/build/gcc> ./xgcc -B ./ -O2 devirt-1.C

 type 0: struct A
 defined at: devirt-1.C:7
 methods:
   virtual int A::foo(int)/0
 derived types:
   type 1: struct C
   defined at: devirt-1.C:20
   base odr type ids:  0
   methods:
 virtual int C::foo(int)/2

   type 2: struct B
   defined at: devirt-1.C:14
   base odr type ids:  0
   methods:
 virtual int B::foo(int)/1

I think in future I can also use this for LTO merging (i.e. merge binfos of all
types equivalent by ODR) and perhaps canonical types can be refined to honor
ODR when there is no non-ODR language type of same layout.

Now for single inheritance, I think my work is easy:

I have the token and type of the virtual call, taken from OBJ_TYPE_REF.  I think
I can now just walk my inheritance graph and, at each entry, look for the method
with the given token (I can take it from the virtual table, or I can actually
use DECL_VINFO and complete my current partial tracking of them) and put
them into a set.  Those should be all possible virtual call targets (defined
in the current unit) of the call.
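The single-inheritance walk described above can be sketched in C++ as follows. This is a hypothetical, simplified model (OdrType, add_edge and collect_targets are invented names, and a vector of method names stands in for the virtual table indexed by token; a type that does not override still lists the inherited method in its slot, as a real virtual table would). The test reproduces the A/B/C example from the dump:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One node of the type inheritance graph.
struct OdrType {
  std::string name;
  std::vector<int> bases;           // indices of direct base types
  std::vector<int> derived;         // indices of direct derived types
  std::vector<std::string> vtable;  // vtable[token] = method in that slot
};

// Links a type with its base and the base with the derived type.
void add_edge(std::vector<OdrType>& graph, int base, int derived) {
  graph[derived].bases.push_back(base);
  graph[base].derived.push_back(derived);
}

// Collects the methods at TOKEN in TYPE and in every type derived from it.
void collect_targets(const std::vector<OdrType>& graph, int type,
                     std::size_t token, std::set<std::string>& targets) {
  const OdrType& t = graph[type];
  if (token < t.vtable.size())
    targets.insert(t.vtable[token]);
  for (int d : t.derived)
    collect_targets(graph, d, token, targets);
}
```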

With multiple inheritance I need to adjust offsets.  I assume that for every
type I can simply walk the binfos, look for the matching type of the call, and
look for the method at the given token within that binfo.  This will be
quadratic.

Another option would be to track the offsets in my base-to-derived type links,
but I do not know how to obtain them, since BINFO_BASE_BINFOS do not track them.
Shall I look at TYPE_FIELDs instead?
Does this approach seem to make sense?
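The offset problem can be seen at the source level in a hypothetical C++ sketch (class and function names invented): with multiple inheritance the two base subobjects sit at different offsets inside the derived object, so a call made through the second base needs its this pointer adjusted before it reaches the derived method:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical polymorphic bases, each with a vtable pointer and a field.
struct Left {
  virtual ~Left() {}
  virtual int f() const { return 1; }
  int l = 0;
};

struct Right {
  virtual ~Right() {}
  virtual int g() const { return 2; }
  int r = 0;
};

struct Both : Left, Right {
  int f() const override { return 10; }
  // Called through a Right&, this needs "this" adjusted from the Right
  // subobject back to the start of Both (typically done via a thunk).
  int g() const override { return 20; }
};

// Byte offsets of the two base subobjects inside a Both object.
std::ptrdiff_t offset_of_left(const Both& b) {
  return reinterpret_cast<const char*>(static_cast<const Left*>(&b)) -
         reinterpret_cast<const char*>(&b);
}

std::ptrdiff_t offset_of_right(const Both& b) {
  return reinterpret_cast<const char*>(static_cast<const Right*>(&b)) -
         reinterpret_cast<const char*>(&b);
}
```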

Honza

Index: Makefile.in
===
--- Makefile.in (revision 201654)
+++ Makefile.in (working copy)
@@ -1275,6 +1275,7 @@
init-regs.o \
internal-fn.o \
ipa-cp.o \
+   ipa-devirt.o \
ipa-split.o \
ipa-inline.o \
ipa-inline-analysis.o \
@@ -2945,6 +2946,10 @@
$(TREE_PASS_H) $(GIMPLE_H) $(TARGET_H) $(GGC_H) pointer-set.h \
$(IPA_UTILS_H) tree-inline.h $(HASH_TABLE_H) profile.h $(PARAMS_H) \
$(LTO_STREAMER_H) $(DATA_STREAMER_H)
+ipa-devirt.o : ipa-devirt.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(CGRAPH_H) \
+   $(TREE_PASS_H) $(GIMPLE_H) $(TARGET_H) $(GGC_H) pointer-set.h \
+   $(IPA_UTILS_H) tree-inline.h $(HASH_TABLE_H) profile.h $(PARAMS_H) \
+   $(LTO_STREAMER_H) $(DATA_STREAMER_H)
 ipa-prop.o : ipa-prop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
langhooks.h $(GGC_H) $(TARGET_H) $(CGRAPH_H) $(IPA_PROP_H) $(DIAGNOSTIC_H) \
$(TREE_FLOW_H) $(TM_H) $(TREE_PASS_H) $(FLAGS_H) $(TREE_H) \
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 0)
+++ ipa-devirt.c(working copy)
@@ -0,0 +1,267 @@
+/* Basic IPA optimizations and utilities.
+   Copyright (C) 2003-2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "gimple.h"
+#include "ggc.h"
+#include "flags.h"
+#include "pointer-set.h"
+#include "target.h"
+#include "tree-iterator.h"
+#include "pointer-set.h"
+#include "hash-table.h"
+#include "params.h"
+#include "tree-pretty-print.h"
+
+struct odr_type_d
+{
+  int id;
+  vec<tree> types;
+  pointer_set_t *types_set;
+  vec<struct odr_type_d *> bases;
+  vec<struct odr_type_d *> derived_types;
+  vec<struct cgraph_node *> methods;
+};
+
+typedef odr_type_d *odr_type;
+
+/* One Definition Rule hashtable helpers.  */
+
+struct odr_hasher 
+{
+  typedef odr_type_d value_type;
+  typedef odr_type_d compare_type;
+  static inline hashval_t hash (const value_type *);
+  static inline bool equal (const value_type *, const compare_type *);
+  static inline void remove (value_type *);
+};
+
+/* Return the computed hashcode for ODR_TYPE.  */
+
+inline hashval_t
+odr_hasher::hash (const 

Re: RFA: AVR: Support building AVR Linux targets

2013-08-12 Thread Denis Chertykov
2013/8/12 Nick Clifton ni...@redhat.com:
 Hi Dennis, Hi Anatoly, Hi Eric,

   I have run into a small problem building GCC for an AVR Linux target -
   glibc-c.o is not being built.  It turns out that the section handling
   avr-*-* in the config.gcc file is redefining tmake_file without
   allowing for the fact that t-glibc has already been added to it.

   The patch below is the obvious fix for this problem, but I have not
   committed it because it occurred to me that there might be some AVR
   specific reason for not including t-glibc.

I can't remember such reasons.

  So - is the patch OK, or
   is there some other way of fixing the problem ?

 Cheers
   Nick

 gcc/ChangeLog
 2013-08-12  Nick Clifton  ni...@redhat.com

 * config.gcc (avr-*-*): Allow for tmake_file not being empty.

 Index: gcc/config.gcc

Please Apply.

Denis.


Re: [PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-12 Thread David Abdurachmanov
Hi,

Ping. Any news of the following patch being included into the trunk?

Thanks,
david

On Aug 2, 2013, at 1:45 PM, Martin Jambor wrote:

 Hi,
 
 while compute_record_mode in stor-layout.c makes sure it assigns BLK
 mode to structs with flexible arrays, it has no such provisions for
 zero length arrays
 (http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Zero-Length.html).  I
 think that in order to avoid problems and surprises like PR 57748
 (where this triggered code that was intended for small structures that
 fit into a scalar mode and ICEd), we should assign both variable array
 possibilities the same mode.
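For illustration only (plain GNU C, not part of the patch; the struct and function names are hypothetical): a zero-length trailing array and a C99 flexible array member both add nothing to sizeof, so each struct below has the size of its int header alone. That is why the two forms look alike from the outside, and why the patch makes compute_record_mode treat zero-sized array fields the way it already treats incomplete ones.

```c
#include <stddef.h>

/* GNU C extension: zero-length trailing array, no storage of its own.  */
struct zero_len_rec { int len; int data[0]; };

/* C99 flexible array member: the standard spelling of the same idea.  */
struct flex_rec { int len; int data[]; };

/* Neither trailing array contributes to the size of its record.  */
static size_t zero_len_rec_size (void) { return sizeof (struct zero_len_rec); }
static size_t flex_rec_size (void) { return sizeof (struct flex_rec); }
```

As in the PR's test case, storage for the trailing elements has to be allocated on top of sizeof (for example with malloc) before they are accessed.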
 
 Bootstrapped and tested on x86_64-linux without any problems.  OK for
 trunk and the 4.8 branch?  (I'm not sure about the 4.7, this PR does
 not happen there despite the wrong mode so I'd ignore it for now.)
 
 Thanks,
 
 Martin
 
 
 2013-08-01  Martin Jambor  mjam...@suse.cz
 
   PR middle-end/57748
   * stor-layout.c (compute_record_mode): Treat zero-sized array fields
   like incomplete types.
 
 testsuite/
   * gcc.dg/torture/pr57748.c: New test.
 
 
 *** /tmp/lV6Ba8_stor-layout.c Thu Aug  1 16:28:25 2013
 --- gcc/stor-layout.c Thu Aug  1 15:36:18 2013
 *** compute_record_mode (tree type)
 *** 1604,1610 
    && integer_zerop (TYPE_SIZE (TREE_TYPE (field)))))
 || ! host_integerp (bit_position (field), 1)
 || DECL_SIZE (field) == 0
 !   || ! host_integerp (DECL_SIZE (field), 1))
   return;
 
/* If this field is the whole struct, remember its mode so
 --- 1604,1612 
    && integer_zerop (TYPE_SIZE (TREE_TYPE (field)))))
 || ! host_integerp (bit_position (field), 1)
 || DECL_SIZE (field) == 0
 !   || ! host_integerp (DECL_SIZE (field), 1)
 !   || (TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE
 !       && tree_low_cst (DECL_SIZE (field), 1) == 0))
   return;
 
/* If this field is the whole struct, remember its mode so
 *** /dev/null Tue Jun  4 12:34:56 2013
 --- gcc/testsuite/gcc.dg/torture/pr57748.c Thu Aug  1 15:42:14 2013
 ***
 *** 0 
 --- 1,45 
 + /* PR middle-end/57748 */
 + /* { dg-do run } */
 + 
 + #include <stdlib.h>
 + 
 + extern void abort (void);
 + 
 + typedef long long V
 +   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
 + 
 + typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
 + 
 + struct __attribute__((packed)) T { char c; P s; };
 + 
 + void __attribute__((noinline, noclone))
 + check (struct T *t)
 + {
 +   if (t->s.b[0][0] != 3 || t->s.b[0][1] != 4)
 +     abort ();
 + }
 + 
 + int __attribute__((noinline, noclone))
 + get_i (void)
 + {
 +   return 0;
 + }
 + 
 + void __attribute__((noinline, noclone))
 + foo (P *p)
 + {
 +   V a = { 3, 4 };
 +   int i = get_i();
 +   p->b[i] = a;
 + }
 + 
 + int
 + main ()
 + {
 +   struct T *t = (struct T *) malloc (128);
 + 
 +   foo (&t->s);
 +   check (t);
 + 
 +   return 0;
 + }



Re: Fwd: [x86, PATCH] More efficient code for short unsigned conversion to floating-point.

2013-08-12 Thread Kirill Yukhin
On 12 Aug 16:12, Yuri Rumyantsev wrote:

Hello,
part of the thread was accidentally removed from gcc-patches.

I've committed Yuri's patch into ML: 
http://gcc.gnu.org/ml/gcc-cvs/2013-08/msg00272.html

As the discussion took place off the ML, feel free to object.

Thanks, K

 -- Forwarded message --
 From: Uros Bizjak ubiz...@gmail.com
 Date: 2013/8/7
 Subject: Re: [x86, PATCH] More efficient code for short unsigned
 conversion to floating-point.
 To: Yuri Rumyantsev ysrum...@gmail.com
 
 
 Ah, OK, I see where I did a thinko.
 
 The patch looks OK, then.
 
 Uros.
 


[AArch64] Fix name of macros called in the vdup_lane Neon intrinsics

2013-08-12 Thread James Greenhalgh

Ugh. Typos in arm_neon.h macro names mean that scalar intrinsics end
up calling macros which don't exist.

So wherever I have written vget_laneq I should have written
vgetq_lane.
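For reference, the operation these intrinsics perform is a lane extract from a 128-bit vector. The sketch below models it with GCC/Clang generic vector extensions rather than arm_neon.h, so it runs on any host; `v16qi_model` and `model_vgetq_lane_s8` are hypothetical names, and the real `__aarch64_vgetq_lane_*` helpers are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector of 16 signed bytes (GCC/Clang vector
   extension), standing in for arm_neon.h's int8x16_t.  */
typedef int8_t v16qi_model __attribute__ ((vector_size (16)));

/* Return lane B of A, which is what vdupb_lane_s8 (A, B) computes via the
   q-form get-lane helper.  B must be in [0, 15].  */
static int8_t
model_vgetq_lane_s8 (v16qi_model a, int b)
{
  return a[b];
}
```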

This gets fixed by:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00535.html
which I was testing at the same time.

But, yuck, that shouldn't have happened.

Tested on aarch64-none-elf with no regressions.

OK?

Thanks,
James

---
gcc/

* config/aarch64/arm_neon.h
(vdup<bhsd>_lane_<su><8,16,32,64>): Fix macro call.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 73a5400..4a480fb 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -19780,49 +19780,49 @@ vcvtpq_u64_f64 (float64x2_t __a)
 __extension__ static __inline int8x1_t __attribute__ ((__always_inline__))
 vdupb_lane_s8 (int8x16_t a, int const b)
 {
-  return __aarch64_vget_laneq_s8 (a, b);
+  return __aarch64_vgetq_lane_s8 (a, b);
 }
 
 __extension__ static __inline uint8x1_t __attribute__ ((__always_inline__))
 vdupb_lane_u8 (uint8x16_t a, int const b)
 {
-  return __aarch64_vget_laneq_u8 (a, b);
+  return __aarch64_vgetq_lane_u8 (a, b);
 }
 
 __extension__ static __inline int16x1_t __attribute__ ((__always_inline__))
 vduph_lane_s16 (int16x8_t a, int const b)
 {
-  return __aarch64_vget_laneq_s16 (a, b);
+  return __aarch64_vgetq_lane_s16 (a, b);
 }
 
 __extension__ static __inline uint16x1_t __attribute__ ((__always_inline__))
 vduph_lane_u16 (uint16x8_t a, int const b)
 {
-  return __aarch64_vget_laneq_u16 (a, b);
+  return __aarch64_vgetq_lane_u16 (a, b);
 }
 
 __extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
 vdups_lane_s32 (int32x4_t a, int const b)
 {
-  return __aarch64_vget_laneq_s32 (a, b);
+  return __aarch64_vgetq_lane_s32 (a, b);
 }
 
 __extension__ static __inline uint32x1_t __attribute__ ((__always_inline__))
 vdups_lane_u32 (uint32x4_t a, int const b)
 {
-  return __aarch64_vget_laneq_u32 (a, b);
+  return __aarch64_vgetq_lane_u32 (a, b);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vdupd_lane_s64 (int64x2_t a, int const b)
 {
-  return __aarch64_vget_laneq_s64 (a, b);
+  return __aarch64_vgetq_lane_s64 (a, b);
 }
 
 __extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
 vdupd_lane_u64 (uint64x2_t a, int const b)
 {
-  return __aarch64_vget_laneq_s64 (a, b);
+  return __aarch64_vgetq_lane_u64 (a, b);
 }
 
 /* vld1 */

Re: [C++ PATCH] Grammar fix in pt.c comments.

2013-08-12 Thread Dodji Seketeli
Thank you for this patch, Adam.

Adam Butcher a...@jessamine.co.uk wrote:

   * pt.c: Grammar fix in comments (it's to its).

FWIW, this change seems to fall under the obvious rule and thus, ought
to be committed.


 ---
  gcc/cp/pt.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

 diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
 index ce899ef..78b7a97 100644
 --- a/gcc/cp/pt.c
 +++ b/gcc/cp/pt.c
 @@ -1986,7 +1986,7 @@ determine_specialization (tree template_id,
 tree decl_arg_types;
  
 /* This is an ordinary member function.  However, since
 -  we're here, we can assume it's enclosing class is a
 +  we're here, we can assume its enclosing class is a
template class.  For example,
  
   template <typename T> struct S { void f(); };
 @@ -4337,7 +4337,7 @@ check_default_tmpl_args (tree decl, tree parms, bool is_primary,
 || DECL_INITIALIZED_IN_CLASS_P (decl)))
  /* We already checked these parameters when the template was
 declared, so there's no need to do it again now.  This function
 -   was defined in class scope, but we're processing it's body now
 +   was defined in class scope, but we're processing its body now
 that the class is complete.  */
  return true;
  
 @@ -7555,7 +7555,7 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
   the one of #0.
  
   When we encounter #1, we want to store the partial instantiation
 - of M (template<class T> S<int>::M<T>) in it's CLASSTYPE_TI_TEMPLATE.
 + of M (template<class T> S<int>::M<T>) in its CLASSTYPE_TI_TEMPLATE.
  
   For all cases other than this explicit specialization of member of a
   class template, we just want to store the most general template into

-- 
Dodji


Re: [RFC Patch, Aarch64] : Macros for profile code generation to enable gprof support

2013-08-12 Thread Matthew Gretton-Dann
Marcus,

On 9 August 2013 18:17, Marcus Shawcroft marcus.shawcr...@arm.com wrote:
 On 03/08/13 19:01, Venkataramanan Kumar wrote:


 2013-08-02  Venkataramanan Kumar  venkataramanan.ku...@linaro.org

   * config/aarch64/aarch64.h (MCOUNT_NAME): Define.
 (NO_PROFILE_COUNTERS): Likewise.
 (PROFILE_HOOK): Likewise.
 (FUNCTION_PROFILER): Likewise.
   * config/aarch64/aarch64.c (aarch64_function_profiler): Remove.

 regards,
 Venkat.


 Hi Venkat,

 Looking at the various other ports it looks like the majority choose to use
 FUNCTION_PROFILER_HOOK rather than PROFILE_HOOK.

 Using PROFILE_HOOK to inject a regular call to _mcount() means that all
 arguments passed in registers in every function will be spilled and reloaded
 because the _mcount call will kill the caller save registers.

 Using the FUNCTION_PROFILER_HOOK and taking care not to kill the caller save
 registers would be less invasive.  The LR argument to _mcount would need to
 be passed in a temporary register, say x9 and _mcount would also need to
 ensure caller save registers are saved and restored.

 The latter seems to be a better option to me, is there compelling reason to
 choose PROFILE_HOOK over FUNCTION_PROFILER_HOOK ??

(I think you mean FUNCTION_PROFILER rather than FUNCTION_PROFILER_HOOK
in all the above.)

Using either PROFILE_HOOK or FUNCTION_PROFILER results in a call chain
that looks like the following (assuming the C Library is glibc):

 Function -> _mcount -> _mcount_internal.

Where _mcount_internal is the C function that does the real work and
is provided in glibc.  Importantly this means that _mcount_internal
follows the normal ABI - so we have to save the caller saved registers
somewhere.

Using FUNCTION_PROFILER requires us to write assembler which saves and
restores all caller saved registers every time it is called, and
requires (as you say) a special ABI.  This means _mcount ends up being
a piece of assembly that saves all caller-saved registers (i.e.
parameter-passing and temporary registers) and then makes the call to
_mcount_internal before restoring everything on _mcount's return.

Using PROFILE_HOOK will cause the compiler to do all the heavy
lifting, and it will do the minimum required (for example with a
function with one parameter it will only save and restore x0).
_mcount in this case can be a simple function that sets up some
parameters and calls _mcount_internal (or even _mcount could just
alias _mcount_internal).
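The PROFILE_HOOK call chain can be modeled in plain C. This is only a sketch: `model_mcount` and `model_mcount_internal` are hypothetical stand-ins for _mcount and glibc's internal routine, and the explicit call at the top of `profiled_add` stands in for the call the compiler would emit at function entry. Every call here follows the normal ABI, so saving live caller-saved registers is left to the compiler, which is the property described above.

```c
#include <stdint.h>

/* Hypothetical bookkeeping: how many calls reached the internal routine
   and the last caller/callee addresses it saw.  */
static unsigned long model_calls;
static uintptr_t model_last_from, model_last_self;

/* Plays the role of _mcount_internal: an ordinary C function.  */
static void
model_mcount_internal (uintptr_t frompc, uintptr_t selfpc)
{
  model_calls++;
  model_last_from = frompc;
  model_last_self = selfpc;
}

/* Plays the role of _mcount: receives the profiled function's return
   address (LR) and passes it on with an address inside the caller.  */
static void __attribute__ ((noinline))
model_mcount (uintptr_t lr)
{
  model_mcount_internal (lr, (uintptr_t) __builtin_return_address (0));
}

/* A "profiled" function; the first statement mimics the inserted hook.  */
static int __attribute__ ((noinline))
profiled_add (int x, int y)
{
  model_mcount ((uintptr_t) __builtin_return_address (0));
  return x + y;
}
```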

As to which of PROFILE_HOOK or FUNCTION_PROFILER are the right way
(TM) - I don't know - the documentation isn't very clear at all.
PROFILE_HOOK was introduced to support profiling for AIX 4.3.
http://gcc.gnu.org/ml/gcc-patches/2000-12/msg00580.html is the initial
patch, with a reworked patch here:
http://gcc.gnu.org/ml/gcc-patches/2001-02/msg00112.html. The final
commit happened on 2001-02-05.  The patch was introduced because it
was impossible to make FUNCTION_PROFILER work for AIX 4.3 and so a new
hook that worked earlier in the compiler was needed.  There doesn't
seem to have been a discussion about preferring one form over the
other.

In conclusion - I prefer the PROFILE_HOOK method because it makes the
compiler do all the work, and results in less impact on stack usage
and performance.  FUNCTION_PROFILER may impact the code generated by
the compiler less and produce a smaller overall image - but I'm not
sure that's more beneficial.

Thanks,

Matt


-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org


Re: [PATCH] x86-64 gcc generate wrong assembly instruction movabs for intel syntax

2013-08-12 Thread Perez Read
On Mon, Aug 12, 2013 at 5:26 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Aug 12, 2013 at 11:24 AM, Perez Read netfirew...@gmail.com wrote:

 movabs is incorrectly translated into mov [rax], -1, and causes
 compile error Error: ambiguous operand size for `mov' .
 It should be mov QWORD PTR [rax], -1
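The fix spells out the operand size in the Intel alternative with a per-mode size keyword. A small hypothetical C helper (not i386.md code; `intel_ptr_keyword` and `format_intel_store` are made-up names) shows the width-to-keyword mapping and the unambiguous form of the failing instruction:

```c
#include <stdio.h>
#include <string.h>

/* Intel-syntax size keyword for a memory operand of SIZE bytes, or NULL
   for an unsupported width.  */
static const char *
intel_ptr_keyword (int size)
{
  switch (size)
    {
    case 1: return "BYTE";
    case 2: return "WORD";
    case 4: return "DWORD";
    case 8: return "QWORD";
    default: return NULL;
    }
}

/* Format an Intel-syntax store of IMM to the address in register REG,
   with an explicit size so the assembler need not guess.  Returns the
   snprintf result, or -1 for an unsupported size.  */
static int
format_intel_store (char *buf, size_t len, int size, const char *reg,
                    long imm)
{
  const char *kw = intel_ptr_keyword (size);
  if (kw == NULL)
    return -1;
  return snprintf (buf, len, "mov %s PTR [%s], %ld", kw, reg, imm);
}
```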

 Bootstrap passed. Regression tested on x86_64-unknown-linux-gnu (pc).

 2013-08-10  Perez Read netfirew...@gmail.com

 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.

 You should mention PR number in the ChangeLog.

 Looks OK, but I think that for consistency this decoration should also
 be added to *movabs<mode>_2 pattern.

 Uros.

 Hello,

 After testing, I think we can skip this pattern:
 because operand 0 must be a register, the assembler will
 determine the size automatically.

 As said, I don't want two similar patterns with a different asm
 template in i386.md. So, if decorating movabs<mode>_2 works OK, I
 propose to change both patterns with your change.

 Uros.

Sorry for forgetting to Cc the mailing list.

Here are the new patch and ChangeLog, adding <ptrsize> PTR to both patterns.
Bootstrap passed,  Regression tested on x86_64-unknown-linux-gnu (pc).

2013-08-10  Perez Read netfirew...@gmail.com

 PR target/58132

 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.

2013-08-12  Perez Read netfirew...@gmail.com

 PR target/58132

 * config/i386/i386.md (*movabs<mode>_2): Add <ptrsize> PTR before
 operand 1 for intel asm alternative.


Thanks,
Perez


movabs.patch


Re: [PATCH] Convert more passes to new dump framework

2013-08-12 Thread Teresa Johnson
On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson tejohn...@google.com wrote:
 On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson tejohn...@google.com wrote:
 On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
 On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor mjam...@suse.cz wrote:
  On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
  This patch ports messages to the new dump framework,
 
  It would be great if this new framework were documented somewhere.  I lost
  track of what was agreed it would be and from the uses in the
  vectorizer I was never quite sure how to utilize it in other passes.

 Cc'ing Sharad who implemented this - Sharad, is this documented on a
 wiki or elsewhere?

 Thanks


 
  I'd also like to point out two other minor things inline:
 
  [...]
 
  2013-08-06  Teresa Johnson  tejohn...@google.com
  Dehao Chen  de...@google.com
 
  * dumpfile.c (dump_loc): Add column number to output, make newlines
  consistent.
  * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
  * ipa-inline-transform.c (clone_inlined_nodes):
  (cgraph_node_opt_info): New function.
  (cgraph_node_call_chain): Ditto.
  (dump_inline_decision): Ditto.
  (inline_call): Invoke dump_inline_decision.
  * doc/invoke.texi: Document optall -fopt-info flag.
  * profile.c (read_profile_edge_counts): Use new dump framework.
  (compute_branch_probabilities): Ditto.
  * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
  when pass not in any opt group.
  * value-prof.c (check_counter): Use new dump framework.
  (find_func_by_funcdef_no): Ditto.
  (check_ic_target): Ditto.
  * coverage.c (get_coverage_counts): Ditto.
  (coverage_init): Setup new dump framework.
  * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
  * ipa-inline.h (is_in_ipa_inline): Declare.
 
  * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
  * testsuite/gcc.dg/pr26570.c: Ditto.
  * testsuite/gcc.dg/pr32773.c: Ditto.
  * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
 
 
  [...]
 
  Index: ipa-inline-transform.c
  ===
  --- ipa-inline-transform.c  (revision 201461)
  +++ ipa-inline-transform.c  (working copy)
  @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
   }
 
 
  +#define MAX_INT_LENGTH 20
  +
  +/* Return NODE's name and profile count, if available.  */
  +
  +static const char *
  +cgraph_node_opt_info (struct cgraph_node *node)
  +{
  +  char *buf;
  +  size_t buf_size;
  +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
  +
  +  if (!bfd_name)
  +    bfd_name = "unknown";
  +
  +  buf_size = strlen (bfd_name) + 1;
  +  if (profile_info)
  +    buf_size += (MAX_INT_LENGTH + 3);
  +
  +  buf = (char *) xmalloc (buf_size);
  +
  +  strcpy (buf, bfd_name);
  +
  +  if (profile_info)
  +    sprintf (buf + strlen (buf), " (" HOST_WIDEST_INT_PRINT_DEC ")",
  +             node->count);
  +  return buf;
  +}
 
  I'm not sure if the output of this function is aimed only at the user or
  if it is supposed to be used by gcc developers as well.  If the
  latter, an incredibly useful thing is to also dump node->symbol.order.
  We usually dump it after a / sign separating it from the node name.
  It is invaluable when examining decisions in C++ code where you can
  have lots of clones of a node (and also because existing dumps print
  it, it is easy to combine them).

 The output is useful for both power users doing performance tuning of
 their application, and by gcc developers. Adding the id is not so
 useful for the former, but I agree that it is very useful for compiler
 developers. In fact, in the google branch version we emit more verbose
 information (the lipo module id and the funcdef_no) to help uniquely
 identify the routines and to aid in post-processing by humans and
 tools. So it is probably useful to add something similar here too. Is
 the node->symbol.order more or less unique than the funcdef_no? I see
 that you added a patch a few months ago to print the
 node->symbol.order in the function header, and it also has the
 advantage as you note of matching up with existing ipa dumps.

 node->symbol.order is unique and if I remember correctly, it is not
 even recycled.  Clones, inline clones, thunks, every symbol table node
 gets its own symbol order so it should be more unique than funcdef_no.
 On the other hand it may be a bit cryptic for users but at the same
 time it is only one number.

 Ok, I am going to go ahead and add this to the output.
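Combining the name/order convention with the profile count, a label could look like `foo/12 (1000)`. The sketch below is a hypothetical standalone formatter, not the patch's cgraph_node_opt_info: `format_node_label` takes the name, order and count as plain arguments (assumptions in place of the cgraph fields) and sizes the buffer with a first snprintf call, which also avoids passing the destination buffer as a %s argument.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical label formatter: NAME/ORDER, followed by " (COUNT)" when
   profile data is available.  Returns a malloc'd string or NULL.  */
static char *
format_node_label (const char *name, int order, int have_profile,
                   long long count)
{
  int len;
  char *buf;

  if (name == NULL)
    name = "unknown";

  /* First pass: measure the exact length needed.  */
  len = have_profile
        ? snprintf (NULL, 0, "%s/%d (%lld)", name, order, count)
        : snprintf (NULL, 0, "%s/%d", name, order);
  if (len < 0)
    return NULL;

  buf = malloc ((size_t) len + 1);
  if (buf == NULL)
    return NULL;

  /* Second pass: write into a buffer of exactly that size.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s/%d (%lld)", name, order, count);
  else
    snprintf (buf, (size_t) len + 1, "%s/%d", name, order);
  return buf;
}
```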



 
  [...]
 
  Index: ipa-inline.c
  ===
  --- ipa-inline.c    (revision 201461)
  +++ ipa-inline.c

[PATCH] Possible fix for PR57717 (PowerPC E500v2)

2013-08-12 Thread Julian Brown
Hi,

At present, mainline fails to build a PowerPC E500v2 cross-compiler for
me because of the bug described in PR57717. The attached patch is a
possible fix for that, although I have been struggling to obtain good
evidence that it is correct due to lack of a working current baseline.

Without the patch, the partially-built compiler ICEs during a
cross-build trying to reload a TImode load instruction: I think this is
because the RTL generated by the clause modified by the attached patch
in rs6000_legitimize_reload_address is not valid for TARGET_E500_DOUBLE.
Simply disallowing all greater-than UNITS_PER_WORD-sized modes seems to
suffice to fix this.
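The effect of the predicate change can be checked with a small model: the old condition listed DFmode, TFmode, DDmode, TDmode and DImode explicitly, so TImode (the mode of the failing reload) was not excluded, while the size-based condition catches it. The enum, sizes and helper names are hypothetical stand-ins for GCC's machine modes, assuming UNITS_PER_WORD is 4 as on 32-bit PowerPC and a 128-bit long double.

```c
#include <stdbool.h>

/* Hypothetical subset of machine modes.  */
enum model_mode { M_SI, M_SF, M_DI, M_DF, M_DD, M_TF, M_TD, M_TI };

#define MODEL_UNITS_PER_WORD 4

/* Size in bytes of each modeled mode on a 32-bit target.  */
static int
model_mode_size (enum model_mode m)
{
  switch (m)
    {
    case M_SI: case M_SF: return 4;
    case M_DI: case M_DF: case M_DD: return 8;
    case M_TF: case M_TD: case M_TI: return 16;
    }
  return 0;
}

/* Old test: an explicit list of modes excluded for TARGET_E500_DOUBLE.  */
static bool
old_excluded (enum model_mode m)
{
  return m == M_DF || m == M_TF || m == M_DD || m == M_TD || m == M_DI;
}

/* New test: exclude every mode wider than a word.  */
static bool
new_excluded (enum model_mode m)
{
  return model_mode_size (m) > MODEL_UNITS_PER_WORD;
}
```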

I have tested on current mainline with the candidate patch in
http://gcc.gnu.org/bugzilla//show_bug.cgi?id=57717#c3 and compared the
results with my patch: this gives the same results. I configured with:

[...] --enable-e500_double --with-long-double-128 --with-cpu=8548
--disable-decimal-float --disable-libvtv

with a target of powerpc-linux-gnuspe (this is with our internal build
tools, which unfortunately I can't share), and tested on real hardware.
(The last two options given are just working around build errors.) The
other test cases in PR57717 appear to work correctly with my patch too.

Unfortunately results show significant degradation relative to r189800
(before the patch identified in PR57717 was applied), though I believe
this to be due to a cause other than my patch (there seems to be some
kind of stack corruption in execute tests -- I've not yet tracked this
down). Also -- possibly related -- I had to add a hack to
rs6000_dwarf_register_span to get through the build, i.e.:

@@ -28940,6 +28940,9 @@ rs6000_dwarf_register_span (rtx reg)
   unsigned regno = REGNO (reg);
   enum machine_mode mode = GET_MODE (reg);
 
+  /* FIXME: This function causes an ICE when emitting Dwarf.  */
+  return NULL_RTX;
+
   if (TARGET_SPE
       && regno < 32
       && (SPE_VECTOR_MODE (GET_MODE (reg))

I am not proposing that particular patch for committing, of course.

OK to commit, or any comments? If anyone's in a position to do some
further testing on the patch, I'd be grateful for that!

Thanks,

Julian

ChangeLog

gcc/
* config/rs6000/rs6000.c (rs6000_legitimize_reload_address): Don't
perform invalid legitimization on greater-than-word-size modes for
TARGET_E500_DOUBLE.
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 201609)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6930,9 +6930,7 @@ rs6000_legitimize_reload_address (rtx x,
            && GET_CODE (XEXP (x, 1)) == CONST_INT
            && reg_offset_p
            && !SPE_VECTOR_MODE (mode)
-           && !(TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode
-                                       || mode == DDmode || mode == TDmode
-                                       || mode == DImode))
+           && !(TARGET_E500_DOUBLE && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
            && (!VECTOR_MODE_P (mode) || VECTOR_MEM_NONE_P (mode)))
 {
   HOST_WIDE_INT val = INTVAL (XEXP (x, 1));


Re: [PATCH] x86-64 gcc generate wrong assembly instruction movabs for intel syntax

2013-08-12 Thread Perez Read
On Mon, Aug 12, 2013 at 9:51 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Aug 12, 2013 at 3:39 PM, Perez Read netfirew...@gmail.com wrote:

 movabs is incorrectly translated into mov [rax], -1, and causes
 compile error Error: ambiguous operand size for `mov' .
 It should be mov QWORD PTR [rax], -1

 Bootstrap passed. Regression tested on x86_64-unknown-linux-gnu (pc).

 2013-08-10  Perez Read netfirew...@gmail.com

 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.

 * testsuite/gcc.target/i386/movabs-1.c : New test.

 You should mention PR number in the ChangeLog.

 Looks OK, but I think that for consistency this decoration should also
 be added to *movabs<mode>_2 pattern.

 Uros.

 Hello,

 After testing, I think we can skip this pattern:
 because operand 0 must be a register, the assembler will
 determine the size automatically.

 As said, I don't want two similar patterns with a different asm
 template in i386.md. So, if decorating movabs<mode>_2 works OK, I
 propose to change both patterns with your change.

 Uros.

 Here are the new patch and ChangeLog, adding <ptrsize> PTR to both patterns.
 Bootstrap passed,  Regression tested on x86_64-unknown-linux-gnu (pc).

 2013-08-10  Perez Read netfirew...@gmail.com

  PR target/58132

  * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
  operand 0 for intel asm alternative.

  * testsuite/gcc.target/i386/movabs-1.c : New test.

 2013-08-12  Perez Read netfirew...@gmail.com

  PR target/58132

  * config/i386/i386.md (*movabs<mode>_2): Add <ptrsize> PTR before
  operand 1 for intel asm alternative.

 Just merge these two ChangeLog entries.

 OK with this change.

 BTW: Do you have SVN commit access? Otherwise, I will take care to
 merge your change.

 Uros.

OK, and I added a space before the second <ptrsize> PTR, which corrects
the coding style.
I don't have SVN commit access, so thanks for helping me.

2013-08-12  Perez Read netfirew...@gmail.com

 PR target/58132
 * config/i386/i386.md (*movabs<mode>_1): Add <ptrsize> PTR before
 operand 0 for intel asm alternative.
 (*movabs<mode>_2): Ditto for operand 1.

 * testsuite/gcc.target/i386/movabs-1.c : New test.


Thanks,
Perez


movabs.patch


Commit: M32R: Fix config problem building m32r-linux toolchains

2013-08-12 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to fix a small problem building
  m32r-linux toolchains - the glibc-c.o object file was not being built
  because the definition of tmake_file in the M32R section of config.gcc was
  not allowing for the inclusion of t-glibc.

Cheers
  Nick

gcc/ChangeLog
2013-08-12  Nick Clifton  ni...@redhat.com

* config.gcc (m32r-linux): Allow for tmake_file not being empty.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 201658)
+++ gcc/config.gcc  (working copy)
@@ -1705,8 +1705,7 @@
;;
 m32r-*-linux*)
tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} m32r/linux.h"
-   # We override the tmake_file for linux -- why?
-   tmake_file="m32r/t-linux t-slibgcc"
+   tmake_file="${tmake_file} m32r/t-linux t-slibgcc"
gnu_ld=yes
if test x$enable_threads = xyes; then
thread_file='posix'
@@ -1714,8 +1713,7 @@
;;
 m32rle-*-linux*)
tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h m32r/little.h ${tm_file} m32r/linux.h"
-   # We override the tmake_file for linux -- why?
-   tmake_file="m32r/t-linux t-slibgcc"
+   tmake_file="${tmake_file} m32r/t-linux t-slibgcc"
gnu_ld=yes
if test x$enable_threads = xyes; then
thread_file='posix'


Re: [RFC] Bare bones of virtual call tracking

2013-08-12 Thread Jason Merrill

On 08/12/2013 08:16 AM, Jan Hubicka wrote:

With multiple inheritance I need to adjust offsets.


It's not clear to me that you need to worry about that in your search. 
A call through a particular vptr can only call overrides that go into a 
vtable that vptr can point to, and you can look up any thunk adjustments 
from the vtable.
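The thunk point can be illustrated with a C model of the usual scheme: a call through the B vptr of a C object lands in a thunk stored in that vtable, and the thunk already encodes the fixed this-adjustment, so a search over targets does not have to track offsets itself. All names are hypothetical and the layout is simplified (no RTTI, offset-to-top or virtual bases).

```c
#include <stddef.h>

/* One virtual function f per vtable; it receives the subobject address.  */
struct vtable { int (*f) (void *self); };

struct A { const struct vtable *vptr; int a; };
struct B { const struct vtable *vptr; int b; };
struct C { struct A a_base; struct B b_base; int c; };  /* C : A, B */

/* The override C::f expects a pointer to the whole C object.  */
static int
C_f (void *self)
{
  struct C *c = self;
  return c->c;
}

/* Thunk placed in the B-in-C vtable: adjust the B* back to the start of
   the C object by the fixed subobject offset, then call the override.  */
static int
C_f_thunk_for_B (void *self)
{
  return C_f ((char *) self - offsetof (struct C, b_base));
}

static const struct vtable C_vtable_for_A = { C_f };
static const struct vtable C_vtable_for_B = { C_f_thunk_for_B };

/* A virtual call through a B*: only what the vptr points at matters.  */
static int
call_f_via_B (struct B *b)
{
  return b->vptr->f (b);
}

static void
init_C (struct C *obj, int value)
{
  obj->a_base.vptr = &C_vtable_for_A;
  obj->b_base.vptr = &C_vtable_for_B;
  obj->c = value;
}
```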



+  /* First skip wrappers that C++ FE puts randomly into types.  */
+  while (TREE_CODE (t) == TYPE_DECL
+         && DECL_ORIGINAL_TYPE (t))


How can you get a decl in your types array?

Jason


