[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-06 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Richard Biener  changed:



   What|Removed |Added



 Status|REOPENED|RESOLVED

 Resolution||FIXED



--- Comment #36 from Richard Biener  2013-03-06 
08:45:22 UTC ---

Fixed.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-06 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #35 from Richard Biener  2013-03-06 
08:38:50 UTC ---

Author: rguenth

Date: Wed Mar  6 08:38:46 2013

New Revision: 196487



URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196487

Log:

2013-03-06  Richard Biener  



PR middle-end/50494

* tree-vect-data-refs.c (vect_can_force_dr_alignment_p):

Do not adjust alignment of DECL_IN_CONSTANT_POOL decls.



Revert

2013-02-13  Richard Biener  



PR lto/50494

* varasm.c (output_constant_def_1): Get the decl representing

the constant as argument.

(output_constant_def): Wrap output_constant_def_1.

(make_decl_rtl): Use output_constant_def_1 with the decl

representing the constant.

(build_constant_desc): Optionally re-use a decl already

representing the constant.

(tree_output_constant_def): Adjust.



Modified:

trunk/gcc/ChangeLog

trunk/gcc/tree-vect-data-refs.c

trunk/gcc/varasm.c


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #34 from Eric Botcazou  2013-03-05 
16:05:44 UTC ---

Created attachment 29587

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29587

Patch to restore LTO bootstrap with Ada + comment tweaks



OK, this is the patch I've tested, but it's your call.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #33 from rguenther at suse dot de  
2013-03-05 15:48:03 UTC ---

On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #32 from Eric Botcazou  2013-03-05 
> 15:32:15 UTC ---

> > But in all places I found we check TREE_ASM_WRITTEN on DECL_INITIAL

> > of the SYMBOL_REF_DECL ...

> 

> Nope, maybe_output_constant_def_contents has:

> 

>   rtx symbol = XEXP (desc->rtl, 0);

>   tree exp = desc->value;

> 

>   if (flag_syntax_only)

> return;

> 

>   if (TREE_ASM_WRITTEN (exp))

> /* Already output; don't do it again.  */

> return;

> 

> so the DECL_INITIAL of the SYMBOL_REF_DECL must be desc->value.



Ah, ok ... too many smart TREE_ASM_WRITTEN bits around ...



> > So it must be pure luck that we survived LTO bootstrap before my

> > patch (as far as I understand things).  Before my patch we created

> > yet another decl for the constant pool entry.  With my patch

> > we will have one less (we still have the decls from the constant

> > pool entries that end up being shared in the LTRANS unit).

> 

> We use LTO on heavy Ada applications with an unmodified 4.7 compiler (modulo a

> patch for -g) so I don't think that luck has anything to do here.



Fine, so the above might be the only issue.



> > So in the end I can agree to that my patch doesn't really fix

> > the original issue fully.  So we can as well revert it and

> > instead avoid messing with alignment of the constant pool entries.

> 

> That would be my preference.



Which of course pessimizes code generation and probably causes

some testsuite fallout for ppc (the original reported spurious

fails).



> > Hmm, maybe.  Then, why do we copy the constant in the first place ...

> > 

> > Thus,

> > 

> > Index: varasm.c

> > ===

> > --- varasm.c(revision 196462)

> > +++ varasm.c(working copy)

> > @@ -3087,7 +3087,7 @@ build_constant_desc (tree exp, tree decl

> >int labelno;

> > 

> >desc = ggc_alloc_constant_descriptor_tree ();

> > -  desc->value = copy_constant (exp);

> > +  desc->value = exp;

> > 

> >/* Propagate marked-ness to copied constant.  */

> >if (flag_mudflap && mf_marked_p (exp))

> > 

> > should be an "equivalent" fix.

> 

> This call to copy_constant has been there for ages though. so a little bit of

> archeology would probably be in order before removing it.



;)  Did that, it's there since forever - well, I traced it back to

the point we only deferred string constants:



 37459  jakub p = (struct deferred_string *)

 37459  jakub xmalloc (sizeof (struct 

deferred_string));

 37459  jakub 

 37459  jakub p->exp = copy_constant (exp);

 37459  jakub p->label = desc->label;



I'm LTO bootstrapping



  desc->value = decl ? exp : copy_constant (exp);



and doing a regular bootstrap with the copy_constant completely removed

at the moment.  Just curious ...



> In the meantime, I've successfully bootstrapped my patchlet so we can also go

> for it.



I'm fine with that.



As I concluded that the original fix didn't fix the alignment issue

(well, not for all possible cases at least) reverting the original

fix works for me as well.



I'm working on a patch to avoid re-aligning constant pool entries.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #32 from Eric Botcazou  2013-03-05 
15:32:15 UTC ---

> But in all places I found we check TREE_ASM_WRITTEN on DECL_INITIAL

> of the SYMBOL_REF_DECL ...



Nope, maybe_output_constant_def_contents has:



  rtx symbol = XEXP (desc->rtl, 0);

  tree exp = desc->value;



  if (flag_syntax_only)

return;



  if (TREE_ASM_WRITTEN (exp))

/* Already output; don't do it again.  */

return;



so the DECL_INITIAL of the SYMBOL_REF_DECL must be desc->value.



> So it must be pure luck that we survived LTO bootstrap before my

> patch (as far as I understand things).  Before my patch we created

> yet another decl for the constant pool entry.  With my patch

> we will have one less (we still have the decls from the constant

> pool entries that end up being shared in the LTRANS unit).



We use LTO on heavy Ada applications with an unmodified 4.7 compiler (modulo a

patch for -g) so I don't think that luck has anything to do here.



> So in the end I can agree to that my patch doesn't really fix

> the original issue fully.  So we can as well revert it and

> instead avoid messing with alignment of the constant pool entries.



That would be my preference.



> Hmm, maybe.  Then, why do we copy the constant in the first place ...

> 

> Thus,

> 

> Index: varasm.c

> ===

> --- varasm.c(revision 196462)

> +++ varasm.c(working copy)

> @@ -3087,7 +3087,7 @@ build_constant_desc (tree exp, tree decl

>int labelno;

> 

>desc = ggc_alloc_constant_descriptor_tree ();

> -  desc->value = copy_constant (exp);

> +  desc->value = exp;

> 

>/* Propagate marked-ness to copied constant.  */

>if (flag_mudflap && mf_marked_p (exp))

> 

> should be an "equivalent" fix.



This call to copy_constant has been there for ages though. so a little bit of

archeology would probably be in order before removing it.



In the meantime, I've successfully bootstrapped my patchlet so we can also go

for it.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #31 from rguenther at suse dot de  
2013-03-05 15:09:06 UTC ---

On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #28 from Eric Botcazou  2013-03-05 
> 14:23:19 UTC ---

> > Hmm, but when I use the same contents for the two arrays in my simple

> > testcase I do get only a single .LC0 output referenced from two places.

> > We will end up sharing the same RTL for both (unmerged) DECLs - but

> > I don't see how this can be a problem?  Maybe we fail to set

> > TREE_ASM_WRITTEN on the duplicate and output it anyway via other 

> > mechanisms?

> 

> We simply fail to set TREE_ASM_WRITTEN on the DECL_INITIAL (decl) because it 
> is

> not shared anymore.  So probably:



But in all places I found we check TREE_ASM_WRITTEN on DECL_INITIAL

of the SYMBOL_REF_DECL ...



So it must be pure luck that we survived LTO bootstrap before my

patch (as far as I understand things).  Before my patch we created

yet another decl for the constant pool entry.  With my patch

we will have one less (we still have the decls from the constant

pool entries that end up being shared in the LTRANS unit).



So in the end I can agree to that my patch doesn't really fix

the original issue fully.  So we can as well revert it and

instead avoid messing with alignment of the constant pool entries.



But then the underlying issue with multiple decls refering to

the same constant pool entry with LTO remains, it just happens

that it isn't triggered anymore.



> Index: varasm.c

> ===

> --- varasm.c(revision 196416)

> +++ varasm.c(working copy)

> @@ -3112,7 +3112,6 @@ build_constant_desc (tree exp, tree decl

>  LTO mode.  Instead we set the flag that will be recognized in

>  make_decl_rtl.  */

>DECL_IN_CONSTANT_POOL (decl) = 1;

> -  DECL_INITIAL (decl) = desc->value;

>/* ??? CONSTANT_ALIGNMENT hasn't been updated for vector types on most

>  architectures so use DATA_ALIGNMENT as well, except for strings.  */

>if (TREE_CODE (exp) == STRING_CST)

> @@ -3125,6 +3124,8 @@ build_constant_desc (tree exp, tree decl

> align_variable (decl, 0);

>  }

> 

> +  DECL_INITIAL (decl) = desc->value;

> +

>/* Now construct the SYMBOL_REF and the MEM.  */

>if (use_object_blocks_p ())

>  {



Hmm, maybe.  Then, why do we copy the constant in the first place ...



Thus,



Index: varasm.c

===

--- varasm.c(revision 196462)

+++ varasm.c(working copy)

@@ -3087,7 +3087,7 @@ build_constant_desc (tree exp, tree decl

   int labelno;



   desc = ggc_alloc_constant_descriptor_tree ();

-  desc->value = copy_constant (exp);

+  desc->value = exp;



   /* Propagate marked-ness to copied constant.  */

   if (flag_mudflap && mf_marked_p (exp))



should be an "equivalent" fix.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #30 from Eric Botcazou  2013-03-05 
14:39:00 UTC ---

> So we can revert the part of the patch that ends up not creating

> a new decl but only transfer DECL_ALIGN.  But then we still don't

> "merge" the decls we use to refer to the constants, so I wonder

> how creating less decls can fix this issue at all ...



That would be worse, DECL_ALIGN should _not_ be fiddled with for constant pool

entries in the first place since the constant/DECL_INITIAL is shared.



> Note that merging constants but not decls also can end up

> creating bogusly aligned constants.  In fact it seems to me

> that we need to arrange for the LTO path

> 

>   /* If this variable belongs to the global constant pool, retrieve the

>  pre-computed RTL or recompute it in LTO mode.  */

>   if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

> {

>   SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl),

>  decl, 1));

>   return;

> }

> 

> to never share a constant pool entry ... :/



We should simply not touch DECL_IN_CONSTANT_POOL variables, since they are not

regular VAR_DECLs but only represent the underlying constant.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #29 from rguenther at suse dot de  
2013-03-05 14:26:00 UTC ---

On Tue, 5 Mar 2013, rguenther at suse dot de wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #27 from rguenther at suse dot de  
> 2013-03-05 13:58:15 UTC ---

> On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote:

> 

> > 

> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> > 

> > --- Comment #26 from Eric Botcazou  
> > 2013-03-05 13:51:28 UTC ---

> > > The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name

> > > mangling of !TREE_PUBLIC decls when streaming in the decl for the constant

> > > pool entries (or when computing the assembler name at some point).

> > > I suppose we never compute decl-assembler-name for the constant pool 
> > > entries

> > > but emit them in a special way?

> > 

> > Right, see make_decl_rtl:

> > 

> >   /* If this variable belongs to the global constant pool, retrieve the

> >  pre-computed RTL or recompute it in LTO mode.  */

> >   if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

> > {

> >   SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl),

> >  decl, 1));

> >   return;

> > }

> > 

> > > At least build_constant_desc seems to create a raw SYMBOL_RER using the 
> > > raw

> > > created label?  But then, as we don't stream the constant-descs, the RTL 

> > > should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do 
> > > not

> > > agree).

> > 

> > I think the problem is that, with your patch, the DECLs are not unified when

> > they have the same DECL_INITIAL (decl), even if they have the same RTL in 
> > the

> > end.

> 

> Hmm, but when I use the same contents for the two arrays in my simple

> testcase I do get only a single .LC0 output referenced from two places.

> We will end up sharing the same RTL for both (unmerged) DECLs - but

> I don't see how this can be a problem?  Maybe we fail to set

> TREE_ASM_WRITTEN on the duplicate and output it anyway via other 

> mechanisms?



So in fact the issue must be that we _do_ share the constant pool

entry but not the decl we use to refer to it.  Which isn't very

easily fixable (well, it _is_ fixable, but ... )



OTOH in assemble_variable we properly use



  /* If this symbol belongs to the tree constant pool, output the constant

 if it hasn't already been written.  */

  if (TREE_CONSTANT_POOL_ADDRESS_P (symbol))

{

  tree decl = SYMBOL_REF_DECL (symbol);

  if (!TREE_ASM_WRITTEN (DECL_INITIAL (decl)))

output_constant_def_contents (symbol);

  return;

}



so any constant pool entry shouldn't be emitted multiple times ...



> But I can reproduce the Ada LTO bootstrap failure ...



So we can revert the part of the patch that ends up not creating

a new decl but only transfer DECL_ALIGN.  But then we still don't

"merge" the decls we use to refer to the constants, so I wonder

how creating less decls can fix this issue at all ...



Note that merging constants but not decls also can end up

creating bogusly aligned constants.  In fact it seems to me

that we need to arrange for the LTO path



  /* If this variable belongs to the global constant pool, retrieve the

 pre-computed RTL or recompute it in LTO mode.  */

  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

{

  SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl),

 decl, 1));

  return;

}



to never share a constant pool entry ... :/



Bah.  A reduced testcase would be nice to have ...


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #28 from Eric Botcazou  2013-03-05 
14:23:19 UTC ---

> Hmm, but when I use the same contents for the two arrays in my simple

> testcase I do get only a single .LC0 output referenced from two places.

> We will end up sharing the same RTL for both (unmerged) DECLs - but

> I don't see how this can be a problem?  Maybe we fail to set

> TREE_ASM_WRITTEN on the duplicate and output it anyway via other 

> mechanisms?



We simply fail to set TREE_ASM_WRITTEN on the DECL_INITIAL (decl) because it is

not shared anymore.  So probably:



Index: varasm.c

===

--- varasm.c(revision 196416)

+++ varasm.c(working copy)

@@ -3112,7 +3112,6 @@ build_constant_desc (tree exp, tree decl

 LTO mode.  Instead we set the flag that will be recognized in

 make_decl_rtl.  */

   DECL_IN_CONSTANT_POOL (decl) = 1;

-  DECL_INITIAL (decl) = desc->value;

   /* ??? CONSTANT_ALIGNMENT hasn't been updated for vector types on most

 architectures so use DATA_ALIGNMENT as well, except for strings.  */

   if (TREE_CODE (exp) == STRING_CST)

@@ -3125,6 +3124,8 @@ build_constant_desc (tree exp, tree decl

align_variable (decl, 0);

 }



+  DECL_INITIAL (decl) = desc->value;

+

   /* Now construct the SYMBOL_REF and the MEM.  */

   if (use_object_blocks_p ())

 {


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #27 from rguenther at suse dot de  
2013-03-05 13:58:15 UTC ---

On Tue, 5 Mar 2013, ebotcazou at gcc dot gnu.org wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #26 from Eric Botcazou  2013-03-05 
> 13:51:28 UTC ---

> > The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name

> > mangling of !TREE_PUBLIC decls when streaming in the decl for the constant

> > pool entries (or when computing the assembler name at some point).

> > I suppose we never compute decl-assembler-name for the constant pool entries

> > but emit them in a special way?

> 

> Right, see make_decl_rtl:

> 

>   /* If this variable belongs to the global constant pool, retrieve the

>  pre-computed RTL or recompute it in LTO mode.  */

>   if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

> {

>   SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl),

>  decl, 1));

>   return;

> }

> 

> > At least build_constant_desc seems to create a raw SYMBOL_RER using the raw

> > created label?  But then, as we don't stream the constant-descs, the RTL 

> > should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do 
> > not

> > agree).

> 

> I think the problem is that, with your patch, the DECLs are not unified when

> they have the same DECL_INITIAL (decl), even if they have the same RTL in the

> end.



Hmm, but when I use the same contents for the two arrays in my simple

testcase I do get only a single .LC0 output referenced from two places.

We will end up sharing the same RTL for both (unmerged) DECLs - but

I don't see how this can be a problem?  Maybe we fail to set

TREE_ASM_WRITTEN on the duplicate and output it anyway via other 

mechanisms?



But I can reproduce the Ada LTO bootstrap failure ...


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #26 from Eric Botcazou  2013-03-05 
13:51:28 UTC ---

> The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name

> mangling of !TREE_PUBLIC decls when streaming in the decl for the constant

> pool entries (or when computing the assembler name at some point).

> I suppose we never compute decl-assembler-name for the constant pool entries

> but emit them in a special way?



Right, see make_decl_rtl:



  /* If this variable belongs to the global constant pool, retrieve the

 pre-computed RTL or recompute it in LTO mode.  */

  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

{

  SET_DECL_RTL (decl, output_constant_def_1 (DECL_INITIAL (decl),

 decl, 1));

  return;

}



> At least build_constant_desc seems to create a raw SYMBOL_RER using the raw

> created label?  But then, as we don't stream the constant-descs, the RTL 

> should refer to unique labels (but DECL_NAME and the SYMBOL_REF symbol do not

> agree).



I think the problem is that, with your patch, the DECLs are not unified when

they have the same DECL_INITIAL (decl), even if they have the same RTL in the

end.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #25 from Richard Biener  2013-03-05 
13:19:53 UTC ---

Btw, I cannot reproduce the issue with



t.c:



void bar (int *);

void foo (void)

{

  int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

  bar (a);

}



t2.c



void bar (int *);

void baz (void)

{

  int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0,

  1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

  bar (a);

}



t3.c:



void bar (int *a) { __builtin_printf ("%d\n", a[25]); }

int main()

{

  foo ();

  baz ();

  return 0;

}



and



gcc t.c t2.c t3.c -flto



I see three .LC0 being streamed in at WPA stage, shipped to a single

LTRANS unit and there getting .LC0, .LC1, and .LC2 symbols by means

of the existing build_constant_desc (they have all .LCO DECL_NAME decls,

but hey - even that would be fixed by my suggested patch.



Trying LTO Ada bootstrap now.  But I really fail to see how it should

break.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-05 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #24 from Richard Biener  2013-03-05 
10:18:01 UTC ---

(In reply to comment #23)

> > How can the patch cause a name collision when all the patch does is

> > avoid creating a new decl...?  They are static and thus should be

> > mangled.

> 

> They clearly aren't.

> 

> > Well, and if we want a new decl just to re-assign a unique name here

> > then we want to copy over alignment information.  That is, LTO should

> > handle constant-pool entries by _not_ streaming the decl then.  Honza,

> > how are those supposed to make the symtab -> WPA -> LTRANS transition

> > anyway?

> 

> The irony being that the initial implementation didn't stream the DECL but was

> changed because the varpool was just starting to being streamed as well. :-)



Yes, we do refer to the in the IL so we need to stream them (in some way).



The question is why we don't hit lto-lang.c:lto_set_decl_assembler_name

mangling of !TREE_PUBLIC decls when streaming in the decl for the constant

pool entries (or when computing the assembler name at some point).

I suppose we never compute decl-assembler-name for the constant pool entries

but emit them in a special way?  At least build_constant_desc seems to

create a raw SYMBOL_RER using the raw created label?  But then, as we

don't stream the constant-descs, the RTL should refer to unique labels

(but DECL_NAME and the SYMBOL_REF symbol do not agree).



Which means that eventually the following would fix it:



Index: gcc/varasm.c

===

--- gcc/varasm.c(revision 196451)

+++ gcc/varasm.c(working copy)

@@ -3124,6 +3124,11 @@ build_constant_desc (tree exp, tree decl

   else

align_variable (decl, 0);

 }

+  /* If we already had a decl for this constant pool entry adjust its

+ label to be unique within this translation unit and to make it

+ consistent with the symbol-ref symbol we use below.  */

+   else

+ DECL_NAME (decl) = get_identifier (label);



   /* Now construct the SYMBOL_REF and the MEM.  */

   if (use_object_blocks_p ())



can you check that?



Thanks.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-04 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #23 from Eric Botcazou  2013-03-04 
16:10:53 UTC ---

> How can the patch cause a name collision when all the patch does is

> avoid creating a new decl...?  They are static and thus should be

> mangled.



They clearly aren't.



> Well, and if we want a new decl just to re-assign a unique name here

> then we want to copy over alignment information.  That is, LTO should

> handle constant-pool entries by _not_ streaming the decl then.  Honza,

> how are those supposed to make the symtab -> WPA -> LTRANS transition

> anyway?



The irony being that the initial implementation didn't stream the DECL but was

changed because the varpool was just starting to being streamed as well. :-)


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-04 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Richard Biener  changed:



   What|Removed |Added



 CC||hubicka at gcc dot gnu.org



--- Comment #22 from Richard Biener  2013-03-04 
15:42:18 UTC ---

(In reply to comment #21)

> Reopened, the fix breaks LTO bootstrap with Ada:

> 

> /tmp/cc5Nhl9J.s: Assembler messages:

> /tmp/cc5Nhl9J.s:6206: Error: symbol `.LC0' is already defined

> /tmp/cc5Nhl9J.s:6219: Error: symbol `.LC0' is already defined

> /tmp/cc5Nhl9J.s:6996: Error: symbol `.LC0' is already defined

> make[4]: *** [/tmp/ccuFb8kg.ltrans2.ltrans.o] Error 1

> make[4]: *** Waiting for unfinished jobs

> /tmp/ccf05Tt7.s: Assembler messages:

> /tmp/ccf05Tt7.s:5618: Error: symbol `.LC5' is already defined

> /tmp/ccf05Tt7.s:5629: Error: symbol `.LC5' is already defined

> make[4]: *** [/tmp/ccuFb8kg.ltrans8.ltrans.o] Error 1

> lto-wrapper: make returned 2 exit status

> /home/eric/install/gcc/x86_64-suse-linux/bin/ld: lto-wrapper failed

> collect2: error: ld returned 1 exit status

> make[3]: *** [gnatbind] Error 1

> make[3]: *** Waiting for unfinished jobs

> 

> The reason for the name collision seems obvious enough...  I still think the

> best solution is not to fiddle with the alignment of DECL_IN_CONSTANT_POOL

> objects at this point, since the machinery assumes that their alignment 
> doesn't

> matter (or more precisely that they only need to have the alignment of their

> type, which is correct as far as I know).



How can the patch cause a name collision when all the patch does is

avoid creating a new decl...?  They are static and thus should be

mangled.



Well, and if we want a new decl just to re-assign a unique name here

then we want to copy over alignment information.  That is, LTO should

handle constant-pool entries by _not_ streaming the decl then.  Honza,

how are those supposed to make the symtab -> WPA -> LTRANS transition

anyway?


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-03-04 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Eric Botcazou  changed:



   What|Removed |Added



 Status|RESOLVED|REOPENED

 Resolution|FIXED   |



--- Comment #21 from Eric Botcazou  2013-03-04 
15:30:50 UTC ---

Reopened, the fix breaks LTO bootstrap with Ada:



/tmp/cc5Nhl9J.s: Assembler messages:

/tmp/cc5Nhl9J.s:6206: Error: symbol `.LC0' is already defined

/tmp/cc5Nhl9J.s:6219: Error: symbol `.LC0' is already defined

/tmp/cc5Nhl9J.s:6996: Error: symbol `.LC0' is already defined

make[4]: *** [/tmp/ccuFb8kg.ltrans2.ltrans.o] Error 1

make[4]: *** Waiting for unfinished jobs

/tmp/ccf05Tt7.s: Assembler messages:

/tmp/ccf05Tt7.s:5618: Error: symbol `.LC5' is already defined

/tmp/ccf05Tt7.s:5629: Error: symbol `.LC5' is already defined

make[4]: *** [/tmp/ccuFb8kg.ltrans8.ltrans.o] Error 1

lto-wrapper: make returned 2 exit status

/home/eric/install/gcc/x86_64-suse-linux/bin/ld: lto-wrapper failed

collect2: error: ld returned 1 exit status

make[3]: *** [gnatbind] Error 1

make[3]: *** Waiting for unfinished jobs



The reason for the name collision seems obvious enough...  I still think the

best solution is not to fiddle with the alignment of DECL_IN_CONSTANT_POOL

objects at this point, since the machinery assumes that their alignment doesn't

matter (or more precisely that they only need to have the alignment of their

type, which is correct as far as I know).


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-14 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Richard Biener  changed:



   What|Removed |Added



 Status|ASSIGNED|RESOLVED

 Resolution||FIXED

   Target Milestone|--- |4.8.0



--- Comment #20 from Richard Biener  2013-02-14 
12:25:02 UTC ---

Fixed for 4.8.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-14 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #19 from Richard Biener  2013-02-14 
12:24:26 UTC ---

Author: rguenth

Date: Thu Feb 14 12:24:12 2013

New Revision: 196050



URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196050

Log:

2013-02-14  Richard Biener  



PR lto/50494

* varasm.c (output_constant_def_1): Get the decl representing

the constant as argument.

(output_constant_def): Wrap output_constant_def_1.

(make_decl_rtl): Use output_constant_def_1 with the decl

representing the constant.

(build_constant_desc): Optionally re-use a decl already

representing the constant.

(tree_output_constant_def): Adjust.



Modified:

trunk/gcc/ChangeLog

trunk/gcc/varasm.c


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-14 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #18 from rguenther at suse dot de  
2013-02-14 10:41:07 UTC ---

On Thu, 14 Feb 2013, ebotcazou at gcc dot gnu.org wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #17 from Eric Botcazou  2013-02-14 
> 09:18:46 UTC ---

> > Because as it is seen as a regular VAR_DECL by optimizers the

> > vectorizer (well, IPA increase-alignment in this case) chooses to

> > bump its alignment to be able to use aligned vector moves.

> > 

> > Thus we need to honor the increased DECL_ALIGN when outputting the

> > constant pool, otherwise we generate wrong-code.

> 

> I see, thanks.  Perhaps the safest solution is to prevent IPA from increasing

> the alignment if DECL_IN_CONSTANT_POOL, as initializations of aggregate 
> objects

> are presumably not supposed to happen in hot spots.



It's also done by the vectorizer pass (for -fno-section-anchors).

I believe it's wrong to not honor DECL_ALIGN of decl in



  /* If this variable belongs to the global constant pool, retrieve the

 pre-computed RTL or recompute it in LTO mode.  */

  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

{

  SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));

  return;



which is what happens here.  Thus, if we say DECL_IN_CONSTANT_POOL

decls may not have their alignment changed we should assert that

(but my patch just honors DECL_ALIGN and avoids creation of yet

another DECL_IN_CONSTANT_POOL decl ...).  After all we also

use DECL_ALIGN information if anybody inspects the address

(which may happen if we elide the local static to the initializer

if we can see it is never changed - I suppose we cannot do that

at the moment)



Btw, if properly aligned the block-move can use vector code as well

(not sure if any target does that though).



Richard.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-14 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #17 from Eric Botcazou  2013-02-14 
09:18:46 UTC ---

> Because as it is seen as a regular VAR_DECL by optimizers the

> vectorizer (well, IPA increase-alignment in this case) chooses to

> bump its alignment to be able to use aligned vector moves.

> 

> Thus we need to honor the increased DECL_ALIGN when outputting the

> constant pool, otherwise we generate wrong-code.



I see, thanks.  Perhaps the safest solution is to prevent IPA from increasing

the alignment if DECL_IN_CONSTANT_POOL, as initializations of aggregate objects

are presumably not supposed to happen in hot spots.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-14 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #16 from rguenther at suse dot de  
2013-02-14 08:56:12 UTC ---

On Wed, 13 Feb 2013, ebotcazou at gcc dot gnu.org wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #14 from Eric Botcazou  2013-02-13 
> 22:13:42 UTC ---

> > In LTO:

> > 

> >   /* If this variable belongs to the global constant pool, retrieve the

> >  pre-computed RTL or recompute it in LTO mode.  */

> >   if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

> > {

> >   SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));

> >   return;

> > 

> > but obviously output_constant_def when just getting DECL_INITIAL cannot

> > honor any special alignment requirements of decl.  It will simply get

> > a new decl with standard alignment.

> > 

> > We know decl is already the decl associated with the constant, so we

> > should just re-use it.

> 

> Why does the alignment of a DECL_IN_CONSTANT_POOL matter here exactly?



Because as it is seen as a regular VAR_DECL by optimizers the

vectorizer (well, IPA increase-alignment in this case) chooses to

bump its alignment to be able to use aligned vector moves.



Thus we need to honor the increased DECL_ALIGN when outputting the

constant pool, otherwise we generate wrong-code.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-13 Thread meissner at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #15 from Michael Meissner  2013-02-13 
22:38:12 UTC ---

The patch does align the .rodata section to 16 byte alignment, but the code to

load up the auto vector from constant memory does not do vectorization.



If I use -fno-section-anchors, it aligns .rodata to 4 byte alignment, and does

not vectorize the code.



If I use -fno-merge-constants, it aligns .rodata to 16 byte alignment, and does

vectorize the code.



If I use -fno-merge-constants without -flto, it aligns .rodata to 16 byte

alignment, but it uses unaligned vector loads/stores.



So the patch does help in that the tests now pass that were randomly failing.



While it would be nice if we could get the initialization to be vectorized, I'm

not how performance critical this is.



Eric: if the alignment of the constant data that is used to initialize the auto

array is a mismatch, and you use Altivec instructions, when the compiler

auto-vectorizes the copy, the wrong data gets used.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-13 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #14 from Eric Botcazou  2013-02-13 
22:13:42 UTC ---

> In LTO:

> 

>   /* If this variable belongs to the global constant pool, retrieve the

>  pre-computed RTL or recompute it in LTO mode.  */

>   if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

> {

>   SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));

>   return;

> 

> but obviously output_constant_def when just getting DECL_INITIAL cannot

> honor any special alignment requirements of decl.  It will simply get

> a new decl with standard alignment.

> 

> We know decl is already the decl associated with the constant, so we

> should just re-use it.



Why does the alignment of a DECL_IN_CONSTANT_POOL matter here exactly?


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-13 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Richard Biener  changed:



   What|Removed |Added



 CC||ebotcazou at gcc dot

   ||gnu.org



--- Comment #13 from Richard Biener  2013-02-13 
10:55:59 UTC ---

Eric added the original feature.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-13 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #12 from Richard Biener  2013-02-13 
10:55:15 UTC ---

Created attachment 29436

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29436

patch



-fsection-anchors enables pass_ipa_increase_alignment (the vectorizer can

not align globals after any function was assembled, so we do that upfront).



In LTO:



  /* If this variable belongs to the global constant pool, retrieve the

 pre-computed RTL or recompute it in LTO mode.  */

  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))

{

  SET_DECL_RTL (decl, output_constant_def (DECL_INITIAL (decl), 1));

  return;



but obviously output_constant_def when just getting DECL_INITIAL cannot

honor any special alignment requirements of decl.  It will simply get

a new decl with standard alignment.



We know decl is already the decl associated with the constant, so we

should just re-use it.



So I believe the attached should fix this.  Can PPC people plaease test?


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-13 Thread rguenther at suse dot de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #11 from rguenther at suse dot de  
2013-02-13 09:05:16 UTC ---

On Tue, 12 Feb 2013, meissner at gcc dot gnu.org wrote:



> 

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494

> 

> --- Comment #10 from Michael Meissner  
> 2013-02-12 19:16:56 UTC ---

> If -fno-merge-constants (and the default -fsection-anchors) is used, then the

> correct alignment for the table is set (and Altivec memory instructions are

> used).

> 

> At a guess, it is likely be in the gimplify_init_constructor function in

> gimplify.c.



There is special code to handle initializers, it may be well that with

LTO we fail to do the correct thing here.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-12 Thread meissner at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #10 from Michael Meissner  2013-02-12 
19:16:56 UTC ---

If -fno-merge-constants (and the default -fsection-anchors) is used, then the

correct alignment for the table is set (and Altivec memory instructions are

used).



At a guess, it is likely be in the gimplify_init_constructor function in

gimplify.c.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-12 Thread meissner at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



--- Comment #9 from Michael Meissner  2013-02-12 
19:07:18 UTC ---

The -fsection-anchors option appears to be important.  If I use

-fsection-anchors (which is default for powerpc64-linux), LTO does not align

the .rodata section, but uses Altivec memory instructions.  If I use

-fno-section-anchors, the .rodata section is not aligned, but it doesn't use

Altivec memory instructions, so the test passes.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-12 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Richard Biener  changed:



   What|Removed |Added



 Status|NEW |ASSIGNED

 AssignedTo|unassigned at gcc dot   |rguenth at gcc dot gnu.org

   |gnu.org |



--- Comment #8 from Richard Biener  2013-02-12 
18:36:12 UTC ---

I will have a looksee tomorrow.


[Bug lto/50494] gcc.dg/vect/vect-reduc-2char.c fails spuriously on ppc with -flto

2013-02-12 Thread meissner at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50494



Michael Meissner  changed:



   What|Removed |Added



  Component|target  |lto



--- Comment #7 from Michael Meissner  2013-02-12 
18:25:38 UTC ---

I am switching this to LTO instead of target, as it appears to be an LTO bug. 

Before LTO is run, the alignment of the .rodata section is 16 byte alignment

since the array used to initialize the auto array is copied with altivec

instructions.  After LTO, the alignment of the .rodata section is 4 bytes.  The

powerpc Altivec instructions ignore the bottom 4 bits of the address, and so

depending on what else is linked, the test will randomly fail or succeed.



I added attachments from compiling slp-perm-1.c with -O3 -mcpu=power6 -maltivec

-save-temps to give the asm files.