Re: RFC [1/2] divmod transform

2016-05-29 Thread Prathamesh Kulkarni
On 27 May 2016 at 17:31, Richard Biener  wrote:
> On Fri, 27 May 2016, Prathamesh Kulkarni wrote:
>
>> On 27 May 2016 at 15:45, Richard Biener  wrote:
>> > On Wed, 25 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 May 2016 at 12:52, Richard Biener  wrote:
>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 24 May 2016 at 19:39, Richard Biener  wrote:
>> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >
>> >> >> >> On 24 May 2016 at 17:42, Richard Biener  wrote:
>> >> >> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >> >> >> >
>> >> >> >> >> On 23 May 2016 at 17:35, Richard Biener 
>> >> >> >> >>  wrote:
>> >> >> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >> >> >> >  wrote:
>> >> >> >> >> >> Hi,
>> >> >> >> >> >> I have updated my patch for divmod (attached), which was 
>> >> >> >> >> >> originally
>> >> >> >> >> >> based on Kugan's patch.
>> >> >> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and 
>> >> >> >> >> >> TRUNC_MOD_EXPR
>> >> >> >> >> >> having same operands to divmod representation, so we can cse 
>> >> >> >> >> >> computation of mod.
>> >> >> >> >> >>
>> >> >> >> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> >> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> >> >> >> is transformed to:
>> >> >> >> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> >> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >> >> >> >>
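For illustration, a minimal C-level sketch of the kind of source that
produces such a statement pair (names are placeholders):

/* Quotient and remainder of the same operands: both statements can be
   served by a single DIVMOD computation.  */
void
quot_rem (unsigned a, unsigned b, unsigned *q, unsigned *r)
{
  *q = a / b;   /* TRUNC_DIV_EXPR  */
  *r = a % b;   /* TRUNC_MOD_EXPR, same operands  */
}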
>> >> >> >> >> >> * New hook divmod_expand_libfunc
>> >> >> >> >> >> The rationale for introducing the hook is that different 
>> >> >> >> >> >> targets have
>> >> >> >> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> >> >> >> return quotient and store remainder in argument passed as 
>> >> >> >> >> >> pointer,
>> >> >> >> >> >> while the arm version takes two arguments and returns both
>> >> >> >> >> >> quotient and remainder having mode double the size of the 
>> >> >> >> >> >> operand mode.
>> >> >> >> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> >> >> >> to generate call to target-specific divmod.
>> >> >> >> >> >> Ports should define this hook if:
>> >> >> >> >> >> a) The port does not have divmod or div insn for the given 
>> >> >> >> >> >> mode.
>> >> >> >> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> >> >> >> The default hook default_expand_divmod_libfunc() generates 
>> >> >> >> >> >> call
>> >> >> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned 
>> >> >> >> >> >> and
>> >> >> >> >> >> are of DImode.
>> >> >> >> >> >>
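As a rough sketch of the two libfunc conventions described above (the
prototypes below are illustrative only, not the actual declarations used
by the ports):

/* libgcc2.c:__udivmoddi4 style (c6x, spu): the quotient is returned and
   the remainder is stored through a pointer argument.  */
unsigned long long __udivmoddi4 (unsigned long long n, unsigned long long d,
                                 unsigned long long *rem);

/* arm style: a single value of twice the operand width comes back,
   carrying the quotient in one half and the remainder in the other.  */
unsigned long long arm_style_udivmod (unsigned int n, unsigned int d);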
>> >> >> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> >> >> >> cross-tested on arm*-*-*.
>> >> >> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> >> >> >> Does this patch look OK ?
>> >> >> >> >> >
>> >> >> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> >> >> >> > index 6b4601b..e4a021a 100644
>> >> >> >> >> > --- a/gcc/targhooks.c
>> >> >> >> >> > +++ b/gcc/targhooks.c
>> >> >> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, 
>> >> >> >> >> > machine_mode,
>> >> >> >> >> > machine_mode, optimization_type)
>> >> >> >> >> >return true;
>> >> >> >> >> >  }
>> >> >> >> >> >
>> >> >> >> >> > +void
>> >> >> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode 
>> >> >> >> >> > mode,
>> >> >> >> >> > +  rtx op0, rtx op1,
>> >> >> >> >> > +  rtx *quot_p, rtx *rem_p)
>> >> >> >> >> >
>> >> >> >> >> > functions need a comment.
>> >> >> >> >> >
>> >> >> >> >> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 
>> >> >> >> >> > style?  In that
>> >> >> >> >> > case we could avoid the target hook.
>> >> >> >> >> Well I would prefer adding the hook because that's easier 
>> >> >> >> >> -;)
>> >> >> >> >> Would it be ok for now to go with the hook ?
>> >> >> >> >> >
>> >> >> >> >> > +  /* If target overrides expand_divmod_libfunc hook
>> >> >> >> >> > +then perform divmod by generating call to the 
>> >> >> >> >> > target-specific divmod
>> >> >> >> >> > libfunc.  */
>> >> >> >> >> > +  if (targetm.expand_divmod_libfunc != 
>> >> >> >> >> > default_expand_divmod_libfunc)
>> >> >> >> >> > +   return true;
>> >> >> >> >> > +
>> >> >> >> >> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> >> >> >> > +  return (mode == DImode && unsignedp);
>> >> >> >> >> >
>> >> >> >> >> > I don't understand this - we know optab_libfunc returns 
>> >> >> >> >> > non-NULL for 'mode'
>> >> >> >> >> > but still restrict this to DImode && unsigned?  Also if
>> >> >> >> >> > targetm.expand_divmod_libfunc
>> >> >> >> >> > is not the default we expect the target to handle all modes?
>> >> >> >> >> Ah indeed, the chec

Re: [PATCH 5/5] workaround for PR70427

2016-05-29 Thread Andi Kleen
On Mon, May 30, 2016 at 02:34:03AM +0200, Jan Hubicka wrote:
> > diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> > index da17bcd..c7d7792 100644
> > --- a/gcc/ipa-profile.c
> > +++ b/gcc/ipa-profile.c
> > @@ -201,6 +201,8 @@ ipa_profile_generate_summary (void)
> > if (h->hvalue.counters[2])
> >   {
> > struct cgraph_edge * e = node->get_edge (stmt);
> > +   if (!e)
> > + continue;
> 
> This is odd. I do not think auto-fdo produces indirect call histograms and the
> edges should be present here.  Do you know from where the histogram is 
> created?

I don't know. How would I find out?

It should be reproducible by applying the patchkit and running make 
autoprofiledbootstrap

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH 1/5] Add gcc-auto-profile script

2016-05-29 Thread Andi Kleen
On Mon, May 30, 2016 at 02:39:06AM +0200, Jan Hubicka wrote:
> > 
> > Since maintaining the script would be somewhat tedious (needs changes
> > every time a new CPU comes out) I auto generated it from the online
> > Intel event database. The script to do that is in contrib and can be
> > rerun.
> 
> I guess we need to figure out how to ship this to users.  At the moment
> the script will tell you to rebuild when it meets a new CPU, but it refers
> to the gcc sources, which is not the best place.

I don't have a better solution.

> 
> Also the script should be installed when it is documented in invoke.texi

I wasn't sure if it should be installed or not, but opted for not. 
For now I can remove the documentation.

> 
> What happens when you have the wrong perf?

You mean with the manual example? It will just error out,
because it requires Google's special patched version.
Also the event name is not always the same, so it only works
on some CPUs.

With my script perf should work, but you need LBR support in the kernel for
using -b. LBRs are a model specific feature, and this needs:

- The kernel is new enough and knows about your CPU model
(dmesg | grep Perf.*LBR outputs something)

- LBRs are typically not virtualized[1], so it usually does not work
in VMs

When the LBR support is not there the profiling run will not
work at all.

-Andi

[1] There are some patches for KVM, but they are not mainline so far.

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH 1/5] Add gcc-auto-profile script

2016-05-29 Thread Jan Hubicka
Andi,
thanks a lot for working on the auto-fdo bootstrap. It is badly needed to
have some coverage for this feature.  I don't think I can approve the
build machinery changes.

> From: Andi Kleen 
> 
> Using autofdo is currently something difficult. It requires using the
> model specific branches taken event, which differs on different CPUs.
> The example shown in the manual requires a special patched version of
> perf that is non standard, and also will likely not work everywhere.
> 
> This patch adds a new gcc-auto-profile script that figures out the
> correct event and runs perf.
> 
> This is needed to actually make use of autofdo in a generic way
> in the build system and in the test suite.
> 
> Since maintaining the script would be somewhat tedious (needs changes
> every time a new CPU comes out) I auto generated it from the online
> Intel event database. The script to do that is in contrib and can be
> rerun.

I guess we need to figure out how to ship this to users.  At the moment
the script will tell you to rebuild when it meets a new CPU, but it refers
to the gcc sources, which is not the best place.

Also the script should be installed when it is documented in invoke.texi

What happens when you have the wrong perf?

Honza


Re: [PATCH2][PR71252] Fix insertion point of stmt_to_insert

2016-05-29 Thread kugan



On 28/05/16 01:28, Kugan Vivekanandarajah wrote:

Hi Richard,

This fixes the insertion point of stmt_to_insert based on your comments. In
insert_stmt_before_use, I now use find_insert_point so that we
insert the stmt_to_insert after its operands are defined. This means
that we now have to insert before in some cases and insert after in others.

I also factored out uses of insert_stmt_before_use.

I tested this with:
./build/gcc/f951 cp2k_single_file.f90 -O3 -ffast-math -march=westmere

I am running bootstrap and regression testing on x86_64-linux-gnu. Is
this OK for trunk if the testing is fine? I will also test with other
test cases from relevant PRs.



Hi,

Here is the tested patch. I made a slight change to reflect the pattern
used everywhere else in reassoc.


i.e., in insert_stmt_before_use, after finding the insertion point I now do:

+  if (stmt == insert_point)
+gsi_insert_before (&gsi, stmt_to_insert, GSI_NEW_STMT);
+  else
+insert_stmt_after (stmt_to_insert, insert_point);

This is consistent with what is done in other places.

I tested the patch with CPU2006 and bootstrapped and regression tested 
for x86-64-none-linux-gnu with no new regressions.


Is this OK for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-05-28  Kugan Vivekanandarajah  

PR middle-end/71269
PR middle-end/71292
* gcc.dg/tree-ssa/pr71269.c: New test.
* gcc.dg/tree-ssa/pr71292.c: New test.

gcc/ChangeLog:

2016-05-28  Kugan Vivekanandarajah  

PR middle-end/71269
PR middle-end/71252
* tree-ssa-reassoc.c (insert_stmt_before_use): Use find_insert_point so
that the inserted stmt will not dominate stmts that define its operands.
(rewrite_expr_tree): Add stmt_to_insert before adding the use stmt.
(rewrite_expr_tree_parallel): Likewise.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
index e69de29..4dceaaa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71269.c
@@ -0,0 +1,10 @@
+/* PR middle-end/71269 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int a, b, c;
+void  fn2 (int);
+void fn1 ()
+{
+  fn2 (sizeof 0 + c + a + b + b);
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71292.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71292.c
index e69de29..1a25d93 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71292.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71292.c
@@ -0,0 +1,12 @@
+/* PR middle-end/71292 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long a;
+long b, d;
+int c;
+void fn1 ()
+{
+  unsigned long e = a + c;
+  b = d + e + a + 8;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index c9ed679..bc4b55a 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1777,16 +1777,6 @@ eliminate_redundant_comparison (enum tree_code opcode,
   return false;
 }
 
-/* If the stmt that defines operand has to be inserted, insert it
-   before the use.  */
-static void
-insert_stmt_before_use (gimple *stmt, gimple *stmt_to_insert)
-{
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
-  gimple_set_uid (stmt_to_insert, gimple_uid (stmt));
-  gsi_insert_before (&gsi, stmt_to_insert, GSI_NEW_STMT);
-}
-
 
 /* Transform repeated addition of same values into multiply with
constant.  */
@@ -3799,6 +3789,29 @@ find_insert_point (gimple *stmt, tree rhs1, tree rhs2)
   return stmt;
 }
 
+/* If the stmt that defines operand has to be inserted, insert it
+   before the use.  */
+static void
+insert_stmt_before_use (gimple *stmt, gimple *stmt_to_insert)
+{
+  gcc_assert (is_gimple_assign (stmt_to_insert));
+  tree rhs1 = gimple_assign_rhs1 (stmt_to_insert);
+  tree rhs2 = gimple_assign_rhs2 (stmt_to_insert);
+  gimple *insert_point = find_insert_point (stmt, rhs1, rhs2);
+  gimple_stmt_iterator gsi = gsi_for_stmt (insert_point);
+  gimple_set_uid (stmt_to_insert, gimple_uid (insert_point));
+
+  /* If the insert point is not stmt, then insert_point would be
+ the point where operand rhs1 or rhs2 is defined. In this case,
+ stmt_to_insert has to be inserted afterwards. This would
+ only happen when the stmt insertion point is flexible. */
+  if (stmt == insert_point)
+gsi_insert_before (&gsi, stmt_to_insert, GSI_NEW_STMT);
+  else
+insert_stmt_after (stmt_to_insert, insert_point);
+}
+
+
 /* Recursively rewrite our linearized statements so that the operators
match those in OPS[OPINDEX], putting the computation in rank
order.  Return new lhs.  */
@@ -3835,6 +3848,12 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
  print_gimple_stmt (dump_file, stmt, 0, 0);
}
 
+ /* If the stmt that defines operand has to be inserted, insert it
+before the use.  */
+ if (oe1->stmt_to_insert)
+   insert_stmt_before_use (stmt, oe1->stmt_to_insert);
+ if (oe2->stmt_to_insert)
+   insert_stmt_before_use (stmt, oe2->stmt_to_insert);
  /* 

Re: [PATCH 5/5] workaround for PR70427

2016-05-29 Thread Jan Hubicka
> From: Andi Kleen 
> 
> This makes autofdo bootstrap not crash.
> 
> This is probably not the right fix, but for now it works for me.
> Not for submission.
> ---
>  gcc/ipa-profile.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
> index da17bcd..c7d7792 100644
> --- a/gcc/ipa-profile.c
> +++ b/gcc/ipa-profile.c
> @@ -201,6 +201,8 @@ ipa_profile_generate_summary (void)
>   if (h->hvalue.counters[2])
> {
>   struct cgraph_edge * e = node->get_edge (stmt);
> + if (!e)
> +   continue;

This is odd. I do not think auto-fdo produces indirect call histograms and the
edges should be present here.  Do you know from where the histogram is created?

Honza
>   if (e && !e->indirect_unknown_callee)
> continue;
>   e->indirect_info->common_target_id
> -- 
> 2.8.2


Re: [PATCH 2/5] Don't cause ICEs when auto profile file is not found with checking

2016-05-29 Thread Jan Hubicka
> From: Andi Kleen 
> 
> Currently, on a checking enabled compiler when -fauto-profile does
> not find the profile feedback file it errors out with assertation
> failures. Add proper errors for this case.
> 
> gcc/:
> 
> 2016-05-21  Andi Kleen  
> 
>   * auto-profile.c (read_profile): Replace asserts with errors
>   when file does not exist.
>   * gcov-io.c (gcov_read_words): Dito.

OK,
thanks!

Honza


[C++ Patch] PR 71109 ("Misleading diagnostic message with 'virtual' used in out-of-line definitions of class template member functions")

2016-05-29 Thread Paolo Carlini

Hi,

The submitter noticed that for wrong uses of 'virtual' outside of template 
classes (B in the testcase) vs plain classes (A) we wrongly emit the 
"templates may not be %<virtual%>" error message. Simply checking 
current_class_type seems enough to solve the problem. Case C in the 
extended testcase double-checks that we still give the "templates may 
not be %<virtual%>" error message when appropriate. Tested x86_64-linux.


Thanks,
Paolo.

//
/cp
2016-05-30  Paolo Carlini  

PR c++/71099
* parser.c (cp_parser_function_specifier_opt): Use current_class_type
to improve the diagnostic about wrong uses of 'virtual'.

/testsuite
2016-05-30  Paolo Carlini  

PR c++/71099
* g++.dg/parse/virtual1.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 236863)
+++ cp/parser.c (working copy)
@@ -12888,7 +12888,8 @@ cp_parser_function_specifier_opt (cp_parser* parse
   /* 14.5.2.3 [temp.mem]
 
 A member function template shall not be virtual.  */
-  if (PROCESSING_REAL_TEMPLATE_DECL_P ())
+  if (PROCESSING_REAL_TEMPLATE_DECL_P ()
+ && current_class_type)
error_at (token->location, "templates may not be %<virtual%>");
   else
set_and_check_decl_spec_loc (decl_specs, ds_virtual, token);
Index: testsuite/g++.dg/parse/virtual1.C
===
--- testsuite/g++.dg/parse/virtual1.C   (revision 0)
+++ testsuite/g++.dg/parse/virtual1.C   (working copy)
@@ -0,0 +1,25 @@
+// PR c++/71099
+
+struct A {
+  virtual void foo();
+};
+
+virtual void A::foo() {}  // { dg-error "'virtual' outside class" }
+
+template
+struct B {
+  virtual void foo();
+};
+
+template
+virtual void B::foo() {}  // { dg-error "'virtual' outside class" }
+
+template
+struct C {
+  template
+  virtual void foo();  // { dg-error "templates may not be 'virtual'" }
+};
+
+template
+template
+virtual void C::foo() {}  // { dg-error "'virtual' outside class" }


Re: [PING^2] Re: Updated autofdo bootstrap and testing patches

2016-05-29 Thread Andi Kleen
Andi Kleen  writes:

Ping^2!

> Andi Kleen  writes:
>
> Ping!
>
>> Here's an updated version of the patchkit to enable autofdo bootstrap
>> and testing. It also fixes some autofdo issues. The last patch is more a 
>> workaround
>> (to make autofdo bootstrap not ICE), but may need a better fix.
>>
>> The main motivation is to get better test coverage for autofdo 
>> and also a useful benchmark (speed of generated compiler) for it. 
>> If you want the absolutely fastest compiler using profiledbootstrap
>> is still the way to go.
>>
>> I addressed most of the earlier review comments. The python script
>> is still python 2 for better compatibility with old systems.
>>
>> Ok to commit?
>>
>>
>

-- 
a...@linux.intel.com -- Speaking for myself only


PING^4: [PATCH] PR target/70454: Build x86 libgomp with -march=i486 or better

2016-05-29 Thread H.J. Lu
On Fri, May 20, 2016 at 8:04 AM, H.J. Lu  wrote:
> On Mon, May 9, 2016 at 5:52 AM, H.J. Lu  wrote:
>> On Mon, May 2, 2016 at 6:46 AM, H.J. Lu  wrote:
>>> On Mon, Apr 25, 2016 at 1:36 PM, H.J. Lu  wrote:
 If x86 libgomp isn't compiled with -march=i486 or better, append
 -march=i486 XCFLAGS for x86 libgomp build.

 Tested on i686 with and without --with-arch=i386.  Tested on
 x86-64 with and without --with-arch_32=i386.  OK for trunk?


 H.J.
 ---
 PR target/70454
 * configure.tgt (XCFLAGS): Append -march=i486 to compile x86
 libgomp if needed.
 ---
  libgomp/configure.tgt | 36 
  1 file changed, 16 insertions(+), 20 deletions(-)

 diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
 index 77e73f0..c876e80 100644
 --- a/libgomp/configure.tgt
 +++ b/libgomp/configure.tgt
 @@ -67,28 +67,24 @@ if test x$enable_linux_futex = xyes; then
 ;;

  # Note that bare i386 is not included here.  We need cmpxchg.
 -i[456]86-*-linux*)
 +i[456]86-*-linux* | x86_64-*-linux*)
 config_path="linux/x86 linux posix"
 -   case " ${CC} ${CFLAGS} " in
 - *" -m64 "*|*" -mx32 "*)
 -   ;;
 - *)
 -   if test -z "$with_arch"; then
 - XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
 +   # Need i486 or better.
 +   cat > conftestx.c <<EOF
 +#if defined __x86_64__ || defined __i486__ || defined __pentium__ \
 +  || defined __pentiumpro__ || defined __pentium4__ \
 +  || defined __geode__ || defined __SSE__
 +# error Need i486 or better
 +#endif
 +EOF
 +   if ${CC} ${CFLAGS} -c -o conftestx.o conftestx.c > /dev/null 2>&1; 
 then
 +   if test "${target_cpu}" = x86_64; then
 +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
 +   else
 +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
 fi
 -   esac
 -   ;;
 -
 -# Similar jiggery-pokery for x86_64 multilibs, except here we
 -# can't rely on the --with-arch configure option, since that
 -# applies to the 64-bit side.
 -x86_64-*-linux*)
 -   config_path="linux/x86 linux posix"
 -   case " ${CC} ${CFLAGS} " in
 - *" -m32 "*)
 -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
 -   ;;
 -   esac
 +   fi
 +   rm -f conftestx.c conftestx.o
 ;;

  # Note that sparcv7 and sparcv8 is not included here.  We need cas.
 --
 2.5.5

>>>
>>> PING.
>>>
>>
>> PING.
>>
>
> PING.
>

PING.


-- 
H.J.


PING: [PATCH] Load external function address via GOT slot

2016-05-29 Thread H.J. Lu
On Mon, May 9, 2016 at 10:37 AM, H.J. Lu  wrote:
> On Fri, Apr 22, 2016 at 6:03 AM, Uros Bizjak  wrote:
>> On Fri, Apr 22, 2016 at 2:54 PM, H.J. Lu  wrote:
>>> For -fno-plt, we load the external function address via the GOT slot
>>> so that linker won't create an PLT entry for extern function address.
>>>
>>> Tested on x86-64. I also built GCC with -fno-plt.  It removes 99% PLT
>>> entries.  OK for trunk?
>>>
>>> H.J.
>>> --
>>> gcc/
>>>
>>> PR target/pr67400
>>> * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
>>> * config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
>>> (ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL for
>>> ix86_force_load_from_GOT_p returns true.
>>> (ix86_print_operand_address): Support UNSPEC_GOTPCREL if
>>> ix86_force_load_from_GOT_p returns true.
>>> (ix86_expand_move): Load the external function address via the
>>> GOT slot if ix86_force_load_from_GOT_p returns true.
>>> * config/i386/predicates.md (x86_64_immediate_operand): Return
>>> false if ix86_force_load_from_GOT_p returns true.
>>>
>>> gcc/testsuite/
>>>
>>> PR target/pr67400
>>> * gcc.target/i386/pr67400-1.c: New test.
>>> * gcc.target/i386/pr67400-2.c: Likewise.
>>> * gcc.target/i386/pr67400-3.c: Likewise.
>>> * gcc.target/i386/pr67400-4.c: Likewise.
>>
>> Please get someone that knows this linker magic to review the
>> functionality first. Maybe Jakub can help?
>>
>
> Hi Jakub,
>
> Can you review this patch?
>
> Thanks.

PING.

-- 
H.J.


Re: [PATCH][i386] Add -march=native support for VIA nano CPUs

2016-05-29 Thread J. Mayer
Hello,

On Sun, 2016-05-29 at 21:12 +0200, Uros Bizjak wrote:
> Hello!
> 
> > 
> > When trying to compile using -march=native on a VIA nano CPU, gcc
> > selects "-march=core2" "-mtune=i386" then is unable to compile, as
> > this
> > creates a conflict between 32-bit and 64-bit compilation modes,
> > as
> > shown by the following test:
> [...]
> 
> > 
> > --- gcc/config/i386/driver-i386.c.origÂÂ2015-02-02
> > 05:20:49.0
> > +0100
> > +++ gcc/config/i386/driver-i386.cÂÂÂ2015-08-23
> > 01:11:03.0
> > +0200
> > @@ -601,15 +601,20 @@
> > ÂÂswitch (family)
> > {
> > case 6:
> > -Âif (model > 9)
> The patch was corrupted by your mailer. But - can you please open a
> bugreport, and refer reposted patch to this bugreport? This way, the
> problem (and the patch) won't get forgotten.
> 
> Uros.
> 

Sorry for that, might be because of UTF-8 encoding.
I already opened a bug many months ago, ID 67310:

I just updated the patch against the current git repository; the only
difference from the previous versions is the diff offsets.

Jocelyn

---

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-
i386.c
index b121466..662709e 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -642,15 +642,20 @@ const char *host_detect_local_cpu (int argc,
const char **argv)
  switch (family)
{
case 6:
- if (model > 9)
-   /* Use the default detection procedure.  */
+ if (has_longmode)
processor = PROCESSOR_GENERIC;
- else if (model == 9)
-   cpu = "c3-2";
- else if (model >= 6)
-   cpu = "c3";
  else
-   processor = PROCESSOR_GENERIC;
+   {
+ if (model > 9)
+   /* Use the default detection procedure.  */
+   processor = PROCESSOR_GENERIC;
+ else if (model == 9)
+   cpu = "c3-2";
+ else if (model >= 6)
+   cpu = "c3";
+ else
+   processor = PROCESSOR_GENERIC;
+   }
  break;
case 5:
  if (has_3dnow)
@@ -664,6 +669,8 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
  /* We have no idea.  */
  processor = PROCESSOR_GENERIC;
}
+   } else {
+ processor = PROCESSOR_GENERIC;
}
 }
   else
@@ -894,7 +901,12 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
   if (arch)
{
  if (has_ssse3)
-   cpu = "core2";
+   {
+ if (vendor == signature_CENTAUR_ebx)
+   cpu = "x86-64";
+ else
+   cpu = "core2";
+   }
  else if (has_sse3)
{
  if (has_longmode)



Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-29 Thread Marc Glisse

On Sat, 28 May 2016, Alexander Monakov wrote:


For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether 'A*B'
overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize it to an
invocation of __builtin_mul_overflow to avoid the divide operation.
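A minimal sketch of the two forms, with placeholder names ('overflow' is
just a stand-in handler):

  unsigned a, b, r;
  /* Division-based check, the form the pattern would recognize:  */
  if (b != 0 && a > -1U / b)
    overflow ();
  /* Rewritten form, avoiding the division:  */
  if (__builtin_mul_overflow (a, b, &r))
    overflow ();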


I forgot to ask earlier: what does this give for modes / platforms where 
umulv4 does not have a specific implementation? Is the generic 
implementation worse than A>-1/B, in which case we may want to check 
optab_handler before doing the transformation? Or is it always at least as 
good?


(I didn't ask because I was assuming the latter, but I am not 100% 
certain)


--
Marc Glisse


[RFC PATCH, i386]: Use "lock orl $0, -4(%esp)" in mfence_nosse

2016-05-29 Thread Uros Bizjak
Hello!

As explained in PR71245, comment #3 [1], it is better to use offset -4
from %esp to implement a non-SSE memory fence instruction:

-q-

I guess it costs a code byte for a disp8 in the addressing mode, but
it avoids adding a lot of latency to a critical path involving a
spill/reload to (%esp), in functions where there is something at
(%esp).

If it's an object larger than 4B, the lock orl could even cause a
store-forwarding stall when the object is reloaded.  (e.g. a double or
a vector).

Ideally we could do the  lock orl  on some padding between two locals,
or on something in memory that wasn't going to be loaded soon, to
avoid touching more stack memory (which might be in the next page
down).  But we still want to do it on a cache line that's hot, so
going way up above our own stack frame isn't good either.

-/q-

Attached RFC patch implements this proposal.

2016-05-29  Uros Bizjak  

* config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)".

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Any other opinions on this issue? The Linux kernel also implements
its memory fence like the above proposal.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71245#c3

Uros.
Index: config/i386/sync.md
===
--- config/i386/sync.md (revision 236863)
+++ config/i386/sync.md (working copy)
@@ -98,7 +98,7 @@
(unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))
(clobber (reg:CC FLAGS_REG))]
   "!(TARGET_64BIT || TARGET_SSE2)"
-  "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}"
+  "lock{%;} or{l}\t{$0, -4(%%esp)|DWORD PTR [esp-4], 0}"
   [(set_attr "memory" "unknown")])
 
 (define_expand "mem_thread_fence"


Re: [patch] doc/sourcebuild.texi (Directives): Remove extra closing braces.

2016-05-29 Thread Gerald Pfeifer
On Sat, 16 Jan 2016, Jonathan Wakely wrote:
> This removes stray closing braces in the docs for dg-error, dg-warning 
> etc.
> 
> OK for trunk?

Yes.

Sorry for the delay.  I expected someone else to pick this up for 
review/approval, but now noticed that this patch apparently has 
not been committed yet:

  commit 1cb064263cfcfa14da81585886750f01a5611c7e
  Author: Jonathan Wakely 
  Date:   Sat Jan 16 00:11:27 2016 +

* doc/sourcebuild.texi (Directives): Remove extra closing braces.

Gerald


[PATCH, i386]: Fix PR71245, atomic load/store bounces the data to the stack using fild/fistp

2016-05-29 Thread Uros Bizjak
Hello!

As shown in the PR, when moving a DFmode value to/from an FP register, we
don't need to bounce it with an atomic DImode fild/fistp in the case of
-march=pentium. A DFmode move is atomic by itself.

2016-05-29  Uros Bizjak  

PR target/71245
* config/i386/sync.md (define_peephole2 atomic_storedi_fpu):
New peepholes to remove unneeded fild/fistp pairs.
(define_peephole2 atomic_loaddi_fpu): Ditto.

testsuite/ChangeLog:

2016-05-29  Uros Bizjak  

PR target/71245
* gcc.target/i386/pr71245-1.c: New test.
* gcc.target/i386/pr71245-2.c: Ditto.

Bootstrapped on x86_64-linux-gnu and regression tested with -m32/march=pentium.

Committed to mainline SVN.

Uros.
Index: config/i386/sync.md
===
--- config/i386/sync.md (revision 236861)
+++ config/i386/sync.md (working copy)
@@ -210,6 +210,34 @@
   DONE;
 })
 
+(define_peephole2
+  [(set (match_operand:DF 0 "fp_register_operand")
+   (unspec:DF [(match_operand:DI 1 "memory_operand")]
+  UNSPEC_FILD_ATOMIC))
+   (set (match_operand:DI 2 "memory_operand")
+   (unspec:DI [(match_dup 0)]
+  UNSPEC_FIST_ATOMIC))
+   (set (match_operand:DF 3 "fp_register_operand")
+   (match_operand:DF 4 "memory_operand"))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (2, operands[0])
+   && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
+  [(set (match_dup 3) (match_dup 5))]
+  "operands[5] = gen_lowpart (DFmode, operands[1]);")
+
+(define_peephole2
+  [(set (match_operand:DI 0 "sse_reg_operand")
+   (match_operand:DI 1 "memory_operand"))
+   (set (match_operand:DI 2 "memory_operand")
+   (match_dup 0))
+   (set (match_operand:DF 3 "fp_register_operand")
+   (match_operand:DF 4 "memory_operand"))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (2, operands[0])
+   && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
+  [(set (match_dup 3) (match_dup 5))]
+  "operands[5] = gen_lowpart (DFmode, operands[1]);")
+
 (define_expand "atomic_store"
   [(set (match_operand:ATOMIC 0 "memory_operand")
(unspec:ATOMIC [(match_operand:ATOMIC 1 "nonimmediate_operand")
@@ -298,6 +326,34 @@
   DONE;
 })
 
+(define_peephole2
+  [(set (match_operand:DF 0 "memory_operand")
+   (match_operand:DF 1 "fp_register_operand"))
+   (set (match_operand:DF 2 "fp_register_operand")
+   (unspec:DF [(match_operand:DI 3 "memory_operand")]
+  UNSPEC_FILD_ATOMIC))
+   (set (match_operand:DI 4 "memory_operand")
+   (unspec:DI [(match_dup 2)]
+  UNSPEC_FIST_ATOMIC))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (3, operands[2])
+   && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
+  [(set (match_dup 5) (match_dup 1))]
+  "operands[5] = gen_lowpart (DFmode, operands[4]);")
+
+(define_peephole2
+  [(set (match_operand:DF 0 "memory_operand")
+   (match_operand:DF 1 "fp_register_operand"))
+   (set (match_operand:DI 2 "sse_reg_operand")
+   (match_operand:DI 3 "memory_operand"))
+   (set (match_operand:DI 4 "memory_operand")
+   (match_dup 2))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (3, operands[2])
+   && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
+  [(set (match_dup 5) (match_dup 1))]
+  "operands[5] = gen_lowpart (DFmode, operands[4]);")
+
 ;; ??? You'd think that we'd be able to perform this via FLOAT + FIX_TRUNC
 ;; operations.  But the fix_trunc patterns want way more setup than we want
 ;; to provide.  Note that the scratch is DFmode instead of XFmode in order
Index: testsuite/gcc.target/i386/pr71245-1.c
===
--- testsuite/gcc.target/i386/pr71245-1.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr71245-1.c   (working copy)
@@ -0,0 +1,22 @@
+/* PR target/71245 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -march=pentium -mno-sse -mfpmath=387" } */
+
+typedef union
+{
+  unsigned long long ll;
+  double d;
+} u_t;
+
+u_t d = { .d = 5.0 };
+
+void foo_d (void)
+{
+  u_t tmp;
+  
+  tmp.ll = __atomic_load_n (&d.ll, __ATOMIC_SEQ_CST);
+  tmp.d += 1.0;
+  __atomic_store_n (&d.ll, tmp.ll, __ATOMIC_SEQ_CST);
+}
+
+/* { dg-final { scan-assembler-not "(fistp|fild)" } } */
Index: testsuite/gcc.target/i386/pr71245-2.c
===
--- testsuite/gcc.target/i386/pr71245-2.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr71245-2.c   (working copy)
@@ -0,0 +1,22 @@
+/* PR target/71245 */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -march=pentium -msse -mno-sse2 -mfpmath=387" } */
+
+typedef union
+{
+  unsigned long long ll;
+  double d;
+} u_t;
+
+u_t d = { .d = 5.0 };
+
+void foo_d (void)
+{
+  u_t tmp;
+  
+  tmp.ll = __atomic_load_n (&d.ll, __ATOMIC_SEQ_CST);
+  tmp.d += 1.0;
+  __atomic_store_n (&d.ll, tmp.ll, __ATOMIC_SEQ_CST);
+}
+
+/* { dg-final { scan-assembler-not "

Re: [PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount

2016-05-29 Thread Mike Stump
On May 29, 2016, at 3:39 AM, cheng...@emindsoft.com.cn wrote:
> 
> r10 may also be used as a parameter for the nested function, so we need to
> save it before calling mcount.

mcount can have a special abi where it preserves more registers than one would 
otherwise expect.  I'm wondering if you know what registers it saves or doesn't 
touch?  Does this fix any bug found by running tests, or just by inspection?

Re: [PATCH][i386] Add -march=native support for VIA nano CPUs

2016-05-29 Thread Uros Bizjak
Hello!

> When trying to compile using -march=native on a VIA nano CPU, gcc
> selects "-march=core2" "-mtune=i386" and is then unable to compile, as this
> creates a conflict between 32-bit and 64-bit compilation modes, as
> shown by the following test:

[...]

> --- gcc/config/i386/driver-i386.c.origÂÂ2015-02-02 05:20:49.0
> +0100
> +++ gcc/config/i386/driver-i386.cÂÂÂ2015-08-23 01:11:03.0
> +0200
> @@ -601,15 +601,20 @@
> ÂÂswitch (family)
> {
> case 6:
> -Âif (model > 9)

The patch was corrupted by your mailer. But - can you please open a
bugreport, and refer reposted patch to this bugreport? This way, the
problem (and the patch) won't get forgotten.

Uros.


Make vectorizer to use likely upper bounds

2016-05-29 Thread Jan Hubicka
Hi,
this patch makes the vectorizer give up when the likely maximal number of
iterations is low.

Bootstrapped/regtested on x86_64-linux; will commit it once the benchmark
machines pick up the earlier changes.
* tree-vect-loop.c (vect_analyze_loop_2): Use
likely_max_stmt_executions_int.
Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 236850)
+++ tree-vect-loop.c(working copy)
@@ -1945,7 +1945,7 @@ start_over:
 LOOP_VINFO_INT_NITERS (loop_vinfo));
 
   HOST_WIDE_INT max_niter
-= max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+= likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
   || (max_niter != -1


Fix profile updating after complete unrolling

2016-05-29 Thread Jan Hubicka
Hi,
this patch fixes profile updating after complete unrolling.  The present code
expects that unrolling is done only in the case where there is an exit which
counts iterations and the number of iterations is n_unroll.  This is no longer
true because we can derive upper bounds on the number of iterations from array
sizes and unroll based on that.  In that case we will still adjust the
probabilities of one of the exit edges from the loop in random ways.

It would perhaps make more sense to communicate the per-exit iteration count
analysis to the main loop duplication machinery, but I would like to do
things incrementally. There are many profile updating issues in the loop code,
and it seems to make sense to first fix the low-hanging fruit and add testcases
ensuring consistency.

Bootstrapped/regtested x86_64-linux, plan to commit after the benchmark machines
pick up my previous fix to predict.c.

Similar bugs exist in the peeling path, too; I will fix that separately.

There are three main paths through the updating:
  - when we know the number of iterations precisely
  - when we only have an upper bound
  - when we have an upper bound that is lower than the number of iterations
    determined for a specific exit.

I added testcases only for the latter two cases.  The testcase for the first
triggers updating errors in loop header duplication. I will look into that, too.

Honza

* predict.h (force_edge_cold): Declare.
* predict.c (force_edge_cold): New function.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Fix profile
updating.
(canonicalize_loop_induction_variables): Fix formating.

* gcc.dg/tree-ssa/cunroll-12.c: New testcase.
* gcc.dg/tree-ssa/cunroll-13.c: New testcase.
Index: predict.h
===
--- predict.h   (revision 236850)
+++ predict.h   (working copy)
@@ -91,5 +91,6 @@ extern tree build_predict_expr (enum br_
 extern const char *predictor_name (enum br_predictor);
 extern void rebuild_frequencies (void);
 extern void report_predictor_hitrates (void);
+extern void force_edge_cold (edge, bool);
 
 #endif  /* GCC_PREDICT_H */
Index: predict.c
===
--- predict.c   (revision 236862)
+++ predict.c   (working copy)
@@ -3249,3 +3249,99 @@ report_predictor_hitrates (void)
   loop_optimizer_finalize ();
 }
 
+/* Force edge E to be cold.
+   If IMPOSSIBLE is true, for edge to have count and probability 0 otherwise
+   keep low probability to represent possible error in a guess.  This is used
+   i.e. in case we predict loop to likely iterate given number of times but
+   we are not 100% sure.
+
+   This function locally updates profile without attempt to keep global
+   consistency which can not be reached in full generality without full profile
+   rebuild from probabilities alone.  Doing so is not necessarily a good idea
+   because frequencies and counts may be more realistic then probabilities.
+
+   In some cases (such as for elimination of early exits during full loop
+   unrolling) the caller can ensure that profile will get consistent
+   afterwards.  */
+
+void
+force_edge_cold (edge e, bool impossible)
+{
+  gcov_type count_sum = 0;
+  int prob_sum = 0;
+  edge_iterator ei;
+  edge e2;
+  gcov_type old_count = e->count;
+  int old_probability = e->probability;
+  gcov_type gcov_scale = REG_BR_PROB_BASE;
+  int prob_scale = REG_BR_PROB_BASE;
+
+  /* If edge is already improbably or cold, just return.  */
+  if (e->probability <= impossible ? PROB_VERY_UNLIKELY : 0
+  && (!impossible || !e->count))
+return;
+  FOR_EACH_EDGE (e2, ei, e->src->succs)
+if (e2 != e)
+  {
+   count_sum += e2->count;
+   prob_sum += e2->probability;
+  }
+
+  /* If there are other edges out of e->src, redistribute probabilitity
+ there.  */
+  if (prob_sum)
+{
+  e->probability
+= MIN (e->probability, impossible ? 0 : PROB_VERY_UNLIKELY);
+  if (old_probability)
+   e->count = RDIV (e->count * e->probability, old_probability);
+  else
+e->count = MIN (e->count, impossible ? 0 : 1);
+
+  if (count_sum)
+   gcov_scale = RDIV ((count_sum + old_count - e->count) * 
REG_BR_PROB_BASE,
+  count_sum);
+  prob_scale = RDIV ((REG_BR_PROB_BASE - e->probability) * 
REG_BR_PROB_BASE,
+prob_sum);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Making edge %i->%i %s by redistributing "
+"probability to other edges.\n",
+e->src->index, e->dest->index,
+impossible ? "imposisble" : "cold");
+  FOR_EACH_EDGE (e2, ei, e->src->succs)
+   if (e2 != e)
+ {
+   e2->count = RDIV (e2->count * gcov_scale, REG_BR_PROB_BASE);
+   e2->probability = RDIV (e2->probability * prob_scale,
+   REG_BR_PROB_BASE);
+ }
+}
+  /* If all edges out of e->src a

Fix maybe_hot_frequency_p

2016-05-29 Thread Jan Hubicka
Hi,
this patch makes maybe_hot_frequency_p use multiplication instead of
division.
This eliminates roundoff errors that are quite serious in this case:
HOT_BB_FREQUENCY_FRACTION is 1000 while frequencies are scaled from 0...1.
If there is a loop in the function then most likely no BB is considered cold.
Hopefully
I will get rid of those fixed-point arithmetic issues, but let's first fix the
bugs.
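A small numeric sketch of the roundoff (the entry frequency of 900 is made
up for illustration):

  /* Old form: ENTRY frequency 900, HOT_BB_FREQUENCY_FRACTION 1000.  */
  900 / 1000 == 0          /* so "freq < 0" never holds; no BB is cold  */
  /* New form keeps the precision:  */
  freq * 1000 < 900        /* holds only for freq == 0  */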

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* predict.c (maybe_hot_frequency_p): Avoid division.
Index: predict.c
===
--- predict.c   (revision 236850)
+++ predict.c   (working copy)
@@ -115,8 +115,8 @@ maybe_hot_frequency_p (struct function *
 return false;
   if (PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) == 0)
 return false;
-  if (freq < (ENTRY_BLOCK_PTR_FOR_FN (fun)->frequency
- / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION)))
+  if (freq * PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION)
+  < ENTRY_BLOCK_PTR_FOR_FN (fun)->frequency)
 return false;
   return true;
 }


Contents of PO file 'cpplib-6.1.0.sr.po'

2016-05-29 Thread Translation Project Robot


cpplib-6.1.0.sr.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



New Serbian PO file for 'cpplib' (version 6.1.0)

2016-05-29 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Serbian team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/sr.po

(This file, 'cpplib-6.1.0.sr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] match.pd: optimize unsigned mul overflow check

2016-05-29 Thread Marc Glisse

On Sat, 28 May 2016, Alexander Monakov wrote:


For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether 'A*B'
overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize it to an
invocation of __builtin_mul_overflow to avoid the divide operation.


Hmm, that division by zero thing is a good point. I may be confusing it with 
dereferencing a null pointer, but I believe that some languages catch the 
corresponding signal, so by removing that division you would be changing 
the behavior. I wish we had a -fno-divisions-by-zero or equivalent, but 
otherwise this may require an extra check like tree_expr_nonzero_p, 
although we are quite inconsistent about this (we don't simplify x/x to 1, 
but we do simplify 0%x to 0 if x is not (yet) known to be the constant 0). 
We'll see what the reviewers think...


Any plan on optimizing the 'B && ovf' form?

--
Marc Glisse


Re: libiberty: Don't needlessly clear xmemdup allocated memory

2016-05-29 Thread DJ Delorie

Ok then.  Thanks!


Re: libiberty: Don't needlessly clear xmemdup allocated memory

2016-05-29 Thread Alan Modra
On Sat, May 28, 2016 at 10:12:19PM -0400, DJ Delorie wrote:
> 
> Alan Modra  writes:
> > * xmemdup.c (xmemdup): Use xmalloc rather than xcalloc.
> 
> In glibc at least, calloc can be faster than memset if the kernel is
> pre-zero-ing pages.  Thus, in those cases, your change makes the code
> slower by adding an unneeded memset.  Have you considered these cases?

Actually, I didn't consider that.  I was looking at the usage of xmemdup
in binutils, gdb and gcc, and noticed that a lot of the calls don't
need to clear any memory, and those that do only need to clear at most
two bytes, i.e. all uses have alloc_size <= copy_size + 2.

So in real-world usage of xmemdup, I think the possible gain of fresh
sbrk memory resulting in no internal calloc memset is minimal at best,
and of course if calloc is reusing freed memory then it will
internally memset it to zero.
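For reference, the usage pattern being described -- libiberty's
xmemdup (input, copy_size, alloc_size) with alloc_size just slightly larger
than copy_size, e.g. to NUL-terminate a copied string (names below are
placeholders):

  /* Only the trailing alloc_size - copy_size byte needs to be zero;
     the first len bytes are overwritten by the copy anyway.  */
  char *copy = (char *) xmemdup (name, len, len + 1);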

-- 
Alan Modra
Australia Development Lab, IBM


Re: [v3 PATCH] Protect allocator-overloads of tuple-from-tuple constructors from cases that would create dangling references.

2016-05-29 Thread Ville Voutilainen
On 28 May 2016 at 21:25, Ville Voutilainen  wrote:
> The fix to avoid binding dangling references to temporaries for tuple's
> constructors that take tuples of different type didn't include the fix
> for allocator overloads. That was just lazy, and I should feel ashamed.
> This patch fixes it, and takes us one step further to pass libc++'s testsuite
> for tuple. The added _NonNestedTuple checks could actually be folded
> into the recently-added _TMCT alias, but I'll do that as a separate cleanup
> patch. For now, this should do as an easy and straightforward fix.
>
> Tested on Linux-x64.
>
> 2016-05-28  Ville Voutilainen  
>
> Protect allocator-overloads of tuple-from-tuple constructors
> from cases that would create dangling references.
> * include/std/tuple (tuple(allocator_arg_t, const _Alloc&,
>  const tuple<_UElements...>&), tuple(allocator_arg_t, const _Alloc&,
>  tuple<_UElements...>&&)): Add a check for _NonNestedTuple.
> * testsuite/20_util/tuple/cons/nested_tuple_construct.cc: Adjust.

Since Jonathan is going to be out-of-reach for next week due to a
well-deserved holiday, would it be ok
if Paolo approves such patches?


[PING] [PATCH] Fix ICE with x87 asm operands (PR inline-asm/68843)

2016-05-29 Thread Bernd Edlinger
Hi,

ping for the RTL optimization stuff.

The problem here is that the code in reg-stack.c
pretty much everywhere assumes that the stack registers
do not have gaps.  IMHO it is not worth fixing the
register allocation in the way that would be necessary for that
configuration to work correctly.

So this patch tries just to detect a situation that can't
possibly work and exit cleanly without raising any ICEs.

In this case we have a regstack of st(1) only. That is
temp_stack.top=0 and temp_stack.reg[0]=FIRST_STACK_REG+1.

So it is just by luck that the assertion in line 2522
triggers, because immediately before that we already
do std::swap (temp_stack[j], temp_stack[k]) with
j=-1 and k=0 in line 2118.  That is because of:

j = (temp_stack.top
 - (REGNO (recog_data.operand[i]) - FIRST_STACK_REG))

This formula only works if you can assume that st(1) can
only be used in a stack that has at least two elements.

Likewise the return statement in get_hard_regnum:

return i >= 0 ? (FIRST_STACK_REG + regstack->top - i) : -1;

This is the same formula again, and in this case
it returns FIRST_STACK_REG although the stack only has
one element, FIRST_STACK_REG+1, aka st(1).
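Plugging the failing situation into that formula (values taken from the
description above):

  /* regstack->top == 0, regstack->reg[0] == FIRST_STACK_REG + 1 (st(1)),
     and the matching entry is found at i == 0, so
       FIRST_STACK_REG + regstack->top - i == FIRST_STACK_REG
     i.e. st(0), even though st(0) is not live in this one-element stack.  */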



On 05/22/16 22:02, Uros Bizjak wrote:
> On Sun, May 22, 2016 at 9:00 AM, Bernd Edlinger
>  wrote:
>> Hi!
>>
>> as described in the PR there are several non-intuitive rules that
>> one has to follow to avoid ICEs with x87 asm operands.
>>
>> This patch adds an explicit rule, that avoids ICE in the first test case and
>> removes an unnecessary error message in the second test case.
>>
>>
>> Boot-strapped and regression-tested on x86_64-pc-linux-gnu.
>> OK for trunk?
>
> This patch is actually dealing with two separate problems
>
> This part:
>
> @@ -607,7 +631,7 @@ check_asm_stack_operands (rtx_insn *insn)
>record any earlyclobber.  */
>
> for (i = n_outputs; i < n_outputs + n_inputs; i++)
> -if (op_alt[i].matches == -1)
> +if (op_alt[i].matches == -1 && STACK_REG_P (recog_data.operand[i]))
> {
>int j;
>
> is OK, although, I'd written it as:
>
>
> +if (STACK_REG_P (recog_data.operand[i]) && op_alt[i].matches == -1)
>
> with slightly simplified testcase:
>
> --cut here--
> int
> __attribute__((noinline, noclone))
> test (double y)
> {
>int a, b;
>asm ("fistpl (%1)\n\t"
> "movl (%1), %0"
> : "=r" (a)
> : "r" (&b), "t" (y)
> : "st");
>return a;
> }
>
> int
> main ()
> {
>int t = -10;
>
>if (test (t) != t)
>  __builtin_abort ();
>return 0;
> }
> --cut here--
>
> BTW: It looks to me you also don't need all-memory clobber here.
>

right.

> This part is OK, with a testcase you provided it borders on obvious.
> However, you will need rtl-optimization approval for the other
> problem.
>
> Uros.
>

I changed the patch according to your comments, thanks.

I have the updated patch attached.


Thanks
Bernd.
gcc:
2016-05-22  Bernd Edlinger  

PR inline-asm/68843
* reg-stack.c (check_asm_stack_operands): Explicit input arguments
must be grouped on top of stack.  Don't force early clobber
on ordinary reg outputs.

testsuite:
2016-05-22  Bernd Edlinger  

PR inline-asm/68843
* gcc.target/i386/pr68843-1.c: New test.
* gcc.target/i386/pr68843-2.c: New test.
Index: gcc/reg-stack.c
===
--- gcc/reg-stack.c	(revision 236597)
+++ gcc/reg-stack.c	(working copy)
@@ -97,6 +97,9 @@
 	All implicitly popped input regs must be closer to the top of
 	the reg-stack than any input that is not implicitly popped.
 
+	All explicitly referenced input operands may not "skip" a reg.
+	Otherwise we can have holes in the stack.
+
3. It is possible that if an input dies in an insn, reload might
   use the input reg for an output reload.  Consider this example:
 
@@ -461,6 +464,7 @@ check_asm_stack_operands (rtx_insn *insn)
 
   char reg_used_as_output[FIRST_PSEUDO_REGISTER];
   char implicitly_dies[FIRST_PSEUDO_REGISTER];
+  char explicitly_used[FIRST_PSEUDO_REGISTER];
 
   rtx *clobber_reg = 0;
   int n_inputs, n_outputs;
@@ -568,6 +572,7 @@ check_asm_stack_operands (rtx_insn *insn)
  popped.  */
 
   memset (implicitly_dies, 0, sizeof (implicitly_dies));
+  memset (explicitly_used, 0, sizeof (explicitly_used));
   for (i = n_outputs; i < n_outputs + n_inputs; i++)
 if (STACK_REG_P (recog_data.operand[i]))
   {
@@ -581,6 +586,8 @@ check_asm_stack_operands (rtx_insn *insn)
 
 	if (j < n_clobbers || op_alt[i].matches >= 0)
 	  implicitly_dies[REGNO (recog_data.operand[i])] = 1;
+	else if (reg_class_size[(int) op_alt[i].cl] == 1)
+	  explicitly_used[REGNO (recog_data.operand[i])] = 1;
   }
 
   /* Search for first non-popped reg.  */
@@ -600,6 +607,23 @@ check_asm_stack_operands (rtx_insn *insn)
   malformed_asm = 1;
 }
 
+  /* Search for first not-explicitly used reg.  */
+  for (i = FIRST_STACK_REG; i < LAST_STACK_REG + 1; i++)
+if (! implic

Re: Revert gcc r227962

2016-05-29 Thread JonY
On 5/23/2016 16:56, JonY wrote:
> On 5/20/2016 06:36, JonY wrote:
>> On 5/20/2016 02:11, Jeff Law wrote:
>>> So if we make this change (revert 227962), my understanding is that
>>> cygwin bootstraps will fail because they won't find kernel32 and perhaps
>>> other libraries.
>>>
>>> Jeff
>>>
>>
>> I'll need to double check with trunk but gcc-5.3.0 built OK without it.
>> The other alternative is to search /usr/lib before w32api.
>>
>>
> 
> yep it reached stage 3 but failed from another error building target
> libraries (libcilkrts), meaning it was able to find the w32api libraries
> even with this patch reverted.
> 

Has it been reverted? I managed to bootstrap after disabling the failing
libraries.






[PATCH, PR69067] Remove assert in get_def_bb_for_const

2016-05-29 Thread Tom de Vries

Hi,

this patch fixes graphite PR69067, a 6/7 regression.


I.

Consider the following test-case, compiled with -O1 -floop-nest-optimize 
-flto:

...
int a1, c1, cr, kt;
int aa[2];

int
ce (void)
{
  while (a1 < 1) {
int g8;
for (g8 = 0; g8 < 3; ++g8)
  if (c1 != 0)
cr = aa[a1 * 2] = kt;
for (c1 = 0; c1 < 2; ++c1)
  aa[c1] = cr;
++a1;
  }
  return 0;
}

int
main (void)
{
  return ce (aa);
}
...

At graphite0, there's a loop with header bb4, which conditionally 
executes bb 5:

...
  :

  :
  # g8_39 = PHI <0(13), g8_19(3)>
  # cr_lsm.1_4 = PHI 
  # cr_lsm.2_22 = PHI 
  if (c1_lsm.3_35 != 0)
goto ;
  else
goto ;

  :
  aa[_3] = 0;

  :
  # cr_lsm.1_33 = PHI 
  # cr_lsm.2_11 = PHI 
  g8_19 = g8_39 + 1;
  if (g8_19 <= 2)
goto ;
  else
goto ;
...


II.

The graphite transformation moves the condition '(P_35 <= -1 || P_35 >= 
1)' out of the loop:

...
[scheduler] original ast:
{
  for (int c0 = 0; c0 <= 2; c0 += 1) {
S_4(c0);
if (P_35 <= -1 || P_35 >= 1)
  S_5(c0);
S_6(c0);
  }
  S_7();
  for (int c0 = 0; c0 <= 1; c0 += 1)
S_8(c0);
}

[scheduler] AST generated by isl:
{
  if (P_35 <= -1) {
for (int c0 = 0; c0 <= 2; c0 += 1)
  S_5(c0);
  } else if (P_35 >= 1)
for (int c0 = 0; c0 <= 2; c0 += 1)
  S_5(c0);
  for (int c0 = 0; c0 <= 2; c0 += 1) {
S_4(c0);
S_6(c0);
  }
  S_7();
  for (int c0 = 0; c0 <= 1; c0 += 1)
S_8(c0);
}
...

When instantiating the ast back to gimple, we run into an assert:
...
pr-graphite-4.c: In function ‘ce’:
pr-graphite-4.c:5:1: internal compiler error: in get_def_bb_for_const, 
at graphite-isl-ast-to-gimple.c:1795

 ce()
 ^
...


III.

What happens is the following: in copy_cond_phi_args we try to copy the 
arguments of phi in bb6 to the arguments of new_phi in bb 46

...
(gdb) call debug_gimple_stmt (phi)
cr_lsm.1_33 = PHI 
(gdb) call debug_gimple_stmt (new_phi)
cr_lsm.1_62 = PHI <(28), (47)>
...

While handling the '0' phi argument in add_phi_arg_for_new_expr we 
trigger this bit of code and call get_def_bb_for_const with bb.index == 
46 and old_bb.index == 5:

...
  /* If the corresponding def_bb could not be found the entry will
 be NULL.  */
  if (TREE_CODE (old_phi_args[i]) == INTEGER_CST)
def_pred[i]
  = get_def_bb_for_const (new_bb,
  gimple_phi_arg_edge (phi, i)->src);
...

Neither of the two copies of bb 5 dominates bb 46, so we run into the 
assert at the end:

...
/* Returns a basic block that could correspond to where a constant was
  defined in the original code.  In the original code OLD_BB had the
  definition, we need to find which basic block out of the copies of
  old_bb, in the new region, should a definition correspond to if it
  has to reach BB.  */

basic_block translate_isl_ast_to_gimple::
get_def_bb_for_const (basic_block bb, basic_block old_bb) const
{
  vec <basic_block> *bbs = region->copied_bb_map->get (old_bb);

  if (!bbs || bbs->is_empty ())
return NULL;

  if (1 == bbs->length ())
return (*bbs)[0];

  int i;
  basic_block b1 = NULL, b2;
  FOR_EACH_VEC_ELT (*bbs, i, b2)
{
  if (b2 == bb)
return bb;

  /* BB and B2 are in two unrelated if-clauses.  */
  if (!dominated_by_p (CDI_DOMINATORS, bb, b2))
continue;

  /* Compute the nearest dominator.  */
  if (!b1 || dominated_by_p (CDI_DOMINATORS, b2, b1))
b1 = b2;
}

  gcc_assert (b1);
  return b1;
}
...


IV.

The attached patch fixes this by removing the assert.

Bootstrapped and reg-tested on x86_64.

OK for trunk, 6-branch?

Thanks,
- Tom
Remove assert in get_def_bb_for_const

2016-05-29  Tom de Vries  

	PR tree-optimization/69067
	* graphite-isl-ast-to-gimple.c (get_def_bb_for_const): Remove assert.

	* gcc.dg/graphite/pr69067.c: New test.

---
 gcc/graphite-isl-ast-to-gimple.c|  1 -
 gcc/testsuite/gcc.dg/graphite/pr69067.c | 28 
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 049a4c5..ff1d91f 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1792,7 +1792,6 @@ get_def_bb_for_const (basic_block bb, basic_block old_bb) const
 	b1 = b2;
 }
 
-  gcc_assert (b1);
   return b1;
 }
 
diff --git a/gcc/testsuite/gcc.dg/graphite/pr69067.c b/gcc/testsuite/gcc.dg/graphite/pr69067.c
new file mode 100644
index 000..d767381d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr69067.c
@@ -0,0 +1,28 @@
+/* { dg-do link } */
+/* { dg-options " -O1 -floop-nest-optimize" } */
+/* { dg-additional-options "-flto" { target lto } } */
+
+int a1, c1, cr, kt;
+int aa[2];
+
+int
+ce (void)
+{
+  while (a1 < 1)
+{
+  int g8;
+  for (g8 = 0; g8 < 3; ++g8)
+	if (c1 != 0)
+	  cr = aa[a1 * 2] = kt;
+  for (c1 = 0; c1 < 2; ++c1)
+	aa[c1] = cr;
+  ++a1;
+}
+  return 0;
+}
+
+int
+main (void)
+{
+  return ce (aa);
+}


[PATCH] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount

2016-05-29 Thread chengang
From: Chen Gang 

r10 may also be used as a parameter for the nested function, so we need to
save it before calling mcount.

2016-05-29  Chen Gang  

* config/tilegx/tilegx.c (tilegx_function_profiler): Save r10
to the stack before calling mcount.
---
 gcc/config/tilegx/tilegx.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/config/tilegx/tilegx.c b/gcc/config/tilegx/tilegx.c
index 06c832c..bc41105 100644
--- a/gcc/config/tilegx/tilegx.c
+++ b/gcc/config/tilegx/tilegx.c
@@ -5510,18 +5510,28 @@ tilegx_function_profiler (FILE *file, int labelno 
ATTRIBUTE_UNUSED)
   if (flag_pic)
 {
   fprintf (file,
+  "\tst\tsp, r10\n"
+  "\taddi\tsp, sp, -8\n"
   "\t{\n"
   "\tmove\tr10, lr\n"
   "\tjal\tplt(%s)\n"
-  "\t}\n", MCOUNT_NAME);
+  "\t}\n"
+  "\taddi\tsp, sp, 8\n"
+  "\tld\tr10, sp\n",
+  MCOUNT_NAME);
 }
   else
 {
   fprintf (file,
+  "\tst\tsp, r10\n"
+  "\taddi\tsp, sp, -8\n"
   "\t{\n"
   "\tmove\tr10, lr\n"
   "\tjal\t%s\n"
-  "\t}\n", MCOUNT_NAME);
+  "\t}\n"
+  "\taddi\tsp, sp, 8\n"
+  "\tld\tr10, sp\n",
+  MCOUNT_NAME);
 }
 
   tilegx_in_bundle = false;
-- 
1.9.3



[wwwdocs] readings.html -- fix two links to renesas.com

2016-05-29 Thread Gerald Pfeifer
Admittedly the new addresses are nicer (I never appreciated the
original ones); just why a competent webmaster would not add a 
redirect escapes my understanding.

Applied.

Gerald

Index: htdocs/readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.250
diff -u -r1.250 readings.html
--- htdocs/readings.html28 May 2016 23:16:54 -  1.250
+++ htdocs/readings.html29 May 2016 10:04:33 -
@@ -161,7 +161,7 @@
  
  m32c
   Manufacturer: Renesas
-  http://www.renesas.com/products/mpumcu/m16c/m16c_landing.jsp";>Renesas 
M16C Family (R32C/M32C/M16C) Site
+  http://www.renesas.com/products/mpumcu/m16c/";>Renesas M16C 
Family (R32C/M32C/M16C) Site
   GDB includes a simulator.
  
  
@@ -262,7 +262,7 @@
  sh
   Manufacturer: Renesas, various licensees.
   CPUs include: SH1, SH2, SH2-DSP, SH3, SH3-DSP, SH4, SH4A, SH5 series.
-  http://www.renesas.com/products/mpumcu/superh/superh_landing.jsp";>Renesas 
SuperH Processors
+  http://www.renesas.com/products/mpumcu/superh/";>Renesas 
SuperH Processors
   http://shared-ptr.com/sh_insns.html";>SuperH Instruction Set 
Summary
   GDB includes a simulator.
  


[wwwdocs] Remove broken link to old Intel handook in projects/tree-ssa

2016-05-29 Thread Gerald Pfeifer
This one tries to redirect on the Intel side, and then even 
leads to a "Server not found" error.

Committed.

Gerald

Index: htdocs/projects/tree-ssa/vectorization.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/tree-ssa/vectorization.html,v
retrieving revision 1.31
diff -u -r1.31 vectorization.html
--- htdocs/projects/tree-ssa/vectorization.html 29 Jun 2014 11:31:34 -  
1.31
+++ htdocs/projects/tree-ssa/vectorization.html 29 May 2016 08:08:08 -
@@ -1548,8 +1548,7 @@
 
 "The Software Vectorization Handbook. Applying Multimedia
 Extensions for Maximum Performance.", Aart Bik, Intel Press,
-June 2004. http://www.intel.com/intelpress/sum_vmmx.htm";>
-http://www.intel.com/intelpress/sum_vmmx.htm
+June 2004.
 
 "Vectorization for SIMD Architectures with Alignment
 Constraints", Alexandre E. Eichenberger, Peng Wu, Kevin O'brien,


[PATCH,libstdc++] Adjust link in doc/xml/manual/backwards_compatibility.xml

2016-05-29 Thread Gerald Pfeifer
There is quite a bit more to fix in the libstdc++ manuals; I'll
try to make some progress in the coming weeks.

Applied.

Gerald

2016-05-29  Gerald Pfeifer  

* doc/xml/manual/backwards_compatibility.xml: Adjust
lists.debian.org link to https.
* doc/html/manual/backwards.html: Regenerate.

Index: doc/xml/manual/backwards_compatibility.xml
===
--- doc/xml/manual/backwards_compatibility.xml  (revision 236857)
+++ doc/xml/manual/backwards_compatibility.xml  (working copy)
@@ -1304,7 +1304,7 @@
   
   
http://www.w3.org/1999/xlink";
- 
xlink:href="http://lists.debian.org/debian-gcc/2006/03/msg00405.html";>
+ 
xlink:href="https://lists.debian.org/debian-gcc/2006/03/msg00405.html";>
   Building the Whole Debian Archive with GCC 4.1: A Summary

   


Re: [C++ Patch] PR 71105 ("lambdas with default captures improperly have function pointer conversions")

2016-05-29 Thread Paolo Carlini

Hi,

On 29/05/2016 05:04, Jason Merrill wrote:

OK for trunk and 6.

Thanks.
The regression is from another issue; this bug just prevents a simple 
workaround for that bug.
Ah now I see, I confused the test attached to the bug and the test 
provided inline at the end of Comment #0. I'm also adding a testcase 
reduced from the latter.


Thanks,
Paolo.


[PATCH][i386] Add -march=native support for VIA nano CPUs

2016-05-29 Thread J. Mayer
When trying to compile using -march=native on a VIA nano CPU, gcc
selects "-march=core2" "-mtune=i386" and is then unable to compile, as this
creates a conflict between 32-bit and 64-bit compilation modes, as
shown by the following test:

# echo 'int main(){return 0;}' > test.c && gcc -march=native -O2
-pipe  test.c -o test && rm test.c test
Compilation fails with the following error message and information:
[...]
test.c:1:0: error: CPU you selected does not support x86-64 instruction
set
 int main(){return 0;}
 ^

Using the "-v -Q" options shows the detection problem:
gnu/gcc/../lib/gcc/x86_64-unknown-linux-gnu/4.8.4/
 test.c -march=core2 -mcx16 -msahf -mno-movbe -mno-aes -mno-pclmul
 -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi
 -mno-bmi2 -mno-tbm -mno-avx -mno-avx2 -mno-sse4.2 -mno-sse4.1 -mno-
lzcnt
 -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed
 -mno-prfchw -mno-adx -mfxsr -mno-xsave -mno-xsaveopt
 --param l1-cache-size=64 --param l1-cache-line-size=64
 --param l2-cache-size=1024 -mtune=i386 -O2 -fno-use-linker-plugin


The following patch allows gcc to select the correct compilation options,
which can be checked using the "-v -Q" gcc options. The GCC output becomes:
gnu/gcc/../lib/gcc/x86_64-unknown-linux-gnu/4.8.4/
 test.c -march=x86-64 -mcx16 -msahf -mno-movbe -mno-aes -mno-pclmul
 -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi
 -mno-bmi2 -mno-tbm -mno-avx -mno-avx2 -msse3 -mssse3 -mno-sse4.2
 -mno-sse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c
 -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mno-xsave
 -mno-xsaveopt --param l1-cache-size=64 --param l1-cache-line-size=64
 --param l2-cache-size=1024 -mtune=generic -O2 -fno-use-linker-plugin

which seems OK.
The same problem appears with gcc 4.9.3 and 5.2, and likely with
the current git version.

The following patch applies to gcc >= 4.9; it has been tested by
recompiling the whole system from scratch on a Gentoo distribution
using gcc version 4.9.3 (the current stable gcc version on Gentoo
x86_64) with no issue.

Please consider applying this patch to future releases.

Jocelyn Mayer 

PS: please CC me on any answer to this mail, as I didn't subscribe to
the mailing list.

---

--- gcc/config/i386/driver-i386.c.orig  2015-02-02 05:20:49.0
+0100
+++ gcc/config/i386/driver-i386.c   2015-08-23 01:11:03.0
+0200
@@ -601,15 +601,20 @@
  switch (family)
{
case 6:
- if (model > 9)
-   /* Use the default detection procedure.  */
+ if (has_longmode)
processor = PROCESSOR_GENERIC;
- else if (model == 9)
-   cpu = "c3-2";
- else if (model >= 6)
-   cpu = "c3";
  else
-   processor = PROCESSOR_GENERIC;
+   {
+ if (model > 9)
+   /* Use the default detection procedure.  */
+   processor = PROCESSOR_GENERIC;
+ else if (model == 9)
+   cpu = "c3-2";
+ else if (model >= 6)
+   cpu = "c3";
+ else
+   processor = PROCESSOR_GENERIC;
+   }
  break;
case 5:
  if (has_3dnow)
@@ -623,6 +628,8 @@
  /* We have no idea.  */
  processor = PROCESSOR_GENERIC;
}
+   } else {
+ processor = PROCESSOR_GENERIC;
}
 }
   else
@@ -840,7 +847,12 @@
   if (arch)
{ 
  if (has_ssse3)
-   cpu = "core2";
+   {
+ if (vendor == signature_CENTAUR_ebx)
+   cpu = "x86-64";
+ else
+   cpu = "core2";
+   }
  else if (has_sse3)
{ 
  if (has_longmode)