Re: [PATCH] Fix PR58775

2013-10-20 Thread Zhenqiang Chen
On 18 October 2013 19:09, Jakub Jelinek  wrote:
> On Fri, Oct 18, 2013 at 06:36:44PM +0800, Zhenqiang Chen wrote:
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -2861,6 +2861,19 @@ swap_ops_for_binary_stmt (vec ops,
>  }
>  }
>
> +/* Determine if stmt A appears in the statement list starting at stmt B.  */
> +static inline bool
> +next_stmt_of (gimple a, gimple b)
> +{
> +  gimple_stmt_iterator gsi;
> +  for (gsi = gsi_for_stmt (b); !gsi_end_p (gsi); gsi_next (&gsi))
> +{
> +  if (gsi_stmt (gsi) == a)
> +   return true;
> +}
> +  return false;
> +}
>
> How is that different from appears_later_in_bb?  More importantly,
> not_dominated_by shouldn't be called with the same uid and basic block,
> unless the stmts are the same, except for the is_gimple_debug case
> (which looks like a bug).  And more importantly, if a stmt has zero uid,
> we'd better set it from the previous or next non-zero uid stmt, so that
> we don't linearly traverse the whole bb many times.

Thanks for the comments.

The patch is updated to set the uid in update_range_test after force_gimple_operand_gsi.

I am not sure the patch covers all cases.  If not, I think
force_gimple_operand_gsi_1 may be the only place to set the uid for all
newly generated statements.
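
For illustration only, here is a small standalone analogue (not GCC code,
and not the patch itself) of the idea Jakub describes: when a newly
inserted statement has uid 0, borrow the uid of the nearest neighbour that
already has one, so ordering checks within a basic block stay O(1) instead
of rescanning the block each time.

#include <cstdio>
#include <iterator>
#include <list>

struct stmt { int uid; };

static void
assign_uid_from_neighbours (std::list<stmt> &bb, std::list<stmt>::iterator it)
{
  if (it->uid != 0)
    return;
  /* Look backwards first, then forwards, for a non-zero uid.  */
  for (std::list<stmt>::iterator p = it; p != bb.begin (); )
    if ((--p)->uid != 0) { it->uid = p->uid; return; }
  for (std::list<stmt>::iterator n = std::next (it); n != bb.end (); ++n)
    if (n->uid != 0) { it->uid = n->uid; return; }
}

int
main ()
{
  std::list<stmt> bb = { {1}, {2}, {0}, {3} };  /* third stmt newly inserted */
  assign_uid_from_neighbours (bb, std::next (bb.begin (), 2));
  for (std::list<stmt>::const_iterator s = bb.begin (); s != bb.end (); ++s)
    std::printf ("%d ", s->uid);                /* prints: 1 2 2 3 */
  std::printf ("\n");
  return 0;
}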

Thanks!
-Zhenqiang

> That said, I've lost track of the bugfixes for this, don't know if there is
> any patch still pending or not.
>
> Jakub


pr58775-2.patch
Description: Binary data


Re: Honor ix86_accumulate_outgoing_args again

2013-10-20 Thread Vladimir Makarov

On 13-10-19 4:30 PM, Jan Hubicka wrote:

Jan,

Does this seem reasonable to you?

Oops, sorry, I missed your email. (I was travelling and I am finishing a paper
now).

Thanks,
Igor


-Original Message-
From: Zamyatin, Igor
Sent: Tuesday, October 15, 2013 3:48 PM
To: Jan Hubicka
Subject: RE: Honor ix86_accumulate_outgoing_args again

Jan,

Now we have the following prologue in, say, the phi0 routine in equake:

0x804aa90 1  push   %ebp
0x804aa91 2  mov%esp,%ebp
0x804aa93 3  sub$0x18,%esp
0x804aa96 4  vmovsd 0x80ef7a8,%xmm0
0x804aa9e 5  vmovsd 0x8(%ebp),%xmm1
0x804aaa3 6  vcomisd %xmm1,%xmm0   <-- we see big stall somewhere here
or 1-2 instructions above

While earlier it was

0x804abd0 1 sub$0x2c,%esp
0x804abd3 2 vmovsd 0x30(%esp),%xmm1
0x804abd9 3 vmovsd 0x80efcc8,%xmm0
0x804abe1 4 vcomisd %xmm1,%xmm0

Thanks for the analysis! It is a different benchmark than for Bulldozer, but
apparently the same case.  Again, we used to eliminate the frame pointer here,
but LRA now doesn't.  Do you see the same regression with
-fno-omit-frame-pointer -maccumulate-outgoing-args?

I suppose this is a conflict between the push instruction, which is handled by
the stack engine, and the initialization of EBP, which isn't.  That would
explain why Bulldozer doesn't seem to care about this particular benchmark
(its stack engine seems to have quite a different design).

This is a bit of a sad situation - accumulate-outgoing-args is expensive
code-size wise, and it seems we don't really need esp with
-mno-accumulate-outgoing-args.  The non-accumulation code path was mistakenly
disabled for too long ;(

Vladimir, how much effort do you think it will be to fix the frame pointer
elimination here?

My guess is a week.  The problem is that I am busy and having some problems
with two small projects right now which I'd like to include in gcc-4.9.

But I think this can still be fixed in stage 2, as it is a PR.


I can imagine it is quite a tricky case. If so, I would suggest adding
m_CORE_ALL to X86_TUNE_ACCUMULATE_OUTGOING_ARGS with a comment explaining the
problem and mentioning the regression on equake on Core and mgrid on Bulldozer,
and opening an enhancement request for this...

If direct ESP use and push/pop instructions are causing such noticeable issues,
I also wonder whether we can't "shrink wrap" this into the red zone in 64-bit
compilation.  It seems that even with -maccumulate-outgoing-args, pushing the
frame allocation as late as possible in the function would be a good idea, so
it is not close to the push/pop/call/ret.






Re: patch to enable LRA for ppc

2013-10-20 Thread Vladimir Makarov

On 13-10-18 11:26 AM, David Edelsohn wrote:

On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov  wrote:

The following patch permits today's trunk to use LRA for ppc by default.
To switch it off, -mno-lra can be used.

The patch was bootstrapped on ppc64.  The GCC testsuite has no regressions
either (in comparison with reload).  The change in rs6000.md fixes an LRA
failure on a recently added ppc test.

Vlad,

I have not forgotten this patch. We are trying to figure out the right
timeframe to make this change. The patch does affect performance --
both positively and negatively; most are in the noise but not all. And
there still are some SPEC benchmarks that fail to build with the
patch, at least in Mike's tests. And Mike is implementing some patches
to utilize reload to improve use of VSX registers, which would need to
be mirrored in LRA for the equivalent functionality.

Thanks for informing me, David.

I am ready to work on any LRA ppc issues once it is in the trunk.
It would be easier for me to work on LRA ppc if the patch were committed
to the trunk, with LRA of course used as a non-default local RA.


I don't know what Mike is doing in reload to use VSX registers.  I guess it is
using VSX regs as spill locations for GENERAL regs instead of memory.  If so,
it is two days' work to add this functionality to LRA (it already has analogous
functionality for Intel processors, which gave a nice SPECFP2000 improvement
there), and probably more work to resolve issues, especially as I have no
power8.






Re: FW: MAX_PATH problems with mingw gcc

2013-10-20 Thread Joey Ye
Vladimir,

I found no more issues in the patch itself, but the ChangeLog entries are
missing.  Please refer to the format in /include/ChangeLog, and include yours
in your next email for:

/include
/libcpp
/libiberty

The libiberty maintainers are on the To list.

- Joey


Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-10-20 Thread Sandra Loosemore

On 10/18/2013 10:38 AM, Richard Biener wrote:

Sandra Loosemore  wrote:

On 10/18/2013 04:50 AM, Richard Biener wrote:

On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore
 wrote:

This patch fixes various -fstrict-volatile-bitfields wrong-code bugs,
including instances where bitfield values were being quietly truncated as
well as bugs relating to using the wrong width.  The code changes are
identical to those in the previous version of this patch series (originally
posted in June and re-pinged many times since then), but for this version I
have cleaned up the test cases to remove dependencies on header files, as
Bernd requested.


Ok.


Just to clarify, is this approval unconditional, independent of part 2
and other patches or changes still under active discussion?


Yes.


Hr.  After some further testing, I'm afraid this patch is still 
buggy.  :-(


I tried a backport to GCC 4.8 and tested on arm-none-eabi.  On the new 
pr56997-1.c testcase, it got stuck in infinite recursion between 
store_split_bit_field/store_fixed_bit_field and/or 
extract_split_bit_field/extract_fixed_bit_field.  This didn't show up in 
my previous mainline testing.
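
For context, a hypothetical reduced example (not the actual pr56997-1.c
testcase) of the kind of volatile bit-field access under discussion: with
-fstrict-volatile-bitfields, the field must be accessed in the width of its
declared type (unsigned short, i.e. HImode), even when the surrounding
object would allow a wider access.

struct S
{
  volatile unsigned short f : 9;   /* declared type is short, 9 bits wide */
  unsigned short pad : 7;
};

struct S s;

int
main (void)
{
  s.f = 0x1ab;                     /* store through the volatile field */
  return s.f == 0x1ab ? 0 : 1;     /* read it back in the declared width */
}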


The difference between 4.8 and mainline head is the alignment of the 
incoming str_rtx passed to store_bit_field/extract_bit_field, due to the 
changes in r199898.  The alignment is 8 bits on mainline, and 32 on 4.8. 
 It seems to me that the bitfield code ought to handle store/extract 
from a more-aligned object and it's probably possible to construct an 
example that fails in this way on mainline as well.


It looks like there are conflicting assumptions between get_best_mode, 
the places that call it in store_fixed_bit_field and 
extract_fixed_bit_field, and the code that actually does the splitting 
-- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather 
than its mode.  In the case where it's failing, get_best_mode decides it 
can't do a HImode access without splitting, but then the split code assumes 
SImode units because of the 32-bit alignment without actually changing the 
mode of the rtx to match.


On top of that, this is one of the cases that strict_volatile_bitfield_p 
checks for and returns false, but the callers of store_bit_field and 
extract_bit_field in expr.c have already fiddled with the mode of the 
incoming rtx assuming that -fstrict-volatile-bitfields does apply.  It 
doesn't get into that infinite recursion if it's compiled with 
-fno-strict-volatile-bitfields instead; in that case the incoming rtx 
has BLKmode, get_best_mode chooses SImode, and it's able to do the 
access without splitting at all.


Anyway, I tried a couple of different band-aids that solved the infinite 
recursion problem but caused regressions elsewhere, and now I'm not sure 
of the right place to fix this.  Given that there is also still ongoing 
discussion about making this a 3-way switch (etc.), I am going to hold off 
on committing this patch for now.


-Sandra



[PR libstdc++/58804][PR libstdc++/58729] tr2/dynamic_bitset issues.

2013-10-20 Thread Ed Smith-Rowland

Greetings.

Here is a patch to correct tr2/dynamic_bitset to use __builtin_xxxll for 
long long instead of the long versions.


Relevant bugs:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58804
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58729

Builds and *really* tests clean on x86_64-linux.

OK?
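
For a tiny standalone illustration of why the "ll" variants matter when the
block type is long long (on an ILP32 target the "l" builtins take a 32-bit
unsigned long, so the argument is silently truncated):

#include <cstdio>

int
main ()
{
  unsigned long long block = 0xffffffff00000000ULL;
  /* On an ILP32 target __builtin_popcountl truncates the argument to
     32 bits and undercounts; __builtin_popcountll is always correct.  */
  std::printf ("popcountl : %d\n", __builtin_popcountl (block));
  std::printf ("popcountll: %d\n", __builtin_popcountll (block));
  return 0;
}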

Index: include/tr2/dynamic_bitset
===
--- include/tr2/dynamic_bitset  (revision 203841)
+++ include/tr2/dynamic_bitset  (working copy)
@@ -287,7 +287,7 @@
  if (_M_w[__i] != ~static_cast(0))
return 0;
return ((this->_M_w.size() - 1) * _S_bits_per_block
-   + __builtin_popcountl(this->_M_hiword()));
+   + __builtin_popcountll(this->_M_hiword()));
   }
 
   bool
@@ -332,7 +332,7 @@
   {
size_t __result = 0;
for (size_t __i = 0; __i < this->_M_w.size(); ++__i)
- __result += __builtin_popcountl(this->_M_w[__i]);
+ __result += __builtin_popcountll(this->_M_w[__i]);
return __result;
   }
 
Index: include/tr2/dynamic_bitset.tcc
===
--- include/tr2/dynamic_bitset.tcc  (revision 203841)
+++ include/tr2/dynamic_bitset.tcc  (working copy)
@@ -131,7 +131,7 @@
  _WordT __thisword = this->_M_w[__i];
  if (__thisword != static_cast<_WordT>(0))
return (__i * _S_bits_per_block
-   + __builtin_ctzl(__thisword));
+   + __builtin_ctzll(__thisword));
}
   // not found, so return an indication of failure.
   return __not_found;
@@ -158,7 +158,7 @@
 
   if (__thisword != static_cast<_WordT>(0))
return (__i * _S_bits_per_block
-   + __builtin_ctzl(__thisword));
+   + __builtin_ctzll(__thisword));
 
   // check subsequent words
   for (++__i; __i < this->_M_w.size(); ++__i)
@@ -166,7 +166,7 @@
  __thisword = this->_M_w[__i];
  if (__thisword != static_cast<_WordT>(0))
return (__i * _S_bits_per_block
-   + __builtin_ctzl(__thisword));
+   + __builtin_ctzll(__thisword));
}
   // not found, so return an indication of failure.
   return __not_found;
2013-10-20  Edward Smith-Rowland  <3dw...@verizon.net>

PR libstdc++/58804
PR libstdc++/58729
* include/tr2/dynamic_bitset
(__dynamic_bitset_base<_WordT, _Alloc>::_M_are_all_aux,
__dynamic_bitset_base<_WordT, _Alloc>::_M_do_count):
Use __builtin_popcountll() instead of __builtin_popcountl().
* include/tr2/dynamic_bitset.tcc
(__dynamic_bitset_base<_WordT, _Alloc>::_M_do_find_first,
__dynamic_bitset_base<_WordT, _Alloc>::_M_do_find_next):
Use __builtin_ctzll() instead of __builtin_ctzl().


Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-20 Thread Kaz Kojima
Christian Bruel  wrote:
> Thanks for having retested this.  The tests are still not complete for
> RTL-generated functions; there are cases where no str/cmp will be
> emitted, because we can predict that the size is less than 4 and so fall
> through directly into the byte-at-a-time copy loop.
> 
> Also, I will consider a size-optimized implementation, so we don't jump
> to the library.
> 
> I will post examples for this shortly (and add them as a testcase) with
> a strncmp implementation helper, which also covers strcmp with constant
> strings, by the way.  Please allow me some time to complete the benchmarking.
> 
> Thanks for the hints about removing the empty "" in the expanders.
> 
> Kaz, before proceeding with the next patch, was your approval for 1/2
> only or 2/2 with the expander cleanup ?

Only for 1/2.  The revised 2/2 patch is pre-approved if the usual
tests are Ok, though.  One minor formatting problem:

>+   if (! optimize_insn_for_size_p () && sh4_expand_cmpstr(operands))
>+  DONE;

A space is needed just before (operands).

Regards,
kaz


Re: User-define literals for std::complex.

2013-10-20 Thread Ed Smith-Rowland

On 09/27/2013 05:39 AM, Jonathan Wakely wrote:

On 27 September 2013 05:17, Ed Smith-Rowland wrote:

The complex user-defined literals finally passed (n3779), with the resolution
to DR1473 allowing the suffix id to touch the quotes (I can't find it, but I
put it in not too long ago).

I think it's been approved by the LWG and looks like it will go to a
vote by the full committee, but let's wait for that to pass before
making any changes.



Now that this is in the working paper can we go ahead?
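
For reference, a small example of the literals in question as they were
later standardized for C++14 (an illustration only, not part of any patch
here); it needs -std=c++1y/-std=c++14:

#include <complex>
#include <iostream>

int
main ()
{
  using namespace std::literals::complex_literals;
  std::complex<double> z = 1.0 + 2.0i;   // the suffix touches the literal
  std::cout << z << '\n';                // prints (1,2)
  return 0;
}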




Re: [PATCH]Fix computation of offset in ivopt

2013-10-20 Thread Bin.Cheng
On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener
 wrote:
> On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng  wrote:
>> Hi,
>> As noted in previous messages, GCC forces offsets to unsigned in the middle end.
>> It also gets the offset value, stores it in a HOST_WIDE_INT, and then uses it in
>> various computations.  In this scenario, it should use int_cst_value to do
>> additional sign extension according to the type of the tree node; otherwise we
>> might lose sign information.  Take the function fold_plusminus_mult_expr as an
>> example: the sign information of arg01/arg11 might be lost because it uses
>> TREE_INT_CST_LOW directly.  I think this is the cause of the problem in
>> this thread, and that we should fix the cause rather than hacking
>> strip_offset as in the original patch, so I split the original patch into two as
>> attached: one fixing the offset of COMPONENT_REF in strip_offset_1, the
>> other fixing the mentioned sign-extension problem.
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c (revision 203267)
> +++ gcc/fold-const.c (working copy)
> @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre
>HOST_WIDE_INT int01, int11, tmp;
>bool swap = false;
>tree maybe_same;
> -  int01 = TREE_INT_CST_LOW (arg01);
> -  int11 = TREE_INT_CST_LOW (arg11);
> +  int01 = int_cst_value (arg01);
> +  int11 = int_cst_value (arg11);
>
> this is not correct - it will mishandle all large unsigned numbers.
As for the patch itself, I think the preceding host_integerp call
already checks for this case:

int
host_integerp (const_tree t, int pos)
{
  if (t == NULL_TREE)
return 0;

  return (TREE_CODE (t) == INTEGER_CST
  && ((TREE_INT_CST_HIGH (t) == 0
   && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) >= 0)
  || (! pos && TREE_INT_CST_HIGH (t) == -1
  && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) < 0
  && !TYPE_UNSIGNED (TREE_TYPE (t)))
  || (pos && TREE_INT_CST_HIGH (t) == 0)));
}

Since the call is host_integerp (xxx, 0), it returns 0 for large
unsigned numbers, right?

>
> The real issue is that we rely on twos-complement arithmetic to work
> when operating on pointer offsets (because we convert them all to
> unsigned sizetype).  That makes interpreting the offsets or expressions
> that compute offsets hard (or impossible in some cases), as you can
> see from the issues in IVOPTs.
>
> The above means that strip_offset_1 cannot reliably look through
> MULT_EXPRs as that can make an offset X * CST that is
> "effectively" signed "surely" unsigned in the process.  Think of
> this looking into X * CST as performing a division - whose result
> is dependent on the sign of X which we lost with our unsigned
> casting.  Now, removing the MULT_EXPR look-through might
> be too pessimizing ... but it may be worth trying to see if it fixes
> your issue?  In this context I also remember a new bug filed
> recently of us not folding x /[ex] 4 * 4 to x.
>
> In the past I was working multiple times on stopping doing that
> (make all offsets unsigned), but I never managed to finish ...
>
> Richard.
>
>> Bootstrap and test on x86/x86_64/arm.  Is it OK?
>>
>> Thanks.
>> bin
>>
>> Patch a:
>> 2013-10-17  Bin Cheng  
>>
>> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type.
>> Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF.
>>
>> Patch b:
>> 2013-10-17  Bin Cheng  
>>
>> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead
>> of TREE_INT_CST_LOW.
>>
>> gcc/testsuite/ChangeLog
>> 2013-10-17  Bin Cheng  
>>
>> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test.
>>
>>> -Original Message-
>>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>>> Sent: Wednesday, October 02, 2013 4:32 PM
>>> To: Bin.Cheng
>>> Cc: Bin Cheng; GCC Patches
>>> Subject: Re: [PATCH]Fix computation of offset in ivopt
>>>
>>> On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng  wrote:
>>> > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener
>>> >  wrote:
>>> >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng 
>>> wrote:
>>> >>>
>>> >>>
>>> >>
>>> >> I don't think you need
>>> >>
>>> >> +  /* Sign extend off if expr is in type which has lower precision
>>> >> + than HOST_WIDE_INT.  */
>>> >> +  if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT)
>>> >> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr)));
>>> >>
>>> >> at least it would be suspicious if you did ...
>>> > There is a problem in the example from the first message.  The iv base is
>>> > like:
>>> > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4.  I am not sure
>>> > why, but it seems (-4/0xFFFC) is represented by (1073741823*4).
>>> > For each operand, strip_offset_1 returns exactly the positive number,
>>> > and the result of the multiplication never gets its chance to be sign
>>> > extended.  That's why the sign extension is necessary to fix the
>>> > problem. Or it shou
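
As a standalone illustration of the representation mentioned above
(assuming a 32-bit sizetype): in unsigned 32-bit arithmetic the
multiplication wraps, so 1073741823 * 4 is 0xfffffffc, i.e. -4 when
reinterpreted as signed:

#include <cstdint>
#include <cstdio>

int
main ()
{
  uint32_t off = 1073741823u * 4u;                   /* wraps to 0xfffffffc */
  std::printf ("0x%08x %d\n", off, (int32_t) off);   /* 0xfffffffc -4 */
  return 0;
}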

[wwwdocs] steering.html -- update list of members

2013-10-20 Thread Gerald Pfeifer
This updates the list of steering committee members after Joe
stepped down a bit ago.

Committed.

Gerald

Index: steering.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/steering.html,v
retrieving revision 1.35
diff -u -3 -p -r1.35 steering.html
--- steering.html   12 Feb 2012 21:15:16 -  1.35
+++ steering.html   20 Oct 2013 19:42:32 -
@@ -26,7 +26,6 @@ committee.
 place to reach them is the gcc mailinglist.
 
 
-Joe Buck (Synopsys)
 David Edelsohn (IBM)
 Kaveh R. Ghazi
 Jeffrey A. Law (Red Hat)


[C++ Patch] PR 58647

2013-10-20 Thread Paolo Carlini

Hi,

this ICE on valid, [4.7/4.8/4.9 Regression], happens only with 
-std=c++11 and is quite simple to analyze: cxx_eval_constant_expression 
sees a COMPONENT_REF and forwards to cxx_eval_component_reference, but 
the latter, evidently, only handles "fields" not functions (per the 
comment) and an ICE soon happens when it tries to use DECL_MUTABLE_P.


If, for comparison, instead of a static member function, we have a 
static free function, as in, e.g.:


static void foo();

template void bar()
{
  foo;
}

what happens is that cxx_eval_constant_expression sees a FUNCTION_DECL 
and returns it as-is. Thus I believe that doing the same in the case at 
issue should be fine.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2013-10-20  Paolo Carlini  

PR c++/58647
* semantics.c (cxx_eval_constant_expression, [COMPONENT_REF]):
Handle functions as TREE_OPERAND (t, 1).

/testsuite
2013-10-20  Paolo Carlini  

PR c++/58647
* g++.dg/parse/crash66.C: New.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 203876)
+++ cp/semantics.c  (working copy)
@@ -9553,6 +9558,8 @@ cxx_eval_constant_expression (const constexpr_call
   break;
 
 case COMPONENT_REF:
+  if (is_overloaded_fn (TREE_OPERAND (t, 1)))
+   return t;
   r = cxx_eval_component_reference (call, t, allow_non_constant, addr,
non_constant_p, overflow_p);
   break;
Index: testsuite/g++.dg/parse/crash66.C
===
--- testsuite/g++.dg/parse/crash66.C(revision 0)
+++ testsuite/g++.dg/parse/crash66.C(working copy)
@@ -0,0 +1,11 @@
+// PR c++/58647
+
+struct A
+{
+  static void foo();
+};
+
+template void bar()
+{
+  A().foo;
+}


Re: [patch] Cleanup tree-ssa-ter.c exports

2013-10-20 Thread Eric Botcazou
> Unfortunately,  tree-ssa-ter.c also has 2 functions
> (find_replaceable_exprs() and  dump_replaceable_exprs()) which are
> exported and utilized by tree-outof-ssa.c  (the file is a part of the
> out-of-ssa module).  So I moved the prototypes from tree-ssa-live.h into
> a newly created tree-ssa-ter.h file,  and included it directly from
> tree-outof-ssa.c, the only consumer.

Apparently something went wrong when tree-ssa-ter.h was created.

-- 
Eric Botcazou


Re: [wide-int] Reduce the size of the scratch arrays

2013-10-20 Thread Richard Sandiford
Kenneth Zadeck  writes:
> On 10/20/2013 06:30 AM, Richard Sandiford wrote:
>> If yesterday's patch goes in, we'll only need the scratch array for
>> HWI inputs.  We can therefore shrink it to 2 HWIs.
>>
>> Tested on x86_64-linux-gnu.  OK for wide-int?
>>
>> Thanks,
>> Richard
>>
>>
>> Index: gcc/wide-int.h
>> ===
>> --- gcc/wide-int.h   2013-10-19 20:08:15.571987579 +0100
>> +++ gcc/wide-int.h   2013-10-19 20:09:12.299436972 +0100
>> @@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi:
>>   private:
>> /* Scratch space that can be used when decomposing the original integer.
>>It must live as long as this object.  */
>> -  HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS];
>> +  HOST_WIDE_INT scratch[2];
>>   
>>   public:
>> template 
> We now allow partial int precisions that are any size; this will break
> that.

Note that this was only "if yesterday's patch goes in".  That patch removes
the use of the scratch array from the tree decompose routine.  We already
removed the scratch array from the rtx decompose routine.  So the only
decompose routines left that need scratch space are those for pure HWI inputs.

Thanks,
Richard


Re: [wide-int] Reduce the size of the scratch arrays

2013-10-20 Thread Kenneth Zadeck

On 10/20/2013 06:30 AM, Richard Sandiford wrote:

If yesterday's patch goes in, we'll only need the scratch array for
HWI inputs.  We can therefore shrink it to 2 HWIs.

Tested on x86_64-linux-gnu.  OK for wide-int?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-10-19 20:08:15.571987579 +0100
+++ gcc/wide-int.h  2013-10-19 20:09:12.299436972 +0100
@@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi:
  private:
/* Scratch space that can be used when decomposing the original integer.
   It must live as long as this object.  */
-  HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS];
+  HOST_WIDE_INT scratch[2];
  
  public:

template 
We now allow partial int precisions that are any size; this will break
that.


kenny


Re: [PATCH, rs6000] Adjust vec_unpacku patterns for little endian

2013-10-20 Thread David Edelsohn
On Sun, Oct 20, 2013 at 12:00 AM, Bill Schmidt
 wrote:
> Hi,
>
> For little endian, the permute control vector for unpacking high and low
> halves of a vector register must be reversed from the one used for big
> endian.  Fixing this corrects 27 failing tests for
> powerpc64le-unknown-linux-gnu.
>
> Bootstrapped and tested for powerpc64{,le}-unknown-linux-gnu with no new
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2013-10-19  Bill Schmidt  
>
> * altivec.md (vec_unpacku_hi_v16qi): Adjust for little endian.
> (vec_unpacku_hi_v8hi): Likewise.
> (vec_unpacku_lo_v16qi): Likewise.
> (vec_unpacku_lo_v8hi): Likewise.

Okay.

Thanks, David


Re: changing a collision resolution strategy of the symbol table of identifiers

2013-10-20 Thread Ondřej Bílka
On Sun, Oct 20, 2013 at 06:55:40PM +0600, Roman Gareev wrote:
> Dear gcc contributors,
> 
> Recently I came across the list of "ideas for speeding up GCC"
> (http://gcc.gnu.org/wiki/Speedup_areas). Among others, there was
> suggested to replace identifier hash table with other data structure.
> 
> Please find attached a patch that switches the collision resolution strategy
> of the hash table used in the symbol table of identifiers from open
> addressing to separate chaining with a linked list. I would be very
> grateful for your comments, feedback and ideas about further
> enhancements to gcc in which I could take part. I am pursuing a
> master of science degree and looking for a compiler-related thesis
> topic.
> 
> Below are results of testing and the patch.
> 
> When measuring Linux kernel (3.8.8) compilation time, the results were
> the following: an increase of 0.17% for version 4.8.0, an increase of
> 1.12% for version 4.8.1, and a decrease of 0.598% for trunk (these are
> average values).
> 
> 
> Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk.
> 
> 
> diff --git a/gcc/stringpool.c b/gcc/stringpool.c
> index f4d0dae..cc696f7 100644
> --- a/gcc/stringpool.c
> +++ b/gcc/stringpool.c
> @@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator op, void *cookie)
> /* SPD is saved in the PCH file and holds the information needed
> to restore the string pool. */
> -struct GTY(()) string_pool_data {
> - ht_identifier_ptr *
> - GTY((length ("%h.nslots"),

Mail client broke it?


changing a collision resolution strategy of the symbol table of identifiers

2013-10-20 Thread Roman Gareev
Dear gcc contributors,

Recently I came across the list of "ideas for speeding up GCC"
(http://gcc.gnu.org/wiki/Speedup_areas). Among others, there was
suggested to replace identifier hash table with other data structure.

Please find attached a patch that switches the collision resolution strategy
of the hash table used in the symbol table of identifiers from open
addressing to separate chaining with a linked list. I would be very
grateful for your comments, feedback and ideas about further
enhancements to gcc in which I could take part. I am pursuing a
master of science degree and looking for a compiler-related thesis
topic.

Below are results of testing and the patch.

When measuring Linux kernel (3.8.8) compilation time, the results were
the following: an increase of 0.17% for version 4.8.0, an increase of
1.12% for version 4.8.1, and a decrease of 0.598% for trunk (these are
average values).


Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk.
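
For readers unfamiliar with the two strategies, here is a minimal
standalone sketch of separate chaining with a linked list; this only
illustrates the general technique and is not the stringpool code in the
patch below:

#include <cstddef>
#include <cstdio>
#include <cstring>

struct node
{
  const char *key;
  node *next;                 /* collisions hang off the same bucket */
};

static const std::size_t NSLOTS = 8;
static node *table[NSLOTS];

static std::size_t
hash (const char *s)
{
  std::size_t h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % NSLOTS;
}

static node *
insert (const char *key)
{
  std::size_t slot = hash (key);
  for (node *p = table[slot]; p; p = p->next)   /* walk the chain */
    if (std::strcmp (p->key, key) == 0)
      return p;                                 /* already present */
  node *n = new node;
  n->key = key;
  n->next = table[slot];                        /* push onto the chain */
  table[slot] = n;
  return n;
}

int
main ()
{
  insert ("foo");
  insert ("bar");
  insert ("foo");                               /* finds the existing node */
  for (std::size_t i = 0; i < NSLOTS; ++i)
    for (node *p = table[i]; p; p = p->next)
      std::printf ("slot %u: %s\n", (unsigned) i, p->key);
  return 0;
}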


diff --git a/gcc/stringpool.c b/gcc/stringpool.c
index f4d0dae..cc696f7 100644
--- a/gcc/stringpool.c
+++ b/gcc/stringpool.c
@@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator op, void *cookie)
/* SPD is saved in the PCH file and holds the information needed
to restore the string pool. */
-struct GTY(()) string_pool_data {
- ht_identifier_ptr *
- GTY((length ("%h.nslots"),
- nested_ptr (union tree_node, "%h ? GCC_IDENT_TO_HT_IDENT (%h) : NULL",
- "%h ? HT_IDENT_TO_GCC_IDENT (%h) : NULL")))
- entries;
+struct GTY((user)) string_pool_data {
+ ht_identifier_ptr * entries;
unsigned int nslots;
unsigned int nelements;
};
@@ -284,4 +280,113 @@ gt_pch_restore_stringpool (void)
spd = NULL;
}
+/* User-provided GC marking routines for the struct ht_identifier. They are using
+ only in GC marking routines for the struct string_pool_data. */
+
+extern void ht_identifier_ggc_mx (void *x_p);
+extern void ht_identifier_pch_nx (void *x_p);
+extern void ht_identifier_pch_nx (void *this_obj, void *x_p, gt_pointer_operator op, void *cookie);
+
+void
+ht_identifier_ggc_mx (void *x_p)
+{
+ struct ht_identifier * x = (struct ht_identifier *)x_p;
+ struct ht_identifier * xlimit = x;
+ while (ggc_test_and_set_mark (xlimit))
+ xlimit = ((*xlimit).next);
+ while (x != xlimit)
+ {
+ union tree_node * const x0 = ((*x).next) ? HT_IDENT_TO_GCC_IDENT (((*x).next)) : NULL;
+ gt_ggc_m_9tree_node (x0);
+ x = ((*x).next);
+ }
+}
+
+void
+ht_identifier_pch_nx (void *x_p)
+{
+ struct ht_identifier * x = (struct ht_identifier *)x_p;
+ struct ht_identifier * xlimit = x;
+ while (gt_pch_note_object (xlimit, xlimit, ht_identifier_pch_nx))
+ xlimit = ((*xlimit).next);
+ while (x != xlimit)
+ {
+ union tree_node * const x0 = ((*x).next) ? HT_IDENT_TO_GCC_IDENT (((*x).next)) : NULL;
+ gt_pch_n_9tree_node (x0);
+ x = ((*x).next);
+ }
+}
+
+void
+ht_identifier_pch_nx (void *this_obj, void *x_p, gt_pointer_operator op, void *cookie)
+{
+ struct ht_identifier * x = (struct ht_identifier *)x_p;
+ op (&((*x).str), cookie);
+ union tree_node * x0 = ((*x).next) ? HT_IDENT_TO_GCC_IDENT (((*x).next)) : NULL;
+ op (&(x0), cookie);
+ (*x).next = (x0) ? (GCC_IDENT_TO_HT_IDENT ((x0))) : NULL;
+}
+
+/* A pointer walker for entries of the struct string_pool_data. */
+
+void
+string_pool_data_pch_entries_walker (void *this_obj, void *x_p, gt_pointer_operator op, void *cookie)
+{
+ struct string_pool_data * x ATTRIBUTE_UNUSED = (struct string_pool_data *)x_p;
+ size_t l0 = (size_t)(((*x)).nslots);
+ if ((*x).entries != NULL)
+ {
+ size_t i0;
+ for (i0 = 0; i0 != (size_t)(l0); i0++)
+ {
+ union tree_node * x0 = ((*x).entries[i0]) ? HT_IDENT_TO_GCC_IDENT (((*x).entries[i0])) : NULL;
+ op (&(x0), cookie);
+ (*x).entries[i0] = (x0) ? (GCC_IDENT_TO_HT_IDENT ((x0))) : NULL;
+ }
+ }
+}
+
+/* User-provided GC marking routines for the struct string_pool_data. */
+
+void
+gt_ggc_mx (string_pool_data *x)
+{
+ size_t l0 = (size_t)(((*x)).nslots);
+ if ((*x).entries != NULL)
+ {
+ size_t i0;
+ for (i0 = 0; i0 != (size_t)(l0); i0++)
+ {
+ ht_identifier_ggc_mx ((*x).entries[i0]);
+ union tree_node * const x0 = ((*x).entries[i0]) ? HT_IDENT_TO_GCC_IDENT (((*x).entries[i0])) : NULL;
+ gt_ggc_m_9tree_node (x0);
+ }
+ ggc_mark ((*x).entries);
+ }
+}
+
+void
+gt_pch_nx (string_pool_data *x)
+{
+ size_t l0 = (size_t)(((*x)).nslots);
+ if ((*x).entries != NULL)
+ {
+ size_t i0;
+ for (i0 = 0; i0 != (size_t)(l0); i0++)
+ {
+ union tree_node * const x1 = ((*x).entries[i0]) ? HT_IDENT_TO_GCC_IDENT (((*x).entries[i0])) : NULL;
+ gt_pch_n_9tree_node (x1);
+ ht_identifier_pch_nx ((*x).entries[i0]);
+ }
+ gt_pch_note_object ((*x).entries, x, string_pool_data_pch_entries_walker);
+ }
+}
+
+void
+gt_pch_nx (string_pool_data *x, gt_pointer_operator op, void *cookie)
+{
+ if ((*x).entries != NULL)
+ op (&((*x).entries), cookie);
+}
+
#include "gt-stringpool.h"

diff --git a/libcpp/inc

RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-10-20 Thread Richard Biener
Bernd Edlinger  wrote:
>Hi,
>
>>> What I would suggest is to have a -fgnu-strict-volatile-bit-fields
>>
>> Why a new option? The -fstrict-volatile-bitfields option is already
>> GCC-specific, and I think it can do what you want anyway.
>
>As I understand Richard's comment, he proposes to
>have an option for true AAPCS compliance, which will
>be allowed to break the C++11 memory model and
>which will _not_ be the default on any target.
>Name it -fstrict-volatile-bitfields.
>
>And an option that addresses your requirements,
>which will _not_ break the C++11 memory model
>and which will be the default on some targets,
>dependent on the respective ABI requirements.
>Name it -fgnu-strict-volatile-bit-fields.

Yes. You could also make it -fstrict-volatile-bitfields={off,gnu,aapcs} if you 
think that's better.

Richard.

>
>Bernd.   




Re: [wide-int] Add a copy helper routine

2013-10-20 Thread Richard Biener
Richard Sandiford  wrote:
>When the wi:: patch went in, Kenny rightly complained that there were
>too many copy loops.  This patch gets rid of an unnecessary loop in
>wi::abs and consolidates the rest into a single function.  Again, it
>helps
>the ::is_sign_extended patch.
>
>Tested on x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

>Thanks,
>Richard
>
>
>Index: gcc/wide-int.h
>===
>--- gcc/wide-int.h 2013-10-20 09:38:41.175501696 +0100
>+++ gcc/wide-int.h 2013-10-20 11:32:47.425353384 +0100
>@@ -347,6 +347,9 @@ typedef generic_wide_inttemplate 
>   unsigned int get_binary_precision (const T1 &, const T2 &);
> 
>+  template 
>+  void copy (T1 &, const T2 &);
>+
> #define UNARY_PREDICATE \
>   template  bool
> #define UNARY_FUNCTION \
>@@ -870,10 +873,7 @@ inline wide_int_storage::wide_int_storag
>   STATIC_ASSERT (!wi::int_traits::host_dependent_precision);
>   wide_int_ref xi (x);
>   precision = xi.precision;
>-  unsigned int l = xi.len;
>-  for (unsigned int i = 0; i < l; ++i)
>-val[i] = xi.val[i];
>-  set_len (l);
>+  wi::copy (*this, xi);
> }
> 
> inline unsigned int
>@@ -969,6 +969,21 @@ wi::int_traits ::get_b
> return wide_int::create (wi::get_precision (x));
> }
> 
>+/* Copy the contents of Y to X, but keeping X's current precision.  */
>+template 
>+void
>+wi::copy (T1 &x, const T2 &y)
>+{
>+  HOST_WIDE_INT *xval = x.write_val ();
>+  const HOST_WIDE_INT *yval = y.get_val ();
>+  unsigned int len = y.get_len ();
>+  unsigned int i = 0;
>+  do
>+xval[i] = yval[i];
>+  while (++i < len);
>+  x.set_len (len);
>+}
>+
>/* An N-bit integer.  Until we can use typedef templates, use this
>instead.  */
> #define FIXED_WIDE_INT(N) \
>   generic_wide_int < fixed_wide_int_storage  >
>@@ -1012,10 +1027,7 @@ inline fixed_wide_int_storage ::fixed
>   /* Check for type compatibility.  We don't want to initialize a
>  fixed-width integer from something like a wide_int.  */
>  WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
>-  wide_int_ref xi (x, N);
>-  len = xi.len;
>-  for (unsigned int i = 0; i < len; ++i)
>-val[i] = xi.val[i];
>+  wi::copy (*this, wide_int_ref (x, N));
> }
> 
> template 
>@@ -1716,16 +1728,7 @@ wi::neg (const T &x, bool *overflow)
> inline WI_UNARY_RESULT (T)
> wi::abs (const T &x)
> {
>-  if (neg_p (x))
>-return neg (x);
>-
>-  WI_UNARY_RESULT_VAR (result, val, T, x);
>-  wide_int_ref xi (x, get_precision (result));
>-  for (unsigned int i = 0; i < xi.len; ++i)
>-val[i] = xi.val[i];
>-  result.set_len (xi.len);
>-
>-  return result;
>+  return neg_p (x) ? neg (x) : x;
> }
> 
> /* Return the result of sign-extending the low OFFSET bits of X.  */
>@@ -1760,9 +1763,7 @@ wi::zext (const T &x, unsigned int offse
>  maintain canonization.  */
>   if (offset >= precision)
> {
>-  for (unsigned int i = 0; i < xi.len; ++i)
>-  val[i] = xi.val[i];
>-  result.set_len (xi.len);
>+  wi::copy (result, xi);
>   return result;
> }
> 
>@@ -1809,22 +1810,12 @@ wi::set_bit (const T &x, unsigned int bi
> inline WI_BINARY_RESULT (T1, T2)
> wi::min (const T1 &x, const T2 &y, signop sgn)
> {
>-  WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y);
>+  WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y);
>   unsigned int precision = get_precision (result);
>   if (wi::le_p (x, y, sgn))
>-{
>-  wide_int_ref xi (x, precision);
>-  for (unsigned int i = 0; i < xi.len; ++i)
>-  val[i] = xi.val[i];
>-  result.set_len (xi.len);
>-}
>+wi::copy (result, wide_int_ref (x, precision));
>   else
>-{
>-  wide_int_ref yi (y, precision);
>-  for (unsigned int i = 0; i < yi.len; ++i)
>-  val[i] = yi.val[i];
>-  result.set_len (yi.len);
>-}
>+wi::copy (result, wide_int_ref (y, precision));
>   return result;
> }
> 
>@@ -1850,22 +1841,12 @@ wi::umin (const T1 &x, const T2 &y)
> inline WI_BINARY_RESULT (T1, T2)
> wi::max (const T1 &x, const T2 &y, signop sgn)
> {
>-  WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y);
>+  WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y);
>   unsigned int precision = get_precision (result);
>   if (wi::ge_p (x, y, sgn))
>-{
>-  wide_int_ref xi (x, precision);
>-  for (unsigned int i = 0; i < xi.len; ++i)
>-  val[i] = xi.val[i];
>-  result.set_len (xi.len);
>-}
>+wi::copy (result, wide_int_ref (x, precision));
>   else
>-{
>-  wide_int_ref yi (y, precision);
>-  for (unsigned int i = 0; i < yi.len; ++i)
>-  val[i] = yi.val[i];
>-  result.set_len (yi.len);
>-}
>+wi::copy (result, wide_int_ref (y, precision));
>   return result;
> }
> 




Re: [wide-int] Reduce the size of the scratch arrays

2013-10-20 Thread Richard Biener
Richard Sandiford  wrote:
>If yesterday's patch goes in, we'll only need the scratch array for
>HWI inputs.  We can therefore shrink it to 2 HWIs.

>Tested on x86_64-linux-gnu.  OK for wide-int?
>

Ok

Thanks,
Richard.

>Thanks,
>Richard
>
>
>Index: gcc/wide-int.h
>===
>--- gcc/wide-int.h 2013-10-19 20:08:15.571987579 +0100
>+++ gcc/wide-int.h 2013-10-19 20:09:12.299436972 +0100
>@@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi:
> private:
>/* Scratch space that can be used when decomposing the original
>integer.
>  It must live as long as this object.  */
>-  HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS];
>+  HOST_WIDE_INT scratch[2];
> 
> public:
>   template 




Re: [wide-int] Turn lts_p & co. back into template functions

2013-10-20 Thread Richard Biener
Richard Sandiford  wrote:
>This patch just converts some functions back to template functions,
>for the benefit of the upcoming ::is_sign_extended patch.  There's no
>behavioural change.
>
>Tested on x86_64-linux-gnu.  OK for wide-int?

Ok.

Thanks,
Richard.

>Thanks,
>Richard
>
>
>Index: gcc/gcc/wide-int.h
>===
>--- gcc.orig/gcc/wide-int.h
>+++ gcc/gcc/wide-int.h
>@@ -347,42 +347,48 @@ namespace wi
>   template 
>   unsigned int get_binary_precision (const T1 &, const T2 &);
> 
>-  bool fits_shwi_p (const wide_int_ref &);
>-  bool fits_uhwi_p (const wide_int_ref &);
>-  bool neg_p (const wide_int_ref &, signop = SIGNED);
>-  bool only_sign_bit_p (const wide_int_ref &, unsigned int);
>-  bool only_sign_bit_p (const wide_int_ref &);
>-  HOST_WIDE_INT sign_mask (const wide_int_ref &);
>-
>-  template 
>-  bool eq_p (const T1 &, const T2 &);
>-
>-  template 
>-  bool ne_p (const T1 &, const T2 &);
>-
>-  bool lt_p (const wide_int_ref &, const wide_int_ref &, signop);
>-  bool lts_p (const wide_int_ref &, const wide_int_ref &);
>-  bool ltu_p (const wide_int_ref &, const wide_int_ref &);
>-  bool le_p (const wide_int_ref &, const wide_int_ref &, signop);
>-  bool les_p (const wide_int_ref &, const wide_int_ref &);
>-  bool leu_p (const wide_int_ref &, const wide_int_ref &);
>-  bool gt_p (const wide_int_ref &, const wide_int_ref &, signop);
>-  bool gts_p (const wide_int_ref &, const wide_int_ref &);
>-  bool gtu_p (const wide_int_ref &, const wide_int_ref &);
>-  bool ge_p (const wide_int_ref &, const wide_int_ref &, signop);
>-  bool ges_p (const wide_int_ref &, const wide_int_ref &);
>-  bool geu_p (const wide_int_ref &, const wide_int_ref &);
>-  int cmp (const wide_int_ref &, const wide_int_ref &, signop);
>-  int cmps (const wide_int_ref &, const wide_int_ref &);
>-  int cmpu (const wide_int_ref &, const wide_int_ref &);
>-
>+#define UNARY_PREDICATE \
>+  template  bool
> #define UNARY_FUNCTION \
>   template  WI_UNARY_RESULT (T)
>+#define BINARY_PREDICATE \
>+  template  bool
> #define BINARY_FUNCTION \
>   template  WI_BINARY_RESULT (T1, T2)
> #define SHIFT_FUNCTION \
>   template  WI_UNARY_RESULT (T)
> 
>+  UNARY_PREDICATE fits_shwi_p (const T &);
>+  UNARY_PREDICATE fits_uhwi_p (const T &);
>+  UNARY_PREDICATE neg_p (const T &, signop = SIGNED);
>+
>+  template 
>+  HOST_WIDE_INT sign_mask (const T &);
>+
>+  BINARY_PREDICATE eq_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE ne_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE lt_p (const T1 &, const T2 &, signop);
>+  BINARY_PREDICATE lts_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE ltu_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE le_p (const T1 &, const T2 &, signop);
>+  BINARY_PREDICATE les_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE leu_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE gt_p (const T1 &, const T2 &, signop);
>+  BINARY_PREDICATE gts_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE gtu_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE ge_p (const T1 &, const T2 &, signop);
>+  BINARY_PREDICATE ges_p (const T1 &, const T2 &);
>+  BINARY_PREDICATE geu_p (const T1 &, const T2 &);
>+
>+  template 
>+  int cmp (const T1 &, const T2 &, signop);
>+
>+  template 
>+  int cmps (const T1 &, const T2 &);
>+
>+  template 
>+  int cmpu (const T1 &, const T2 &);
>+
>   UNARY_FUNCTION bit_not (const T &);
>   UNARY_FUNCTION neg (const T &);
>   UNARY_FUNCTION neg (const T &, bool *);
>@@ -446,9 +452,13 @@ namespace wi
>SHIFT_FUNCTION rrotate (const T &, const wide_int_ref &, unsigned int =
>0);
> 
> #undef SHIFT_FUNCTION
>+#undef BINARY_PREDICATE
> #undef BINARY_FUNCTION
>+#undef UNARY_PREDICATE
> #undef UNARY_FUNCTION
> 
>+  bool only_sign_bit_p (const wide_int_ref &, unsigned int);
>+  bool only_sign_bit_p (const wide_int_ref &);
>   int clz (const wide_int_ref &);
>   int clrsb (const wide_int_ref &);
>   int ctz (const wide_int_ref &);
>@@ -1401,40 +1411,48 @@ wi::get_binary_precision (const T1 &x, c
> 
> /* Return true if X fits in a HOST_WIDE_INT with no loss of
>precision.  */
>+template 
> inline bool
>-wi::fits_shwi_p (const wide_int_ref &x)
>+wi::fits_shwi_p (const T &x)
> {
>-  return x.len == 1;
>+  wide_int_ref xi (x);
>+  return xi.len == 1;
> }
> 
> /* Return true if X fits in an unsigned HOST_WIDE_INT with no loss of
>precision.  */
>+template 
> inline bool
>-wi::fits_uhwi_p (const wide_int_ref &x)
>+wi::fits_uhwi_p (const T &x)
> {
>-  if (x.precision <= HOST_BITS_PER_WIDE_INT)
>+  wide_int_ref xi (x);
>+  if (xi.precision <= HOST_BITS_PER_WIDE_INT)
> return true;
>-  if (x.len == 1)
>-return x.slow () >= 0;
>-  return x.len == 2 && x.uhigh () == 0;
>+  if (xi.len == 1)
>+return xi.slow () >= 0;
>+  return xi.len == 2 && xi.uhigh () == 0;
> }
> 
> /* Return true if X is negative based on the interpretation of SGN.
>For UNSIGNED, this is always false.  */
>+template 
> inline bool
>-wi::neg_p (const wide_int_ref &x, signop sgn)
>+

Re: Always treat sext_hwi and zext_hwi as inline

2013-10-20 Thread Richard Biener
Richard Sandiford  wrote:
>At the moment there are two copies of sext_hwi and zext_hwi, one inline
>for !ENABLE_CHECKING and one out-of-line for ENABLE_CHECKING.  However,
>there are several wide-int callers where it's obvious from context that
>the precision is <= HOST_BITS_PER_WIDE_INT, so if the functions are
>inline for ENABLE_CHECKING, the assert can often be compiled away.
>
>This improves the ENABLE_CHECKING code for some query functions on
>wide-int branch.  Tested on x86_64-linux-gnu.  OK for mainline?

Ok.  
>Thanks,
>Richard
>
>
>gcc/
>   * system.h: Move hwint.h include further down.
>   * hwint.h (sext_hwi, zext_hwi): Define unconditionally.  Add
>   gcc_checking_asserts.
>   * hwint.c (sext_hwi, zext_hwi): Delete ENABLE_CHECKING versions.
>
>Index: gcc/hwint.c
>===
>--- gcc/hwint.c2013-08-21 19:28:49.560621645 +0100
>+++ gcc/hwint.c2013-10-19 20:05:43.399978400 +0100
>@@ -204,35 +204,3 @@ least_common_multiple (HOST_WIDE_INT a,
> {
>   return mul_hwi (abs_hwi (a) / gcd (a, b), abs_hwi (b));
> }
>-
>-#ifdef ENABLE_CHECKING
>-/* Sign extend SRC starting from PREC.  */
>-
>-HOST_WIDE_INT
>-sext_hwi (HOST_WIDE_INT src, unsigned int prec)
>-{
>-  gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT);
>-
>-  if (prec == HOST_BITS_PER_WIDE_INT)
>-return src;
>-  else
>-{
>-  int shift = HOST_BITS_PER_WIDE_INT - prec;
>-  return (src << shift) >> shift;
>-}
>-}
>-
>-/* Zero extend SRC starting from PREC.  */
>-
>-unsigned HOST_WIDE_INT
>-zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec)
>-{
>-  gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT);
>-
>-  if (prec == HOST_BITS_PER_WIDE_INT)
>-return src;
>-  else
>-return src & (((HOST_WIDE_INT)1 << prec) - 1);
>-}
>-
>-#endif
>Index: gcc/hwint.h
>===
>--- gcc/hwint.h2013-09-05 20:55:31.192518091 +0100
>+++ gcc/hwint.h2013-10-19 20:05:23.469855942 +0100
>@@ -322,9 +322,6 @@ extern HOST_WIDE_INT least_common_multip
> 
> /* Sign extend SRC starting from PREC.  */
> 
>-#ifdef ENABLE_CHECKING
>-extern HOST_WIDE_INT sext_hwi (HOST_WIDE_INT, unsigned int);
>-#else
> static inline HOST_WIDE_INT
> sext_hwi (HOST_WIDE_INT src, unsigned int prec)
> {
>@@ -332,24 +329,23 @@ sext_hwi (HOST_WIDE_INT src, unsigned in
> return src;
>   else
> {
>+  gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
>   int shift = HOST_BITS_PER_WIDE_INT - prec;
>   return (src << shift) >> shift;
> }
> }
>-#endif
> 
> /* Zero extend SRC starting from PREC.  */
>-#ifdef ENABLE_CHECKING
>-extern unsigned HOST_WIDE_INT zext_hwi (unsigned HOST_WIDE_INT,
>unsigned int);
>-#else
> static inline unsigned HOST_WIDE_INT
> zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec)
> {
>   if (prec == HOST_BITS_PER_WIDE_INT)
> return src;
>   else
>-return src & (((HOST_WIDE_INT)1 << prec) - 1);
>+{
>+  gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
>+  return src & (((HOST_WIDE_INT) 1 << prec) - 1);
>+}
> }
>-#endif
> 
> #endif /* ! GCC_HWINT_H */
>Index: gcc/system.h
>===
>--- gcc/system.h   2013-09-07 10:19:22.797432289 +0100
>+++ gcc/system.h   2013-10-19 20:06:14.417170260 +0100
>@@ -272,9 +272,6 @@ #define ICE_EXIT_CODE 4
> # include 
> #endif
> 
>-/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT.  */
>-#include "hwint.h"
>-
> /* A macro to determine whether a VALUE lies inclusively within a
>certain range without evaluating the VALUE more than once.  This
>macro won't warn if the VALUE is unsigned and the LOWER bound is
>@@ -1066,4 +1063,7 @@ #define DEBUG_FUNCTION
> #define DEBUG_VARIABLE
> #endif
> 
>+/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT.  */
>+#include "hwint.h"
>+
> #endif /* ! GCC_SYSTEM_H */

abs_hwi might be treated the same way.  IIRC it is out of line because the assert 
macros were not available in hwint.h.

Thanks,
Richard.
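
For reference, a standalone illustration of what sext_hwi and zext_hwi
compute, assuming a 64-bit HOST_WIDE_INT (this mirrors the patch above but
is not GCC's hwint.h):

#include <cstdint>
#include <cstdio>

/* Sign extend SRC from its low PREC bits (PREC < 64, as the assert
   in the patch requires).  */
static inline int64_t
sext (int64_t src, unsigned int prec)
{
  int shift = 64 - prec;
  return (src << shift) >> shift;   /* arithmetic right shift sign-extends */
}

/* Zero extend SRC from its low PREC bits.  */
static inline uint64_t
zext (uint64_t src, unsigned int prec)
{
  return src & (((uint64_t) 1 << prec) - 1);
}

int
main ()
{
  std::printf ("%lld\n", (long long) sext (0xff, 8));            /* -1  */
  std::printf ("%llu\n", (unsigned long long) zext (0x1ff, 8));  /* 255 */
  return 0;
}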


Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-20 Thread Christian Bruel
Hi Oleg,

On 10/19/2013 11:30 AM, Oleg Endo wrote:

>
>
> I've attached two test cases, tested with 
> make -k check-gcc RUNTESTFLAGS="sh.exp=strcmp* --target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
>
> Could you please include them?
>
> Cheers,
> Oleg
>
>
Thanks for having retested this.  The tests are still not complete for
RTL-generated functions; there are cases where no str/cmp will be
emitted, because we can predict that the size is less than 4 and so fall
through directly into the byte-at-a-time copy loop.

Also, I will consider a size-optimized implementation, so we don't jump
to the library.

I will post examples for this shortly (and add them as a testcase) with
a strncmp implementation helper, which also covers strcmp with constant
strings, by the way.  Please allow me some time to complete the benchmarking.

Thanks for the hints about removing the empty "" in the expanders.

Kaz, before proceeding with the next patch, was your approval for 1/2
only or 2/2 with the expander cleanup ?

Many thanks,

Christian






[wide-int] Add is_sign_extended optimisation

2013-10-20 Thread Richard Sandiford
Another follow-up to yesterday's patch.  This one implements Richard's
suggestion of having an is_sign_extended trait to optimise cases where
excess upper bits are known to be signs rather than undefined.

The uses so far are:

* make to_shwi () equivalent to slow ()
* turn eq_p into a simple loop
* avoid extensions in sign_mask ()
* avoid extensions in set_len if the input was already sign-extended

The first two are new (compared to wide-int svn) while the second two
partially undo the negative effects of yesterday's patch on is_sign_extended
values.

E.g.

  bool
  f (wide_int x, HOST_WIDE_INT y)
  {
return x == y;
  }

now gives:

xorl%eax, %eax
cmpl$1, 264(%rsp)
je  .L27
ret
.p2align 4,,10
.p2align 3
.L27:   
cmpq8(%rsp), %rdi
sete%al
ret

And:

  wide_int
  f (wide_int x, wide_int y)
  {
return x == y;
  }

gives:

movl264(%rsp), %ecx
xorl%eax, %eax
cmpl528(%rsp), %ecx
je  .L42
rep ret
.p2align 4,,10
.p2align 3
.L42:
xorl%eax, %eax
jmp .L38
.p2align 4,,10
.p2align 3
.L44:
addl$1, %eax
cmpl%eax, %ecx
je  .L43
.L38:
movl%eax, %edx
movq272(%rsp,%rdx,8), %rsi
cmpq%rsi, 8(%rsp,%rdx,8)
je  .L44
xorl%eax, %eax
ret
.p2align 4,,10
.p2align 3
.L43:
movl$1, %eax
ret

(which is a bit poor -- "je .L42" trivially threads to "je .L38",
although that's probably only true after RA).

The code for:

  bool
  f (wide_int x, unsigned HOST_WIDE_INT y)
  {
return x == y;
  }

still needs some work though...

The lts_p sequences of wide_int are similar to the ones Mike posted.

Tested on x86_64-linux-gnu.  OK for wide-int?

Thanks,
Richard


Index: gcc/rtl.h
===
--- gcc/rtl.h   2013-10-20 09:38:40.254493991 +0100
+++ gcc/rtl.h   2013-10-20 09:39:28.169894855 +0100
@@ -1410,6 +1410,7 @@ typedef std::pair  wide_int;
 
+template 
 struct wide_int_ref_storage;
-typedef generic_wide_int  wide_int_ref;
+
+typedef generic_wide_int  > wide_int_ref;
+
+/* This can be used instead of wide_int_ref if the referenced value is
+   known to have type T.  It carries across properties of T's representation,
+   such as whether excess upper bits in a HWI are defined, and can therefore
+   help avoid redundant work.
+
+   The macro could be replaced with a template typedef, once we're able
+   to use those.  */
+#define WIDE_INT_REF_FOR(T) \
+  generic_wide_int \
+::is_sign_extended> >
 
 /* Public functions for querying and operating on integers.  */
 namespace wi
@@ -520,18 +533,6 @@ wi::storage_ref::get_val () const
   return val;
 }
 
-namespace wi
-{
-  template <>
-  struct int_traits 
-  {
-static const enum precision_type precision_type = VAR_PRECISION;
-/* wi::storage_ref can be a reference to a primitive type,
-   so this is the conservatively-correct setting.  */
-static const bool host_dependent_precision = true;
-  };
-}
-
 /* This class defines an integer type using the storage provided by the
template argument.  The storage class must provide the following
functions:
@@ -626,6 +627,9 @@ #define INCDEC_OPERATOR(OP, DELTA) \
 #undef INCDEC_OPERATOR
 
   char *dump (char *) const;
+
+  static const bool is_sign_extended
+= wi::int_traits  >::is_sign_extended;
 };
 
 template 
@@ -653,7 +657,11 @@ inline generic_wide_int ::gener
 generic_wide_int ::to_shwi (unsigned int precision) const
 {
   if (precision == 0)
-precision = this->get_precision ();
+{
+  if (is_sign_extended)
+   return this->get_val ()[0];
+  precision = this->get_precision ();
+}
   if (precision < HOST_BITS_PER_WIDE_INT)
 return sext_hwi (this->get_val ()[0], precision);
   else
@@ -692,11 +700,14 @@ generic_wide_int ::to_short_add
 generic_wide_int ::sign_mask () const
 {
   unsigned int len = this->get_len ();
-  unsigned int precision = this->get_precision ();
   unsigned HOST_WIDE_INT high = this->get_val ()[len - 1];
-  int excess = len * HOST_BITS_PER_WIDE_INT - precision;
-  if (excess > 0)
-high <<= excess;
+  if (!is_sign_extended)
+{
+  unsigned int precision = this->get_precision ();
+  int excess = len * HOST_BITS_PER_WIDE_INT - precision;
+  if (excess > 0)
+   high <<= excess;
+}
   return HOST_WIDE_INT (high) < 0 ? -1 : 0;
 }
 
@@ -781,6 +792,7 @@ decompose (HOST_WIDE_INT *, unsigned int
 /* Provide the storage for a wide_int_ref.  This acts like a read-only
wide_int, with the optimization that VAL is normally a pointer to
another integer's storage, so that no array copy is needed.  */
+template 
 struct wide_int_ref_storage : public wi::storage_ref
 {
 private:
@@ -799,17 +811,19 @@ struct wide_int_ref_storage : public wi:
 /* Create a ref

[wide-int] Add a copy helper routine

2013-10-20 Thread Richard Sandiford
When the wi:: patch went in, Kenny rightly complained that there were
too many copy loops.  This patch gets rid of an unnecessary loop in
wi::abs and consolidates the rest into a single function.  Again, it helps
the ::is_sign_extended patch.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-10-20 09:38:41.175501696 +0100
+++ gcc/wide-int.h  2013-10-20 11:32:47.425353384 +0100
@@ -347,6 +347,9 @@ typedef generic_wide_int 
   unsigned int get_binary_precision (const T1 &, const T2 &);
 
+  template 
+  void copy (T1 &, const T2 &);
+
 #define UNARY_PREDICATE \
   template  bool
 #define UNARY_FUNCTION \
@@ -870,10 +873,7 @@ inline wide_int_storage::wide_int_storag
   STATIC_ASSERT (!wi::int_traits::host_dependent_precision);
   wide_int_ref xi (x);
   precision = xi.precision;
-  unsigned int l = xi.len;
-  for (unsigned int i = 0; i < l; ++i)
-val[i] = xi.val[i];
-  set_len (l);
+  wi::copy (*this, xi);
 }
 
 inline unsigned int
@@ -969,6 +969,21 @@ wi::int_traits ::get_b
 return wide_int::create (wi::get_precision (x));
 }
 
+/* Copy the contents of Y to X, but keeping X's current precision.  */
+template 
+void
+wi::copy (T1 &x, const T2 &y)
+{
+  HOST_WIDE_INT *xval = x.write_val ();
+  const HOST_WIDE_INT *yval = y.get_val ();
+  unsigned int len = y.get_len ();
+  unsigned int i = 0;
+  do
+xval[i] = yval[i];
+  while (++i < len);
+  x.set_len (len);
+}
+
 /* An N-bit integer.  Until we can use typedef templates, use this instead.  */
 #define FIXED_WIDE_INT(N) \
   generic_wide_int < fixed_wide_int_storage  >
@@ -1012,10 +1027,7 @@ inline fixed_wide_int_storage ::fixed
   /* Check for type compatibility.  We don't want to initialize a
  fixed-width integer from something like a wide_int.  */
   WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
-  wide_int_ref xi (x, N);
-  len = xi.len;
-  for (unsigned int i = 0; i < len; ++i)
-val[i] = xi.val[i];
+  wi::copy (*this, wide_int_ref (x, N));
 }
 
 template 
@@ -1716,16 +1728,7 @@ wi::neg (const T &x, bool *overflow)
 inline WI_UNARY_RESULT (T)
 wi::abs (const T &x)
 {
-  if (neg_p (x))
-return neg (x);
-
-  WI_UNARY_RESULT_VAR (result, val, T, x);
-  wide_int_ref xi (x, get_precision (result));
-  for (unsigned int i = 0; i < xi.len; ++i)
-val[i] = xi.val[i];
-  result.set_len (xi.len);
-
-  return result;
+  return neg_p (x) ? neg (x) : x;
 }
 
 /* Return the result of sign-extending the low OFFSET bits of X.  */
@@ -1760,9 +1763,7 @@ wi::zext (const T &x, unsigned int offse
  maintain canonization.  */
   if (offset >= precision)
 {
-  for (unsigned int i = 0; i < xi.len; ++i)
-   val[i] = xi.val[i];
-  result.set_len (xi.len);
+  wi::copy (result, xi);
   return result;
 }
 
@@ -1809,22 +1810,12 @@ wi::set_bit (const T &x, unsigned int bi
 inline WI_BINARY_RESULT (T1, T2)
 wi::min (const T1 &x, const T2 &y, signop sgn)
 {
-  WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y);
+  WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y);
   unsigned int precision = get_precision (result);
   if (wi::le_p (x, y, sgn))
-{
-  wide_int_ref xi (x, precision);
-  for (unsigned int i = 0; i < xi.len; ++i)
-   val[i] = xi.val[i];
-  result.set_len (xi.len);
-}
+wi::copy (result, wide_int_ref (x, precision));
   else
-{
-  wide_int_ref yi (y, precision);
-  for (unsigned int i = 0; i < yi.len; ++i)
-   val[i] = yi.val[i];
-  result.set_len (yi.len);
-}
+wi::copy (result, wide_int_ref (y, precision));
   return result;
 }
 
@@ -1850,22 +1841,12 @@ wi::umin (const T1 &x, const T2 &y)
 inline WI_BINARY_RESULT (T1, T2)
 wi::max (const T1 &x, const T2 &y, signop sgn)
 {
-  WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y);
+  WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y);
   unsigned int precision = get_precision (result);
   if (wi::ge_p (x, y, sgn))
-{
-  wide_int_ref xi (x, precision);
-  for (unsigned int i = 0; i < xi.len; ++i)
-   val[i] = xi.val[i];
-  result.set_len (xi.len);
-}
+wi::copy (result, wide_int_ref (x, precision));
   else
-{
-  wide_int_ref yi (y, precision);
-  for (unsigned int i = 0; i < yi.len; ++i)
-   val[i] = yi.val[i];
-  result.set_len (yi.len);
-}
+wi::copy (result, wide_int_ref (y, precision));
   return result;
 }
 


[wide-int] Reduce the size of the scratch arrays

2013-10-20 Thread Richard Sandiford
If yesterday's patch goes in, we'll only need the scratch array for
HWI inputs.  We can therefore shrink it to 2 HWIs.

Tested on x86_64-linux-gnu.  OK for wide-int?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-10-19 20:08:15.571987579 +0100
+++ gcc/wide-int.h  2013-10-19 20:09:12.299436972 +0100
@@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi:
 private:
   /* Scratch space that can be used when decomposing the original integer.
  It must live as long as this object.  */
-  HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS];
+  HOST_WIDE_INT scratch[2];
 
 public:
   template 


[wide-int] Turn lts_p & co. back into template functions

2013-10-20 Thread Richard Sandiford
This patch just converts some functions back to template functions,
for the benefit of the upcoming ::is_sign_extended patch.  There's no
behavioural change.

Tested on x86_64-linux-gnu.  OK for wide-int?

Thanks,
Richard


Index: gcc/gcc/wide-int.h
===
--- gcc.orig/gcc/wide-int.h
+++ gcc/gcc/wide-int.h
@@ -347,42 +347,48 @@ namespace wi
   template 
   unsigned int get_binary_precision (const T1 &, const T2 &);
 
-  bool fits_shwi_p (const wide_int_ref &);
-  bool fits_uhwi_p (const wide_int_ref &);
-  bool neg_p (const wide_int_ref &, signop = SIGNED);
-  bool only_sign_bit_p (const wide_int_ref &, unsigned int);
-  bool only_sign_bit_p (const wide_int_ref &);
-  HOST_WIDE_INT sign_mask (const wide_int_ref &);
-
-  template <typename T1, typename T2>
-  bool eq_p (const T1 &, const T2 &);
-
-  template <typename T1, typename T2>
-  bool ne_p (const T1 &, const T2 &);
-
-  bool lt_p (const wide_int_ref &, const wide_int_ref &, signop);
-  bool lts_p (const wide_int_ref &, const wide_int_ref &);
-  bool ltu_p (const wide_int_ref &, const wide_int_ref &);
-  bool le_p (const wide_int_ref &, const wide_int_ref &, signop);
-  bool les_p (const wide_int_ref &, const wide_int_ref &);
-  bool leu_p (const wide_int_ref &, const wide_int_ref &);
-  bool gt_p (const wide_int_ref &, const wide_int_ref &, signop);
-  bool gts_p (const wide_int_ref &, const wide_int_ref &);
-  bool gtu_p (const wide_int_ref &, const wide_int_ref &);
-  bool ge_p (const wide_int_ref &, const wide_int_ref &, signop);
-  bool ges_p (const wide_int_ref &, const wide_int_ref &);
-  bool geu_p (const wide_int_ref &, const wide_int_ref &);
-  int cmp (const wide_int_ref &, const wide_int_ref &, signop);
-  int cmps (const wide_int_ref &, const wide_int_ref &);
-  int cmpu (const wide_int_ref &, const wide_int_ref &);
-
+#define UNARY_PREDICATE \
+  template <typename T> bool
 #define UNARY_FUNCTION \
   template <typename T> WI_UNARY_RESULT (T)
+#define BINARY_PREDICATE \
+  template <typename T1, typename T2> bool
 #define BINARY_FUNCTION \
   template <typename T1, typename T2> WI_BINARY_RESULT (T1, T2)
 #define SHIFT_FUNCTION \
   template <typename T> WI_UNARY_RESULT (T)
 
+  UNARY_PREDICATE fits_shwi_p (const T &);
+  UNARY_PREDICATE fits_uhwi_p (const T &);
+  UNARY_PREDICATE neg_p (const T &, signop = SIGNED);
+
+  template <typename T>
+  HOST_WIDE_INT sign_mask (const T &);
+
+  BINARY_PREDICATE eq_p (const T1 &, const T2 &);
+  BINARY_PREDICATE ne_p (const T1 &, const T2 &);
+  BINARY_PREDICATE lt_p (const T1 &, const T2 &, signop);
+  BINARY_PREDICATE lts_p (const T1 &, const T2 &);
+  BINARY_PREDICATE ltu_p (const T1 &, const T2 &);
+  BINARY_PREDICATE le_p (const T1 &, const T2 &, signop);
+  BINARY_PREDICATE les_p (const T1 &, const T2 &);
+  BINARY_PREDICATE leu_p (const T1 &, const T2 &);
+  BINARY_PREDICATE gt_p (const T1 &, const T2 &, signop);
+  BINARY_PREDICATE gts_p (const T1 &, const T2 &);
+  BINARY_PREDICATE gtu_p (const T1 &, const T2 &);
+  BINARY_PREDICATE ge_p (const T1 &, const T2 &, signop);
+  BINARY_PREDICATE ges_p (const T1 &, const T2 &);
+  BINARY_PREDICATE geu_p (const T1 &, const T2 &);
+
+  template <typename T1, typename T2>
+  int cmp (const T1 &, const T2 &, signop);
+
+  template <typename T1, typename T2>
+  int cmps (const T1 &, const T2 &);
+
+  template <typename T1, typename T2>
+  int cmpu (const T1 &, const T2 &);
+
   UNARY_FUNCTION bit_not (const T &);
   UNARY_FUNCTION neg (const T &);
   UNARY_FUNCTION neg (const T &, bool *);
@@ -446,9 +452,13 @@ namespace wi
   SHIFT_FUNCTION rrotate (const T &, const wide_int_ref &, unsigned int = 0);
 
 #undef SHIFT_FUNCTION
+#undef BINARY_PREDICATE
 #undef BINARY_FUNCTION
+#undef UNARY_PREDICATE
 #undef UNARY_FUNCTION
 
+  bool only_sign_bit_p (const wide_int_ref &, unsigned int);
+  bool only_sign_bit_p (const wide_int_ref &);
   int clz (const wide_int_ref &);
   int clrsb (const wide_int_ref &);
   int ctz (const wide_int_ref &);
@@ -1401,40 +1411,48 @@ wi::get_binary_precision (const T1 &x, c
 
 /* Return true if X fits in a HOST_WIDE_INT with no loss of
precision.  */
+template <typename T>
 inline bool
-wi::fits_shwi_p (const wide_int_ref &x)
+wi::fits_shwi_p (const T &x)
 {
-  return x.len == 1;
+  wide_int_ref xi (x);
+  return xi.len == 1;
 }
 
 /* Return true if X fits in an unsigned HOST_WIDE_INT with no loss of
precision.  */
+template <typename T>
 inline bool
-wi::fits_uhwi_p (const wide_int_ref &x)
+wi::fits_uhwi_p (const T &x)
 {
-  if (x.precision <= HOST_BITS_PER_WIDE_INT)
+  wide_int_ref xi (x);
+  if (xi.precision <= HOST_BITS_PER_WIDE_INT)
 return true;
-  if (x.len == 1)
-return x.slow () >= 0;
-  return x.len == 2 && x.uhigh () == 0;
+  if (xi.len == 1)
+return xi.slow () >= 0;
+  return xi.len == 2 && xi.uhigh () == 0;
 }
 
 /* Return true if X is negative based on the interpretation of SGN.
For UNSIGNED, this is always false.  */
+template <typename T>
 inline bool
-wi::neg_p (const wide_int_ref &x, signop sgn)
+wi::neg_p (const T &x, signop sgn)
 {
+  wide_int_ref xi (x);
   if (sgn == UNSIGNED)
 return false;
-  return x.sign_mask () < 0;
+  return xi.sign_mask () < 0;
 }
 
 /* Return -1 if the

Always treat sext_hwi and zext_hwi as inline

2013-10-20 Thread Richard Sandiford
At the moment there are two copies of sext_hwi and zext_hwi, one inline
for !ENABLE_CHECKING and one out-of-line for ENABLE_CHECKING.  However,
there are several wide-int callers where it's obvious from context that
the precision is <= HOST_BITS_PER_WIDE_INT, so if the functions are
inline for ENABLE_CHECKING, the assert can often be compiled away.
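
As a made-up example of the kind of caller meant here (assuming it sits in a GCC source file that already includes hwint.h):

  /* Hypothetical caller, not from the tree: the precision is a literal,
     so after inlining sext_hwi the gcc_checking_assert
     (8 < HOST_BITS_PER_WIDE_INT) is provably true and folds away;
     only the two shifts remain.  */
  static inline HOST_WIDE_INT
  sext_low_byte (HOST_WIDE_INT x)
  {
    return sext_hwi (x, 8);
  }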

This improves the ENABLE_CHECKING code for some query functions on
wide-int branch.  Tested on x86_64-linux-gnu.  OK for mainline?

Thanks,
Richard


gcc/
* system.h: Move hwint.h include further down.
* hwint.h (sext_hwi, zext_hwi): Define unconditionally.  Add
gcc_checking_asserts.
* hwint.c (sext_hwi, zext_hwi): Delete ENABLE_CHECKING versions.

Index: gcc/hwint.c
===
--- gcc/hwint.c 2013-08-21 19:28:49.560621645 +0100
+++ gcc/hwint.c 2013-10-19 20:05:43.399978400 +0100
@@ -204,35 +204,3 @@ least_common_multiple (HOST_WIDE_INT a,
 {
   return mul_hwi (abs_hwi (a) / gcd (a, b), abs_hwi (b));
 }
-
-#ifdef ENABLE_CHECKING
-/* Sign extend SRC starting from PREC.  */
-
-HOST_WIDE_INT
-sext_hwi (HOST_WIDE_INT src, unsigned int prec)
-{
-  gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT);
-
-  if (prec == HOST_BITS_PER_WIDE_INT)
-return src;
-  else
-{
-  int shift = HOST_BITS_PER_WIDE_INT - prec;
-  return (src << shift) >> shift;
-}
-}
-
-/* Zero extend SRC starting from PREC.  */
-
-unsigned HOST_WIDE_INT
-zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec)
-{
-  gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT);
-
-  if (prec == HOST_BITS_PER_WIDE_INT)
-return src;
-  else
-return src & (((HOST_WIDE_INT)1 << prec) - 1);
-}
-
-#endif
Index: gcc/hwint.h
===
--- gcc/hwint.h 2013-09-05 20:55:31.192518091 +0100
+++ gcc/hwint.h 2013-10-19 20:05:23.469855942 +0100
@@ -322,9 +322,6 @@ extern HOST_WIDE_INT least_common_multip
 
 /* Sign extend SRC starting from PREC.  */
 
-#ifdef ENABLE_CHECKING
-extern HOST_WIDE_INT sext_hwi (HOST_WIDE_INT, unsigned int);
-#else
 static inline HOST_WIDE_INT
 sext_hwi (HOST_WIDE_INT src, unsigned int prec)
 {
@@ -332,24 +329,23 @@ sext_hwi (HOST_WIDE_INT src, unsigned in
 return src;
   else
 {
+  gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
   int shift = HOST_BITS_PER_WIDE_INT - prec;
   return (src << shift) >> shift;
 }
 }
-#endif
 
 /* Zero extend SRC starting from PREC.  */
-#ifdef ENABLE_CHECKING
-extern unsigned HOST_WIDE_INT zext_hwi (unsigned HOST_WIDE_INT, unsigned int);
-#else
 static inline unsigned HOST_WIDE_INT
 zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec)
 {
   if (prec == HOST_BITS_PER_WIDE_INT)
 return src;
   else
-return src & (((HOST_WIDE_INT)1 << prec) - 1);
+{
+  gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
+  return src & (((HOST_WIDE_INT) 1 << prec) - 1);
+}
 }
-#endif
 
 #endif /* ! GCC_HWINT_H */
Index: gcc/system.h
===
--- gcc/system.h 2013-09-07 10:19:22.797432289 +0100
+++ gcc/system.h 2013-10-19 20:06:14.417170260 +0100
@@ -272,9 +272,6 @@ #define ICE_EXIT_CODE 4
 # include 
 #endif
 
-/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT.  */
-#include "hwint.h"
-
 /* A macro to determine whether a VALUE lies inclusively within a
certain range without evaluating the VALUE more than once.  This
macro won't warn if the VALUE is unsigned and the LOWER bound is
@@ -1066,4 +1063,7 @@ #define DEBUG_FUNCTION
 #define DEBUG_VARIABLE
 #endif
 
+/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT.  */
+#include "hwint.h"
+
 #endif /* ! GCC_SYSTEM_H */


Re: [Patch] Change default executor to DFS in regex

2013-10-20 Thread Tim Shen
On Sun, Oct 20, 2013 at 4:42 AM, Paolo Carlini  wrote:
> Ok with those changes.

Committed.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.

2013-10-20 Thread Uros Bizjak
On Thu, Oct 17, 2013 at 4:16 PM, Kirill Yukhin  wrote:

>> >> > I suspect gen_lowpart is bad turn when reload is completed, as
>> >> > far as it can create new pseudo. gen_lowpart () may call
>> >> > gen_reg_rtx (), which contains a corresponding gcc_assert ().
>> >>
>> >> False.  gen_lowpart is perfectly safe post-reload.
>> >> Indeed, taking the subreg of a hard register should arrive
>> >>
>> >>   x = gen_rtx_REG_offset (op, outermode, final_regno, 
>> >> final_offset);
>> >>
>> >> in simplify_subreg.
>> >>
>> >> Have you encountered some specific problem with gen_lowpart?
>> > Maybe the code in the pattern is buggy? Or is it a gen_lowpart?
>>
>> I think that original approach with gen_rtx_REG is correct and follows
>> established practice in sse.md (please grep for gen_rtx_REG in
>> sse.md). If this approach is necessary due to the deficiency of
>> gen_lowpart, then the fix to gen_lowpart should be proposed in a
>> follow-up patch.
>
> So, I've reverted changes in mult_vect patterns and
> added "%" to constraints.

Please also add back expanders with operand fixups and insn
constraints, as is the case with other commutative operators. They are
needed to hoist operand loads out of the loops (reload and later
passes won't hoist memory loads out of the loops when fixing up
operands).

The patch is OK with this change, but please wait for rth's final approval.

Thanks,
Uros.


Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

2013-10-20 Thread Jan Hubicka
> That's a good point.  I added a check for this case - so if CONST_INT is 
> passed
> we assume that mode is QI.  But usually promoted value resides in a register, 
> so
> we don't go over-conservative here.
Hmm, so if we use broadcast constant, then we won't end up having CONST_INT 
here?
It is OK then.
> 
> > > +  if (size_to_move < GET_MODE_SIZE (move_mode))
> > > +{
> > > +  move_mode = mode_for_size (size_to_move * BITS_PER_UNIT, MODE_INT, 
> > > 0);
> > > +  promoted_val = gen_lowpart (move_mode, promoted_val);
> > > +}
> > > +  piece_size = GET_MODE_SIZE (move_mode);
> > > +  code = optab_handler (mov_optab, move_mode);
> > > +  gcc_assert (code != CODE_FOR_nothing && promoted_val != NULL_RTX);
> > > +
> > > +  dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0);
> > > +
> > > +  /* Emit moves.  We'll need SIZE_TO_MOVE/PIECE_SIZES moves.  */
> > > +  gcc_assert (size_to_move % piece_size == 0);
> > > +  adjust = GEN_INT (piece_size);
> > > +  for (i = 0; i < size_to_move; i += piece_size)
> > > +{
> > > +  if (piece_size <= GET_MODE_SIZE (word_mode))
> > > + {
> > > +   emit_insn (gen_strset (destptr, dst, promoted_val));
> > > +   continue;
> > > + }
> > Don't we want to go for strmov here, too?
> I don't quite get it.  How could strmov help us here?

strset/strmov expanders result in stos/movs instructions on some targets.
The code is supposed to use them when profitable - i.e. for constant-sized
prologues and a known offset, you can set up registers for rep; movsq
and then use movb/movsw/movsl to get the required alignment (or to copy
the tail).
It seems to me that whenever we can use strset we can also use strmov, so the
code should be symmetric.
It is currently enabled only for P4, but perhaps it makes sense elsewhere, too
(and to expand compact move by pieces/store by pieces).
I do not know.  Those single stringop insns are always fishy.
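
For illustration, the shape being described is roughly this, written as plain C rather than the RTL the expander actually emits (a sketch under that assumption, not the i386 expander's real output):

  #include <stdint.h>
  #include <string.h>

  /* Alignment prologue of small moves (movsb), an 8-byte bulk loop
     (what rep; movsq would do), and a byte tail.  */
  static void
  copy_aligned_bulk (char *dst, const char *src, size_t n)
  {
    while (n > 0 && ((uintptr_t) dst & 7) != 0)   /* prologue: align dst */
      {
        *dst++ = *src++;
        n--;
      }
    for (; n >= 8; n -= 8, dst += 8, src += 8)    /* bulk: 8-byte chunks */
      memcpy (dst, src, 8);
    while (n > 0)                                 /* tail */
      {
        *dst++ = *src++;
        n--;
      }
  }
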
> > How do we ensure that value is sufficiently promoted? (for example for
> > pentiumpro that requires bigger alignment than promotion)?
> > Also don't we need to handle vec_value for AVX?
> I don't see a problem here actually.  I tried to keep logic for promoted_val
> unchanged, and just add VEC_PROMOTED_VAL, which doesn't use alignment info at
> all.  VEC_PROMOTED_VAL is promoted to move_mode, which is the biggest mode we
> could use for storing, so we always have enough-promoted value.

OK.
> > > @@ -23015,7 +23074,8 @@ ix86_expand_movmem (rtx dst, rtx src, rtx 
> > > count_exp, rtx align_exp,
> > >  align = INTVAL (expected_align_exp);
> > >/* ALIGN is the minimum of destination and source alignment, but we 
> > > care here
> > >   just about destination alignment.  */
> > > -  else if (MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * 
> > > BITS_PER_UNIT)
> > > +  else if (!issetmem
> > > +&& MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * BITS_PER_UNIT)
> > >  align = MEM_ALIGN (dst) / BITS_PER_UNIT;
> > 
> > Hmm, this really should be pushed up into expansion and the alignment 
> > lookup should be improved...
> Well, for now I just to keep some things, like this, unchanged.  We could
> continue cleaning this all up.

Yep, lets handle other things incrementally.
> 
> > > +  /* For now vector-version of memset is generated only for memory 
> > > zeroing, as
> > > + creating of promoted vector value is very cheap in this case.  */
> > > +  if (issetmem && alg == vector_loop && val_exp != const0_rtx)
> > > +alg = unrolled_loop;
> > 
> > I believe we had vector version of promote_reg once. Load of constant is 
> > easy, too.
> > Broadcasting a value is bit different story.
> Most of the cases where we need vector memset are zeroing memory; that's why I
> focused on this case.  But you are right, promoting a constant value would be
> simple as well.
> 
> > Thanks, the patch is OK with those changes.  It really seems like an
> > improvement in maintainability.  Thanks for working on it!
> Could you look at this version of the patch?  Didn't I forget about anything?
> 
> The patch was bootstrapped, make-checked and tested on specs 2k/2k6 (stability
> only).  The same testing was performed with -minline-all-stringops.

The patch is OK.
Honza


Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov

2013-10-20 Thread Michael V. Zolotukhin
Hi Jan,

Thanks for the review, please find my answers below.

> > +/* Output "rep; mov" or "rep; stos" instruction depending on ISSETMEM 
> > argument.
> > +   When ISSETMEM is true, arguments SRCMEM and SRCPTR are ignored.
> > +   When ISSETMEM is false, arguments VALUE and ORIG_VALUE are ignored.
> > +   Other arguments have same meaning as for previous function.  */
> 
> You need to document ORIG_VALUE
Fixed.

> > +expand_movmem_or_setmem_via_rep (rtx destmem, rtx srcmem,
> 
> Let's go with the same name as in the first function, expand_set_or_movmem_via_rep.
> Either I would add issetmem into expand_set_or_movmem_via_loop
> or I would go for the same scheme - if srcptr/srcmem is NULL, use value.
Done.

> > @@ -22233,82 +22236,65 @@ expand_movmem_via_rep_mov (rtx destmem, rtx 
> > srcmem,
> >HOST_WIDE_INT rounded_count;
> >  
> >/* If the size is known, it is shorter to use rep movs.  */
> > -  if (mode == QImode && CONST_INT_P (count)
> > +  if (!issetmem && mode == QImode && CONST_INT_P (count)
> 
> Instead of !issetmem allow this also for value being 0.  Add comment that for
> memset we would need to promote value and also that the logic should be moved
> to decide_alg (I believe I have it there in my proposed patch).
Done.

> > +/* This function emits moves to fill SIZE_TO_MOVE bytes starting from 
> > DESTMEM
> > +   with value PROMOTED_VAL.
> > +   SRC is passed by pointer to be updated on return.
> > +   Return value is updated DST.  */
> 
> So it is same as store_by_pieces only taking promoted_val instead of doing 
> promotion itself?
Yes, that's very similar to store_by_pieces (as emit_memmov is very similar to
move_by_pieces).  Later we could think of using these functions in
store/move_by_pieces, or vice versa.  I added it now because at the moment we
are not able to emit vector moves in move/store_by_pieces.  And yes, as you
noticed, it also takes promoted_val, as it's intended to be used along with
other
emit_memmov/memset calls (as opposed to move_by_pieces which is assumed to
perform the entire copying).

> > +static rtx
> > +emit_memset (rtx destmem, rtx destptr, rtx promoted_val,
> > +HOST_WIDE_INT size_to_move)
> > +{
> > +  rtx dst = destmem, adjust;
> > +  enum insn_code code;
> > +  enum machine_mode move_mode;
> > +  int piece_size, i;
> > +
> > +  /* Find the widest mode in which we could perform moves.
> > + Start with the biggest power of 2 less than SIZE_TO_MOVE and half
> > + it until move of such size is supported.  */
> > +  move_mode = GET_MODE (promoted_val);
> 
> Won't this be VOIDmode for CONST_INT?
That's a good point.  I added a check for this case - so if CONST_INT is passed
we assume that mode is QI.  But usually promoted value resides in a register, so
we don't go over-conservative here.
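
Roughly, the guard amounts to something like this (a sketch inside i386.c-style code, not the literal hunk):

  /* CONST_INT rtxes carry VOIDmode, so GET_MODE () of the promoted value
     cannot be trusted for them; fall back to QImode in that case.  */
  static enum machine_mode
  move_mode_for_promoted_val (rtx promoted_val)
  {
    enum machine_mode mode = GET_MODE (promoted_val);
    return mode == VOIDmode ? QImode : mode;
  }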

> > +  if (size_to_move < GET_MODE_SIZE (move_mode))
> > +{
> > +  move_mode = mode_for_size (size_to_move * BITS_PER_UNIT, MODE_INT, 
> > 0);
> > +  promoted_val = gen_lowpart (move_mode, promoted_val);
> > +}
> > +  piece_size = GET_MODE_SIZE (move_mode);
> > +  code = optab_handler (mov_optab, move_mode);
> > +  gcc_assert (code != CODE_FOR_nothing && promoted_val != NULL_RTX);
> > +
> > +  dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0);
> > +
> > +  /* Emit moves.  We'll need SIZE_TO_MOVE/PIECE_SIZES moves.  */
> > +  gcc_assert (size_to_move % piece_size == 0);
> > +  adjust = GEN_INT (piece_size);
> > +  for (i = 0; i < size_to_move; i += piece_size)
> > +{
> > +  if (piece_size <= GET_MODE_SIZE (word_mode))
> > +   {
> > + emit_insn (gen_strset (destptr, dst, promoted_val));
> > + continue;
> > +   }
> Don't we want to go for strmov here, too?
I don't quite get it.  How could strmov help us here?

> > +expand_movmem_or_setmem_prologue (rtx destmem, rtx srcmem,
> > + rtx destptr, rtx srcptr, rtx value,
> > + rtx vec_value, rtx count, int align,
> > + int desired_alignment, bool issetmem)
> 
> Again, I would go with consistent names.
Fixed.

> How do we ensure that value is sufficiently promoted? (for example for
> pentiumpro that requires bigger alignment than promotion)?
> Also don't we need to handle vec_value for AVX?
I don't see a problem here actually.  I tried to keep logic for promoted_val
unchanged, and just add VEC_PROMOTED_VAL, which doesn't use alignment info at
all.  VEC_PROMOTED_VAL is promoted to move_mode, which is the biggest mode we
could use for storing, so we always have enough-promoted value.

> > +expand_constant_movmem_or_setmem_prologue (rtx dst, rtx *srcp, rtx destreg,
> 
> Again I would make names consistent
Done.

> > +ix86_expand_movmem_or_setmem (rtx dst, rtx src, rtx count_exp, rtx val_exp,
> 
> Name ;)
Fixed:)

> > @@ -23015,7 +23074,8 @@ ix86_expand_movmem (rtx dst, rtx src, rtx 
> > count_exp, rtx align_exp,
> >  align = INTVAL (expected_align_exp);
> >/* ALIGN is the minimum of destin

Re: libstdc++ Testsuite extension for sort, partial_sort, partial_sort_copy, nth_element

2013-10-20 Thread Paolo Carlini

On 10/19/2013 11:10 PM, Christopher Jefferson wrote:

I plan to write my tests if these are accepted (although writing this
reminded me why gcc is the worst open source project for submitting
code to that I have ever worked with).
Why? If you really think something is wrong and have constructive 
proposals, please take the discussion outside this mailing list, to 
g...@gcc.gnu.org, I suppose.


Paolo.


[Patch, fortran] PR58793 - Wrong value for _vtab for intrinsic types with CLASS(*): storage_size of class(*) gives wrong result

2013-10-20 Thread Paul Richard Thomas
Dear All,

This patch is fairly obvious and follows a suggestion from Tobias to
use gfc_element_size.  He even wrote the testcase!

Bootstrapped and regtested on FC17/x86_64 - OK for trunk?

Cheers

Paul

PS In writing this, I have just noted that I need to trap ts->type ==
BT_HOLLERITH.  It would be a rather bizarre thing to try to do but I
just know that somebody, somewhere will eventually try it :-) Consider
that it will be done in the committed version.

2013-10-20  Paul Thomas  

PR fortran/57893
* class.c: Include target-memory.h.
(gfc_find_intrinsic_vtab): Build a minimal expression so that
gfc_element_size can be used to obtain the storage size, rather
than the kind value.

2013-10-20  Paul Thomas  

PR fortran/57893
* gfortran.dg/unlimited_polymorphic_12.f90: New test.


pr58793.diff
Description: Binary data


Re: nth_element fix [58800]

2013-10-20 Thread Paolo Carlini

Hi,

On 10/19/2013 11:02 PM, Christopher Jefferson wrote:

A previous fix to a performance problem in std::sort unfortunately has
led to a crashing bug in std::nth_element.

The following minimal patch fixes the problem (there are a couple of
different ways of fixing it, this is the shortest and safest).

2013-09-19 Chris Jefferson 

PR libstdc++/58800
* include/bits/stl_algo.h (__unguarded_partition_pivot): Change
__last - 2 to __last - 1, so we correctly work on arrays of length 4.
* testsuite/25_algorithms/nth_element/58800.cc: New.
I have this already applied in my local svn branches and can apply it
later today (I fixed the copyright years to only 2013, of course, and
reformatted the rest of the testcase a bit).
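
For reference, the length-4 case the ChangeLog singles out looks roughly like this (just a sketch, not the new 58800.cc testcase):

  #include <algorithm>
  #include <vector>

  int main()
  {
    // Four-element input: the size the ChangeLog above mentions.
    std::vector<int> v = {4, 3, 2, 1};
    std::nth_element (v.begin (), v.begin () + 2, v.end ());
    // The element at the nth position must be the one a full sort
    // would put there: the third smallest value, 3.
    return v[2] == 3 ? 0 : 1;
  }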


Thanks!
Paolo.


Re: [Patch] Change default executor to DFS in regex

2013-10-20 Thread Paolo Carlini

On 10/19/2013 11:28 PM, Tim Shen wrote:
I see. Here's the macro version. Thanks! 
Mostly OK, but the macro is completely undocumented. Please change the
comment which says: "... more than a certain number of quantifiers..." to
actually mention the macro, thus, say: "... more than
_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT quantifiers...". I would also add,
before the #ifndef, a comment, say: "// See below about its meaning. The
default value of 1 appears to be a good compromise."
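
Something along these lines, assuming the default in the patch is 1 as the suggested comment text implies (the committed wording and value may of course differ):

  // See below about its meaning.  The default value of 1 appears to be a
  // good compromise.
  #ifndef _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT
  #define _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT 1
  #endif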


Ok with those changes.

Thanks,
Paolo.


Re: [wide-int] int_traits

2013-10-20 Thread Richard Sandiford
Kenneth Zadeck  writes:
> On 10/19/2013 10:18 AM, Richard Sandiford wrote:
>> Kenneth Zadeck  writes:
>>> On 10/19/2013 05:01 AM, Richard Sandiford wrote:
 Mike Stump  writes:
> +  // We optimize x < y, where y is 64 or fewer bits.
> +  // We have to be careful to not allow comparison to a large positive
> +  // unsigned value like 0x8000, those would be encoded
> +  // with a y.len == 2.
> +  if (y.precision <= HOST_BITS_PER_WIDE_INT
> +  && y.len == 1)
 I don't get this.  If y.precision <= HOST_BITS_PER_WIDE_INT then
 y.len must be 1.  I realise that tree constants can be stored with
 TREE_INT_CST_NUNITS > TYPE_PRECISION / HOST_BITS_PER_WIDE_INT
 (so that extensions beyond TYPE_PRECISION are free).  But the
 wide-int code is shielded from that by the ::decompose routine.
 A wide int never has len > precision / HOST_BITS_PER_WIDE_INT.

 Thanks,
 Richard
>>> I think that part of this is that neither mike or i really understand
>>> how this stuff works anymore.
>>>
>>> in the old version where we had precision 0, we would wait to
>>> canonicalize a constant or a simple integer until we saw what the
>>> precision of the other operand was.   That was what precision 0 meant.
>>> it was kind of like what you are now proposing with this new trait, but
>>> for the reason that we actually did not know what to do than some
>>> concern about escapement.
>>>
>>> What I do not understand is what you get when you bring
>>> in an integer variable that is an unsigned HOST_WIDE_INT with the top
>>> bit set.   In the precision 0 days, if the prec of the other side was 64
>>> or less, the incoming integer took 1 hwi and if the precision was
>>> larger, it took two hwis.  The canonicalization happened late enough so
>>> that there was never a question.
>> Ah, I think I know what you mean.  The original implementation was:
>>
>>template 
>>static inline const HOST_WIDE_INT*
>>to_shwi1 (HOST_WIDE_INT *s, unsigned int *l, unsigned int *p, const T& x)
>>{
>>  s[0] = x;
>>  if (signedp(x)
>>  || sizeof (T) < sizeof (HOST_WIDE_INT)
>>  || ! top_bit_set (x))
>>{
>>  *l = 1;
>>}
>>  else
>>{
>>  s[1] = 0;
>>  *l = 2;
>>}
>>  *p = 0;
>>  return s;
>>}
>>
>>void
>>wide_int_ro::check_precision (unsigned int *p1, unsigned int *p2,
>>  bool check_equal ATTRIBUTE_UNUSED,
>>  bool check_zero ATTRIBUTE_UNUSED)
>>{
>>  gcc_checking_assert ((!check_zero) || *p1 != 0 || *p2 != 0);
>>
>>  if (*p1 == 0)
>>*p1 = *p2;
>>
>>  if (*p2 == 0)
>>*p2 = *p1;
>>
>>  gcc_checking_assert ((!check_equal) || *p1 == *p2);
>>}
>>
>>/* Return true if C1 < C2 using signed comparisons.  */
>>template 
>>static inline bool
>>lts_p (const T1 &c1, const T2 &c2)
>>{
>>  bool result;
>>  HOST_WIDE_INT ws1[WIDE_INT_MAX_ELTS];
>>  HOST_WIDE_INT ws2[WIDE_INT_MAX_ELTS];
>>  const HOST_WIDE_INT *s1, *s2;  /* Returned data */
>>  unsigned int cl1, cl2; /* array lengths  */
>>  unsigned int p1, p2;   /* precisions */
>>  
>>  s1 = to_shwi1 (ws1, &cl1, &p1, c1);
>>  s2 = to_shwi1 (ws2, &cl2, &p2, c2);
>>  check_precision (&p1, &p2, false, true);
>>  
>>  if (p1 <= HOST_BITS_PER_WIDE_INT
>>  && p2 <= HOST_BITS_PER_WIDE_INT)
>>{
>>  HOST_WIDE_INT x0 = sext_hwi (s1[0], p1);
>>  HOST_WIDE_INT x1 = sext_hwi (s2[0], p2);
>>  result = x0 < x1;
>>}
>>  else
>>result = lts_p_large (s1, cl1, p1, s2, cl2, p2);
>>  
>> #ifdef DEBUG_WIDE_INT
>>  debug_vaa ("wide_int_ro:: %d = (%s lts_p %s\n", result, s1, cl1, p1, 
>> s2, cl2, p2);
>> #endif
>>  return result;
>>}
> you need to be careful about asserting too much from the old code. the 
> time line was:
>
> 1) we developed the stuff on x86-64
> 2) you did your patch
> 3) we ported everything to ppc and our private port.
>
> i really only became very sensitive to this issue during step 3 because 
> the x86 does not exhibit these bugs.
>
>
>> So if you had a 128-bit wide_int and T == unsigned HOST_WIDE_INT,
>> this lts_p would zero-extend the unsigned HOST_WIDE_INT to 128 bits and
>> then do a signed comparison.
>>
>> The thing here is that the "check_equal" argument is false.
>> So if instead you were comparing a 128-bit wide_int with a 64-bit tree
>> constant, lts_p would treat that tree constant as a signed 64-bit number,
>> even if it was TYPE_UNSIGNED.  Similarly if you were comparing a 128-bit
>> tree constant and a 64-bit tree constant.  You also allowed a comparison
>> of a 128-bit wide_int with a 64-bit rtx, again treating the 64-bit rtx
>> as signed.
> I do not think that this is what check_equal meant because the 0 
> precision was a wild card.  The 0 precision 

[MIPS] Fix mips-ps-[57].c tests after cost model changes

2013-10-20 Thread Richard Sandiford
I could just have added an extra command-line option, but I went for
this instead.

Tested on mips64-linux-gnu and applied.

Richard


gcc/testsuite/
* gcc.target/mips/mips-ps-5.c: Add alignment attributes.
* gcc.target/mips/mips-ps-7.c: Likewise.

Index: gcc/testsuite/gcc.target/mips/mips-ps-5.c
===
--- gcc/testsuite/gcc.target/mips/mips-ps-5.c   2012-08-27 17:31:22.0 +0100
+++ gcc/testsuite/gcc.target/mips/mips-ps-5.c   2013-10-19 18:12:17.462075229 +0100
@@ -2,7 +2,9 @@
 /* { dg-options "-mpaired-single -mgp64 -ftree-vectorize" } */
 /* { dg-skip-if "requires vectorization" { *-*-* } { "-O0" "-Os" } { "" } } */
 
-extern float a[], b[], c[];
+extern float a[] __attribute__ ((aligned (8)));
+extern float b[] __attribute__ ((aligned (8)));
+extern float c[] __attribute__ ((aligned (8)));
 
 NOMIPS16 void
 foo (void)
Index: gcc/testsuite/gcc.target/mips/mips-ps-7.c
===
--- gcc/testsuite/gcc.target/mips/mips-ps-7.c   2012-08-27 17:31:22.0 +0100
+++ gcc/testsuite/gcc.target/mips/mips-ps-7.c   2013-10-19 18:12:31.762186608 +0100
@@ -3,7 +3,9 @@
 /* { dg-options "-mgp32 -mpaired-single -ftree-vectorize" } */
 /* { dg-skip-if "requires vectorization" { *-*-* } { "-O0" "-Os" } { "" } } */
 
-extern float a[], b[], c[];
+extern float a[] __attribute__ ((aligned (8)));
+extern float b[] __attribute__ ((aligned (8)));
+extern float c[] __attribute__ ((aligned (8)));
 
 NOMIPS16 void
 foo (void)


[MIPS] Add bswap patterns

2013-10-20 Thread Richard Sandiford
Noticed the other day that we'd never added bswap patterns for WSBH & co.

Tested on mips64-linux-gnu with --with-arch=mips64r2 and applied.

Richard


gcc/
* config/mips/mips.h (ISA_HAS_WSBH): Define.
* config/mips/mips.md (UNSPEC_WSBH, UNSPEC_DSBH, UNSPEC_DSHD): New
constants.
(bswaphi2, bswapsi2, bswapdi2, wsbh, dsbh, dshd): New patterns.

gcc/testsuite/
* gcc.target/mips/bswap-1.c, gcc.target/mips/bswap-2.c,
gcc.target/mips/bswap-3.c, gcc.target/mips/bswap-4.c,
gcc.target/mips/bswap-5.c, gcc.target/mips/bswap-6.c: New tests.

Index: gcc/config/mips/mips.h
===
--- gcc/config/mips/mips.h  2013-10-16 20:38:57.977088830 +0100
+++ gcc/config/mips/mips.h  2013-10-19 11:32:47.636342587 +0100
@@ -972,6 +972,11 @@ #define ISA_HAS_ROR((ISA_MIPS32R2  
\
  || TARGET_SMARTMIPS)  \
 && !TARGET_MIPS16)
 
+/* ISA has the WSBH (word swap bytes within halfwords) instruction.
+   64-bit targets also provide DSBH and DSHD.  */
+#define ISA_HAS_WSBH   ((ISA_MIPS32R2 || ISA_MIPS64R2) \
+&& !TARGET_MIPS16)
+
 /* ISA has data prefetch instructions.  This controls use of 'pref'.  */
 #define ISA_HAS_PREFETCH   ((ISA_MIPS4 \
  || TARGET_LOONGSON_2EF\
Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md 2013-10-16 20:38:57.977088830 +0100
+++ gcc/config/mips/mips.md 2013-10-19 11:36:21.984285902 +0100
@@ -74,6 +74,11 @@ (define_c_enum "unspec" [
   UNSPEC_STORE_LEFT
   UNSPEC_STORE_RIGHT
 
+  ;; Integer operations that are too cumbersome to describe directly.
+  UNSPEC_WSBH
+  UNSPEC_DSBH
+  UNSPEC_DSHD
+
   ;; Floating-point moves.
   UNSPEC_LOAD_LOW
   UNSPEC_LOAD_HIGH
@@ -5358,6 +5363,56 @@ (define_insn "rotr<mode>3"
 }
   [(set_attr "type" "shift")
(set_attr "mode" "")])
+
+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=d")
+   (bswap:HI (match_operand:HI 1 "register_operand" "d")))]
+  "ISA_HAS_WSBH"
+  "wsbh\t%0,%1"
+  [(set_attr "type" "shift")])
+
+(define_insn_and_split "bswapsi2"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+   (bswap:SI (match_operand:SI 1 "register_operand" "d")))]
+  "ISA_HAS_WSBH && ISA_HAS_ROR"
+  "#"
+  ""
+  [(set (match_dup 0) (unspec:SI [(match_dup 1)] UNSPEC_WSBH))
+   (set (match_dup 0) (rotatert:SI (match_dup 0) (const_int 16)))]
+  ""
+  [(set_attr "insn_count" "2")])
+
+(define_insn_and_split "bswapdi2"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+   (bswap:DI (match_operand:DI 1 "register_operand" "d")))]
+  "TARGET_64BIT && ISA_HAS_WSBH"
+  "#"
+  ""
+  [(set (match_dup 0) (unspec:DI [(match_dup 1)] UNSPEC_DSBH))
+   (set (match_dup 0) (unspec:DI [(match_dup 0)] UNSPEC_DSHD))]
+  ""
+  [(set_attr "insn_count" "2")])
+
+(define_insn "wsbh"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+   (unspec:SI [(match_operand:SI 1 "register_operand" "d")] UNSPEC_WSBH))]
+  "ISA_HAS_WSBH"
+  "wsbh\t%0,%1"
+  [(set_attr "type" "shift")])
+
+(define_insn "dsbh"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "d")] UNSPEC_DSBH))]
+  "TARGET_64BIT && ISA_HAS_WSBH"
+  "dsbh\t%0,%1"
+  [(set_attr "type" "shift")])
+
+(define_insn "dshd"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "d")] UNSPEC_DSHD))]
+  "TARGET_64BIT && ISA_HAS_WSBH"
+  "dshd\t%0,%1"
+  [(set_attr "type" "shift")])
 
 ;;
 ;;  
Index: gcc/testsuite/gcc.target/mips/bswap-1.c
===
--- /dev/null   2013-10-13 08:29:45.608935301 +0100
+++ gcc/testsuite/gcc.target/mips/bswap-1.c 2013-10-19 11:30:01.551867196 +0100
@@ -0,0 +1,10 @@
+/* { dg-options "isa_rev>=2" } */
+/* { dg-skip-if "bswap recognition needs expensive optimizations" { *-*-* } { "-O0" "-O1" } { "" } } */
+
+NOMIPS16 unsigned short
+foo (unsigned short x)
+{
+  return ((x << 8) & 0xff00) | ((x >> 8) & 0xff);
+}
+
+/* { dg-final { scan-assembler "\twsbh\t" } } */
Index: gcc/testsuite/gcc.target/mips/bswap-2.c
===
--- /dev/null   2013-10-13 08:29:45.608935301 +0100
+++ gcc/testsuite/gcc.target/mips/bswap-2.c 2013-10-19 11:30:01.551867196 +0100
@@ -0,0 +1,9 @@
+/* { dg-options "isa_rev>=2" } */
+
+NOMIPS16 unsigned short
+foo (unsigned short x)
+{
+  return __builtin_bswap16 (x);
+}
+
+/* { dg-final { scan-assembler "\twsbh\t" } } */
Index: gcc/testsuite/gcc.target/mips/bswap-3.c
===
--- /dev/null   2013-10-13 08:29:45.