fixed misplaced testcase

2015-09-01 Thread Kenneth Zadeck

2015-09-01  Kenneth Zadeck 
* gcc.c-torture/execute/ieee/2320-1.c: Fixed misplaced test
case.


This was approved offline by Mike Stump.

committed as revision 227389.

Kenny
--- gcc/testsuite/gcc.c-torture/execute/ieee/2320-1.c (revision 227385)
+++ gcc/testsuite/gcc.c-torture/execute/ieee/2320-1.c (working copy)
@@ -35,10 +35,10 @@ int main()
   || sizeof (double) != sizeof (ull))
 exit (0);

-  c(0x3690ULL, 0xU);
 #if (defined __arm__ || defined __thumb__) && ! (defined __ARMEB__ || 
defined __VFP_FP__)

   /* The ARM always stores FP numbers in big-wordian format,
  even when running in little-byteian mode.  */
+  c(0x3690ULL, 0xU);
   c(0x00013690ULL, 0x0001U);
   c(0x369fULL, 0x0001U);
   c(0x36A0ULL, 0x0001U);
@@ -61,6 +61,7 @@ int main()
   c(0x50003810ULL, 0x0082U);
   c(0x50013810ULL, 0x0083U);
 #else
+  c(0x3690ULL, 0xU);
   c(0x3691ULL, 0x0001U);
   c(0x369fULL, 0x0001U);
   c(0x36A0ULL, 0x0001U);



https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01604.html

2015-07-29 Thread Kenneth Zadeck

I had the following conversation with richi about this patch.

Sorry to reply off thread, but i do not read this group in my mailer.


[09:00] <zadeck> richi: i am reviewing a patch and i have a couple
of questions, do you have a second to look at something?

[09:00] <richi> zadeck: sure
[09:01] <zadeck> the patch is
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01604.html
[09:01] <zadeck> he has set up the data flow problem correctly, but
i worry that this really is not the right way to solve this problem.

[09:02] <richi> let me look at the problem
[09:03] <zadeck> in particular, the only way he can demonstrate this
problem in c is with an uninitialized variable.  It seems that a
normal correct program would not normally have this kind of issue unless
there was some upstream bug coming out of the middle end.

[09:04] <richi> even for his Ada case it's not initialized it seems
[09:06] <richi> zadeck: I've added a comment to the PR and requested
info

[09:06] <zadeck> my ada is as good as my German.
[09:07] <zadeck> the thing is that if you turn this into a truly
must problem, it will disable a lot of legit transformations.

[09:08] <richi> yep
[09:09] <zadeck> thanks
[09:09] <richi> lets see if he can point to a different issue
[09:09] <richi> or produce a C testcase for us
[09:13] <jakub> richi: yeah; if it is an Ada-only thing and Ada
uninitialized variables must have some special properties, then they'd
better use some on-the-side flag for whether it is initialized, or zero
initialize or whatever Ada expects
[09:15] <richi> jakub: from what I understand of Ada the testcase
doesn't look like such a case

[09:16] <richi> jakub: so the actual bug is likely somewhere else




Kenny


Re: [PATCH, ping1] Fix removing of df problem in df_finish_pass

2015-04-20 Thread Kenneth Zadeck
As a dataflow maintainer, I approve this patch for the next release.
However, you will have to get approval from a release manager to get it
into 5.0.




On 04/20/2015 04:22 AM, Thomas Preud'homme wrote:

Ping?


-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Tuesday, March 03, 2015 12:02 PM
To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini';
'Seongbae Park'; 'Kenneth Zadeck'
Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass


From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com]
Sent: Saturday, February 28, 2015 4:00 AM

>> use df_remove_problem rather than manually removing problems,
>> living
>
> leaving

Indeed. Please find updated changelog below:

2015-03-03  Thomas Preud'homme  

* df-core.c (df_finish_pass): Iterate over
df->problems_by_index[] and use df_remove_problem rather than
manually removing problems, leaving holes in
df->problems_in_order[].

Best regards,

Thomas
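
For reference, the change described in the ChangeLog above has roughly
the following shape -- a hedged sketch against the df.h internals of
that era (problems_by_index[], optional_p), not the literal committed
patch:

/* Iterate by index: each df_remove_problem compacts
   df->problems_in_order[] itself, so no holes are left behind.  */
for (int i = 0; i < DF_LAST_PROBLEM_PLUS1; i++)
  {
    struct dataflow *dflow = df->problems_by_index[i];
    if (dflow && dflow->optional_p)
      df_remove_problem (dflow);
  }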











Re: patch to fix rtl documentation for new floating point comparisons

2015-02-18 Thread Kenneth Zadeck




> On Feb 18, 2015, at 3:23 AM, Joseph Myers  wrote:
> 
>> On Tue, 17 Feb 2015, Kenneth Zadeck wrote:
>> 
>> The fp exceptions raise some very tricky issues with respect to gcc and 
>> optimization.  On many machines, noisy does not mean to throw an 
>> exception, it means that you set a bit and then check later.  If you try 
>> to model this kind of behavior in gcc, you end up pinning the code so 
>> that nothing can be moved or reordered.
> 
> When I say exception here, I'm always referring to that flag bit setting, 
> not to processor-level exceptions.  In IEEE 754 terms, an exception is 
> *signaled*, and the default exception handling is to *raise* a flag and 
> deliver a default result (except for exact underflow which doesn't raise 
> the flag).
> 
> To quote Annex F, "This specification does not require support for trap 
> handlers that maintain information about the order or count of 
> floating-point exceptions. Therefore, between function calls, 
> floating-point exceptions need not be precise: the actual order and number 
> of occurrences of floating-point exceptions (> 1) may vary from what the 
> source code expresses.".  So it is not necessary to be concerned about 
> configurations where trap handlers may be called.
> 
> There is as yet no public draft of TS 18661-5 (Supplementary attributes).  
> That will provide C bindings for alternate exception handling as described 
> in IEEE 754-2008 clause 8.  I suspect such bindings will not readily be 
> efficiently implementable using processor-level exception handlers; SIGFPE 
> is an awkward interface for implementing such things at the C language 
> level, some processors do not support such trap handlers at all (e.g. many 
> ARM processors), and where traps are supported they may be asynchronous 
> rather than occurring immediately on execution of the relevant 
> instruction.  In addition, at least x86 does not support raising exception 
> flags without running trap handlers on the next floating-point instruction 
> (raiseFlags operation, fesetexcept in TS 18661-1); that is, if trap 
> handlers were used to implement standard functionality, it would need to 
> be in a way such that this x86 peculiarity is not visible.
my point here is that what you want to be able to do is freely reorder the fp
operations (within the rules of reordering fp operations) between places where
those bits are explicitly read or cleared.  we have no way to model that
chain of modify operations in gcc.
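
In C terms, the model being described looks roughly like this -- a
hedged illustration using standard fenv.h calls; the "fences" are
conceptual, not something gcc enforces today:

#include <fenv.h>

double f (double a, double b, double c)
{
  feclearexcept (FE_ALL_EXCEPT);    /* flag write: a conceptual fence  */
  double x = a * b;                 /* these operations only push the  */
  double y = b * c;                 /* flags "upward", so they could   */
                                    /* be freely reordered with each   */
                                    /* other...                        */
  if (fetestexcept (FE_OVERFLOW))   /* ...but not moved past this read */
    return 0.0;
  return x + y;
}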
> 
>> to get this right gcc needs something like a monotonic dependency which 
>> would allow reordering and gcc has nothing like this.  essentially, you 
>> need a way to say that all of these insns modify the same variable, but 
>> they all just move the value in the same direction so you do not care 
>> what order the operations are performed in.  that does not mean that 
>> this could not be added but gcc has nothing like this.
> 
> Indeed, this is one of the things about defining the default mode that I 
> referred to; the present default is -ftrapping-math, but we may wish to 
> distinguish between strict trapping-math (whenever exception flags might 
> be tested / raised / lowered, exactly the computations specified by the 
> abstract machine have occurred, which might mean rather more limits on 
> code movement in the absence of monotonic dependencies) and loose trapping 
> math (like the present default; maybe don't transform expressions locally 
> in ways that add or remove exceptions, but don't treat an expression as 
> having side effects or reading global state purely because of possible 
> raising of floating-point exceptions).
> 
>> going back to the rounding modes issue, there is a huge range in the 
>> architectural implementation space.  you have a few that are pure 
>> dynamic, a few that are pure static and some in the middle that are just 
>> a mess.  a lot of machines would have liked to support fully static, but 
>> could not fit the bits to specify the rounding modes into the 
>> instruction.  my point here is you do need to at least have a plan that 
>> will support the full space even if you do this with a 1000 small 
>> patches.
> 
> I think the norm is dynamic, because that's what was in IEEE 754-1985, 
> with static rounding added more recently on some processors, because of 
> IEEE 754-2008.  (There are other variants - IA64 having multiple dynamic 
> rounding mode registers and allowing instructions to specify which one the 
> rounding mode is taken from.)
the first ieee standard only allowed the dynamic model.   the second allows the 
static model.   while dynamic is more common, there are/were architectures that 
ar

Re: patch to fix rtl documentation for new floating point comparisons

2015-02-17 Thread Kenneth Zadeck




> On Feb 17, 2015, at 6:01 PM, Joseph Myers  wrote:
> 
>> On Tue, 17 Feb 2015, Kenneth Zadeck wrote:
>> 
>>> On 02/17/2015 07:05 AM, Joseph Myers wrote:
>>>> On Tue, 17 Feb 2015, Richard Earnshaw wrote:
>>>> 
>>>> So the problem we have today is the compiler has no way to distinguish
>>>> between, say, < and __builtin_isless.  According to Annex F (c99) the
>>>> former should be signalling while the latter quiet.
>>> We do have a way: < is LT and __builtin_isless is !UNGE.
>>> 
>>> __builtin_islessgreater is !UNEQ.  The question is whether it's also LTGT
>>> or whether LTGT means LT || GT.  And the existing documentation of
>>> LTGT_EXPR leaves this unspecified, which seems clearly unhelpful.  Either
>>> way, you have existing code in GCC that's incorrect (i.e. that does not
>>> correspond to the set of transformations that are actually valid for the
>>> chosen semantics).
>> Having spent the majority of my gcc life at the rtl level, i find it
>> astonishing that the designers of the tree level allowed this machine
>> dependency to get into what i was always told was a pristine machine
>> independent pass.  i would have thought that at the tree level this
>> could/should be nailed down to being quiet and if someone wanted to support a
>> noisy version in their port, then they would look for lt || gt when
>> converting to the rtl level, which has never been so pristine.
> 
> As I said in <https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html>, I 
> can't find any posting to gcc-patches of the r82467 commit that introduced 
> the "possible exception" wording.
> 
>> it would be nice to know exactly how many ports (if any) actually have a 
>> noisy
>> version of this in hardware. ltgt is not a common primitive (it is not even
>> mentioned in the ieee fp standards).
> 
> For example, on MIPS the C.cond.fmt instruction has a four-bit condition 
> field: "In the cond field of the instruction: cond 2..1 specify the nature 
> of the comparison (equals, less than, and so on); cond 0 specifies whether 
> the comparison is ordered or unordered, i.e. false or true if any operand 
> is a NaN; cond 3 indicates whether the instruction should signal an 
> exception on QNaN inputs, or not".  Together with possibly negating the 
> result you get all 32 possible comparisons (choice of whether the 
> comparison is true or false for each of = < > unordered, choice of whether 
> to raise invalid for quiet NaNs).
> 
This is a pretty weak motivation.  Computer architects love this kind of
thing, assuming they have the opcode space.  Just give them every possible
combination and let the programmers decide what is useful - and by doing that,
the architect saves a couple of muxes and gate delays.  But that doesn't mean
that the layers of software need to support all of this, especially in this
case, when there is no motivation from either the fp standards or the
language standards.
>>> I think the main difficulty in proper Annex F support would be making
>>> optimizers (on each IR) understand the side-effects operations have in
>>> terms of raising exceptions, and how operations may take the rounding mode
>>> or existing exceptions raised as inputs - with an associated issue of
>>> defining the existing default floating-point rules well enough to keep a
>>> default mode that doesn't unduly inhibit optimization.
>> i have been thinking a lot about the rounding modes.  the only way that i
>> could think about properly supporting them was to actually add new rtl and
>> tree operations that take the rounding mode as an extra parameter.  i think
>> that this is going to be the only way to support both the static and dynamic
>> models.  This will be ugly, but having just finished the wide int patch it
>> is natural to say "how bad could it be?"  some ports will want to support
>> this and some will not.
> 
> My starting point is to presume that any port with hardware floating-point 
> exceptions and rounding modes should support this.  But since ports would 
> probably need to change anyway to ensure operations do raise the right 
> exceptions, and to ensure that the machine-independent compiler can tell 
> which registers referred to in inline asm are part of the floating-point 
> exceptions / rounding modes state, maybe it wouldn't be so bad if they 
> also need to change their instruction descriptions to describe the 
> involvement of exceptions and rounding modes explicitly.  (You do need to 
> handle the cas

Re: patch to fix rtl documentation for new floating point comparisons

2015-02-17 Thread Kenneth Zadeck


On 02/17/2015 07:05 AM, Joseph Myers wrote:

On Tue, 17 Feb 2015, Richard Earnshaw wrote:


So the problem we have today is the compiler has no way to distinguish
between, say, < and __builtin_isless.  According to Annex F (c99) the
former should be signalling while the latter quiet.

We do have a way: < is LT and __builtin_isless is !UNGE.

__builtin_islessgreater is !UNEQ.  The question is whether it's also LTGT
or whether LTGT means LT || GT.  And the existing documentation of
LTGT_EXPR leaves this unspecified, which seems clearly unhelpful.  Either
way, you have existing code in GCC that's incorrect (i.e. that does not
correspond to the set of transformations that are actually valid for the
chosen semantics).

Having spent the majority of my gcc life at the rtl level, i find it
astonishing that the designers of the tree level allowed this machine
dependency to get into what i was always told was a pristine machine
independent pass.  i would have thought that at the tree level
this could/should be nailed down to being quiet and if someone wanted to
support a noisy version in their port, then they would look for lt || gt
when converting to the rtl level, which has never been so pristine.


it would be nice to know exactly how many ports (if any) actually have a 
noisy version of this in hardware. ltgt is not a common primitive (it is 
not even mentioned in the ieee fp standards).
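
For concreteness, the mapping Joseph gives above (< is the signaling LT;
__builtin_isless is the quiet !UNGE) is visible from plain C -- a hedged
illustration:

int cmp (double x, double y)
{
  int a = x < y;                          /* LT: signaling; raises "invalid"
                                             if either operand is a NaN     */
  int b = __builtin_isless (x, y);        /* !UNGE: quiet; same truth value
                                             but no exception on quiet NaNs */
  int c = __builtin_islessgreater (x, y); /* !UNEQ: quiet "less or greater" */
  return a + b + c;
}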




I suspect there are two ways we could deal with that: add new comparison
RTL codes to distinguish the cases; or use something like the RTL /v bit
on a comparison to indicate that it should be signalling.

Of the two, the latter would probably be easiest to implement in a
backwards compatible manner (backends not understanding /v would
continue to use their existing code paths), but it would still take a
fair amount of rejigging in the mid end to fully preserve the signalling
nature of comparisons: there are many places where just RTX_CODE is
available and a new pattern is generated from that.  The first method
would require all back-ends to be updated pretty much simultaneously to
handle the new RTL codes.

I don't know the optimal way of representing these variants in GENERIC,
GIMPLE and RTL (the existing representation can cover everything, and is
unambiguous apart from LTGT, but may not be optimal).

I think the main difficulty in proper Annex F support would be making
optimizers (on each IR) understand the side-effects operations have in
terms of raising exceptions, and how operations may take the rounding mode
or existing exceptions raised as inputs - with an associated issue of
defining the existing default floating-point rules well enough to keep a
default mode that doesn't unduly inhibit optimization.
i have been thinking a lot about the rounding modes.  the only way
that i could think about properly supporting them was to actually add new
rtl and tree operations that take the rounding mode as an extra
parameter.  i think that this is going to be the only way to support
both the static and dynamic models.  This will be ugly, but having just
finished the wide int patch it is natural to say "how bad could it
be?"  some ports will want to support this and some will not.


kenny






Re: patch to fix rtl documentation for new floating point comparisons

2015-02-15 Thread Kenneth Zadeck


On 02/14/2015 03:26 PM, Paolo Bonzini wrote:


On 10/02/2015 22:46, Joseph Myers wrote:

It may make sense to define LTGT as exactly !UNEQ, and so quiet, but the
choice of definition is a matter of what's convenient for the
implementation (and which choice you make determines which existing code
in GCC should be considered incorrect).

It would be different from e.g. !UNLT and GE differing only in that UNLT
is quiet and GE is signaling.  So it makes sense to me to keep LTGT as
signaling.
while your argument is correct in theory, in practice this is not how
people use this stuff, and so i disagree.


The interrupts are there to allow legacy code that does not know about
nans to be "run" in a mode where the nans can be used to signal that
things did not run well.  Properly written floating point aware code
never uses the interrupts.  In that code, you want a rich set of
comparisons which allows the programmer to efficiently deal with the
fact that any comparison can go one of 4 possible ways.


to support legacy code we need ne and eq to be quiet and lt gt le ge to
be noisy.  This is what the standards call for and this is what gcc
delivers.  Going beyond that for legacy code is a waste of time.
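
Summarized as a small table (the C99 Annex F defaults described above):

  ==  !=                   quiet  (no "invalid" raised on quiet NaN operands)
  <  >  <=  >=             noisy  (raise "invalid" if either operand is NaN)
  isless, isgreater, ...   quiet library counterparts of the noisy operators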


kenny

Paolo




Re: patch to fix rtl documentation for new floating point comparisons

2015-02-11 Thread Kenneth Zadeck

I vote for quiet.  For three reasons:
1) it matches what programmers expect.  I view people who do floating
point as falling into two categories:
a) people who wish that there were no such thing as nans.  These
people are happy to write programs with just < > <= >= == != and then be
unhappily surprised if their code blows up.
b) careful floating pt programmers.  These people never want a
signal; they choose their operators with knowledge of how each of those
operations works with respect to nans.

These people will never use anything like a signaling <>.
2) it matches iso n1778, which is primarily written to satisfy the needs
of (b).
3) Whenever you leave something like this undefined, you are basically
saying "do not optimize".


On 02/10/2015 04:46 PM, Joseph Myers wrote:

On Mon, 9 Feb 2015, Kenneth Zadeck wrote:


I don't think it's useful to have the trapping semantics unspecified for a
comparison operation like that.  So the question is what's most useful for
LTGT and LTGT_EXPR to do (presumably they should do the same thing).  Lots
of existing code in this area seems confused (for example, HONOR_SNANS
should be irrelevant to reversing an equality comparison, as reversing it
will change neither results nor exceptions raised, whether or not
signaling NaNs are involved).  But maybe more code treats LTGT as a
signaling operation than otherwise, in accordance with the original
intent, and so that's the most appropriate way to document it (with !UNEQ
being the representation of the corresponding quiet operation).

section 7.12.14.5 in C99 seems pretty much to say that this is a quiet
operation.

It says islessgreater is quiet.  It says nothing about the LTGT RTL
operation or the LTGT_EXPR GIMPLE/GENERIC operation.
__builtin_islessgreater is implemented using UNEQ_EXPR not LTGT_EXPR.

It may make sense to define LTGT as exactly !UNEQ, and so quiet, but the
choice of definition is a matter of what's convenient for the
implementation (and which choice you make determines which existing code
in GCC should be considered incorrect).

Where back ends implement ltgt patterns, I don't know if they are
consistently quiet or signaling.





Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2015-02-11 Thread Kenneth Zadeck

On 02/11/2015 06:20 AM, Jiong Wang wrote:

2014-12-19 15:21 GMT+00:00 Kenneth Zadeck :

however, since i am a nice person 

loop-invariant solves the DF_UD_CHAIN which builds a data structure that
connects each use with all of the defs that reach it.  I believe that this
is the opposite of what you want here.

if you really need this, you need to also turn on the DF_DU_CHAIN which
builds the opposite structure.  Both structures can be space hogs, but
they are both turned on in other places in the compiler so it is unlikely to
be an issue.

Exactly, thanks, Kenneth.

This approach works in my experiment and looks much better than the
previous REG_NOTE approach.
It does have one problem, though: we need the UD/DU chain built before
we do insn re-shuffling,
while after re-shuffling the UD chain needs updating, otherwise the
later "check_dependencies" in find_invariant_insn may fail.

Although we have re-shuffled instruction 1 into 2, the later check
still uses the old UD info, and thus
decides instruction 2 is not invariant.

1: regA <- vfp + regB
2: regA <- vfp + const

My current fix is to insert those re-shuffled insns into a table named
"vfp_const_iv", then skip the
dependency checks for them as they don't have any dependencies.
You now are beginning to understand why Mark Wegman and I invented SSA
form almost 30 years ago...  There is no good way to keep the
information up to date as you are changing the program.  To a large
extent you are on your own.  If what you are suggesting works, then
this is likely much faster than rebuilding the chains.
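
For reference, the brute-force alternative ("rebuilding the chains")
would look roughly like this -- a hedged sketch against the df API, and
almost certainly too slow to run after every re-shuffle; reshuffled_insn
is a hypothetical name:

/* Tell DF the insn changed, then recompute the chain problem;
   every UD/DU link is rebuilt from scratch.  */
df_insn_rescan (reshuffled_insn);
df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
df_analyze ();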






LOG_LINKs have nothing to do with single use; they point from the
_first_
use to its corresponding def.

You might want to look at what fwprop does instead.

The rtl fwprop pass uses df information in a single-definition way; it
doesn't really take into consideration whether a register has a single
use.  This often corrupts other optimizations like post-increment and
load/store pair.  For example:

add r2, r1, r0
ldr rx, [r2]
add r2, r2, #4
is transformed into the form below:
add r2, r1, r0
ldr rx, [r1, r0]
add r2, r2, #4

As a result, the post-increment opportunity is lost, and the definition
of r2 can't be deleted because it's not a single use.

Thanks,
bin

thanks for all these suggestions.

I have looked at the LOG_LINK build function: a simple reverse scan,
though it needs to allocate a big map array for all pseudo regs.  I
haven't looked at the similar code in fwprop.

actually, when the first matched insn pattern is found, I just want to
scan the next several insns, then abort quickly if nothing meets the
requirement.  there is no need to build full single-use information.

still, can anyone confirm that it is safe to re-use REG_DEAD info there
without calling df_note_add_problem and df_analyze first?  or am I
using info passed down from the previous pass which calculated it and
may be broken?

It's not safe to use REG_DEAD info without re-computing it.

not sure that REG_DEAD is the right answer even if you did re-compute it.
I believe you can have two parallel uses (on both sides of an if-then-else)
for a single def (above the if-then-else) and have two REG_DEAD notes.


Richard.






Re: patch to fix rtl documentation for new floating point comparisons

2015-02-09 Thread Kenneth Zadeck

On 02/09/2015 06:24 PM, Joseph Myers wrote:

On Mon, 9 Feb 2015, Kenneth Zadeck wrote:


  @findex ge
  @cindex greater than
@@ -2603,6 +2618,10 @@ Like @code{gt} and @code{gtu} but test f
  @item (ge:@var{m} @var{x} @var{y})
  @itemx (geu:@var{m} @var{x} @var{y})
  Like @code{gt} and @code{gtu} but test for ``greater than or equal''.
+If the operands are floating point, @code{ge} is a signaling
+comparison and corresponds to the IEC 60559
+@code{compareSignalingGreater} operation.  @code{geu} is undefined for
+floating point numbers.

No, compareSignalingGreaterEqual.


oops



  @findex le
  @cindex less than or equal
@@ -2610,7 +2629,64 @@ Like @code{gt} and @code{gtu} but test f
  @cindex unsigned less than
  @item (le:@var{m} @var{x} @var{y})
  @itemx (leu:@var{m} @var{x} @var{y})
-Like @code{gt} and @code{gtu} but test for ``less than or equal''.
+Like @code{gt} and @code{gtu} but test for ``less than or equal''.  If
+the operands are floating point, @code{le} is a signaling comparison
+and corresponds to the IEC 60559 @code{compareSignalingLess}
+operation.  @code{leu} is undefined for floating point numbers.

compareSignalingLessEqual.

oops again

+@table @code
+@findex ltgt
+@cindex less than or greater than
+@item (ltgt:@var{m} @var{x} @var{y})
+@code{STORE_FLAG_VALUE} if the values represented by @var{x} and
+@var{y} are less, or greater, otherwise 0.  This is a quiet operation
+that applies only to floating point operands and does not have a
+corresponding IEC 60559 operation.

On GENERIC, the documentation describes some ambiguity: "With the possible
exception of @code{LTGT_EXPR}, all of these operations are guaranteed
not to generate a floating point exception.".

LTGT (RTL code) was added by RTH in
<https://gcc.gnu.org/ml/gcc-patches/2000-01/msg00974.html>.  LTGT_EXPR was
added in the thread starting at
<https://gcc.gnu.org/ml/gcc-patches/2004-05/msg01674.html>, wherein RTH
stated "the LTGT rtl code is assumed to trap".  The documentation was soon
thereafter changed to have the "possible exception" wording (r82467, which
I can't find any sign of having been posted to gcc-patches so don't know
the rationale).

I don't think it's useful to have the trapping semantics unspecified for a
comparison operation like that.  So the question is what's most useful for
LTGT and LTGT_EXPR to do (presumably they should do the same thing).  Lots
of existing code in this area seems confused (for example, HONOR_SNANS
should be irrelevant to reversing an equality comparison, as reversing it
will change neither results nor exceptions raised, whether or not
signaling NaNs are involved).  But maybe more code treats LTGT as a
signaling operation than otherwise, in accordance with the original
intent, and so that's the most appropriate way to document it (with !UNEQ
being the representation of the corresponding quiet operation).
section 7.12.14.5 in C99 seems pretty much to say that this is a quiet
operation.


The islessgreater macro determines whether its first argument is less
than or greater than its second argument.  The islessgreater(x, y)
macro is similar to (x) < (y) || (x) > (y); however, islessgreater(x, y)
does not raise the "invalid" floating-point exception when x and y are
unordered (nor does it evaluate x and y twice).

So i think we should be pretty safe saying it is quiet.



It's not clear to me that ad hoc logic around particular operations is the
best way of handling optimizations in this area, instead of generic logic
using four bits to track which conditions (< = > unordered) a comparison
is true for and one bit to track whether it raises an exception for
unordered operands.  Even if you keep using particular sets of tree / RTL
codes for a subset of operations, maybe transformations that map into and
out of such five-bit form would be more likely to be correct.
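
Concretely, such a five-bit form might look like this (a hedged sketch;
all names are hypothetical):

/* Four "true for" bits plus one "signals on unordered operands" bit.  */
enum {
  CMP_LT = 1, CMP_GT = 2, CMP_EQ = 4, CMP_UN = 8,  /* truth bits */
  CMP_SIG = 16                                     /* raise invalid on qNaN */
};

#define CMP_LTGT_QUIET (CMP_LT | CMP_GT)           /* islessgreater      */
#define CMP_UNEQ       (CMP_UN | CMP_EQ)           /* its exact inverse  */
#define CMP_GE_SIG     (CMP_GT | CMP_EQ | CMP_SIG) /* signaling >=       */

/* Reversing a comparison flips the four truth bits but must leave the
   signaling bit alone: the exceptions raised do not change.  */
static inline int
reverse_cmp (int code)
{
  return code ^ (CMP_LT | CMP_GT | CMP_EQ | CMP_UN);
}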


I will work on the tree version tomorrow and resubmit.

Kenny







Re: patch to fix rtl documentation for new floating point comparisons

2015-02-09 Thread Kenneth Zadeck

On 02/09/2015 06:26 PM, Richard Henderson wrote:

On 02/09/2015 11:10 AM, Kenneth Zadeck wrote:

+@table @code
+@findex ltgt
+@cindex less than or greater than
+@item (ltgt:@var{m} @var{x} @var{y})
+@code{STORE_FLAG_VALUE} if the values represented by @var{x} and
+@var{y} are less, or greater, otherwise 0.  This is a quiet operation
+that applies only to floating point operands and does not have a
+corresponding IEC 60559 operation.

This is intended to match c99 7.12.14.5 islessgreater, which I believe is the
compareQuietNotEqual operation.
I think that the description that I have is correct.  According to IEEE
754-2008, table 5.1, page 29, compareQuietNotEqual is defined to have
the true relations LT GT UN.  What appears to be described in c99
7.12.14.5 is a quiet version with relations LT GT, and so it returns false
if either operand is a nan.  This does not have a defined name in IEEE
754-2008, but if it did, it would have the name compareQuietLessGreater
according to their naming conventions.
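
Laid out against the four possible relations, the distinction being
drawn here is:

                         LT   GT   EQ   UN
  compareQuietNotEqual    T    T    F    T    (true if unordered)
  C99 islessgreater       T    T    F    F    (false if unordered)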





+@table @code
+@findex uneq
+@cindex unordered or equal
+@item (uneq:@var{m} @var{x} @var{y})
+@code{STORE_FLAG_VALUE} if the values represented by @var{x} and
+@var{y} are unordered or equal, otherwise 0.  This is a quiet
+operation that applies only to floating point operands and does not
+have a corresponding IEC 60559 operation.

This is the strict inverse to LTGT, i.e. !compareQuietNotEqual, and you of
course note this is not the same as compareQuietEqual.

I believe that when the c99 math.h comparison builtins were added, treating
EQ+NE as quiet comparisons was common but not universal.  It's possible this
could be simplified now.
I agree that this is the strict inverse of LTGT, but as the name
implies, if either operand is a nan, this returns true.  As such, the
relations are UN and EQ, which matches the name.


r~




patch to fix rtl documentation for new floating point comparisons

2015-02-09 Thread Kenneth Zadeck
Before I commit this, or do the corresponding tree level patch, I would
like this to be looked at by the FP people.  I am guessing at some of
this; other parts i gleaned from various standards docs and an irc
conversation with Joseph.  This should have been documented when it was
put into the compiler.  It is certainly not obvious.



Kenny
Index: gcc/doc/rtl.texi
===
--- gcc/doc/rtl.texi	(revision 220541)
+++ gcc/doc/rtl.texi	(working copy)
@@ -2553,40 +2553,52 @@ of comparisons is supported on a particu
 pass will try to merge the operations to produce the @code{eq} shown
 in case it exists in the context of the particular insn involved.
 
-Inequality comparisons come in two flavors, signed and unsigned.  Thus,
-there are distinct expression codes @code{gt} and @code{gtu} for signed and
-unsigned greater-than.  These can produce different results for the same
-pair of integer values: for example, 1 is signed greater-than @minus{}1 but not
-unsigned greater-than, because @minus{}1 when regarded as unsigned is actually
+Fixed point inequality comparisons come in two flavors, signed and
+unsigned.  Thus, there are distinct expression codes @code{gt} and
+@code{gtu} for signed and unsigned greater-than.  These can produce
+different results for the same pair of integer values: for example, 1
+is signed greater-than @minus{}1 but not unsigned greater-than,
+because @minus{}1 when regarded as unsigned is actually
 @code{0x} which is greater than 1.
 
-The signed comparisons are also used for floating point values.  Floating
-point comparisons are distinguished by the machine modes of the operands.
+Floating point comparisons come in two flavors, signaling and quiet,
+though these are not always paired.  The fixed point signed
+comparisons are also used for floating point comparisons.  Floating
+point comparisons are distinguished by the machine modes of the
+operands.
 
 @table @code
 @findex eq
 @cindex equal
 @item (eq:@var{m} @var{x} @var{y})
-@code{STORE_FLAG_VALUE} if the values represented by @var{x} and @var{y}
-are equal, otherwise 0.
+@code{STORE_FLAG_VALUE} if the values represented by @var{x} and
+@var{y} are equal, otherwise 0.  If the operands are floating point,
+this is a quiet comparison and corresponds to the IEC 60559
+@code{compareQuietEqual} operation.
 
 @findex ne
 @cindex not equal
 @item (ne:@var{m} @var{x} @var{y})
-@code{STORE_FLAG_VALUE} if the values represented by @var{x} and @var{y}
-are not equal, otherwise 0.
+@code{STORE_FLAG_VALUE} if the values represented by @var{x} and
+@var{y} are not equal, otherwise 0. If the operands are floating
+point, this is a quiet comparison and corresponds to the IEC 60559
+@code{compareQuietNotEqual} operation.
 
 @findex gt
 @cindex greater than
 @item (gt:@var{m} @var{x} @var{y})
-@code{STORE_FLAG_VALUE} if the @var{x} is greater than @var{y}.  If they
-are fixed-point, the comparison is done in a signed sense.
+@code{STORE_FLAG_VALUE} if the @var{x} is greater than @var{y}.  If
+the operands are fixed-point, the comparison is done in a signed
+sense.  If the operands are floating point, this is a signaling
+comparison and corresponds to the IEC 60559
+@code{compareSignalingGreater} operation.
 
 @findex gtu
 @cindex greater than
 @cindex unsigned greater than
 @item (gtu:@var{m} @var{x} @var{y})
-Like @code{gt} but does unsigned comparison, on fixed-point numbers only.
+Like @code{gt} but does unsigned comparison, on fixed-point numbers
+only.  This is undefined for floating point numbers.
 
 @findex lt
 @cindex less than
@@ -2594,7 +2606,10 @@ Like @code{gt} but does unsigned compari
 @cindex unsigned less than
 @item (lt:@var{m} @var{x} @var{y})
 @itemx (ltu:@var{m} @var{x} @var{y})
-Like @code{gt} and @code{gtu} but test for ``less than''.
+Like @code{gt} and @code{gtu} but test for ``less than''. If the
+operands are floating point, @code{gt} is a signaling comparison and
+corresponds to the IEC 60559 @code{compareSignalingLess} operation.
+@code{gtu} is undefined for floating point numbers.
 
 @findex ge
 @cindex greater than
@@ -2603,6 +2618,10 @@ Like @code{gt} and @code{gtu} but test f
 @item (ge:@var{m} @var{x} @var{y})
 @itemx (geu:@var{m} @var{x} @var{y})
 Like @code{gt} and @code{gtu} but test for ``greater than or equal''.
+If the operands are floating point, @code{ge} is a signaling
+comparison and corresponds to the IEC 60559
+@code{compareSignalingGreater} operation.  @code{geu} is undefined for
+floating point numbers.
 
 @findex le
 @cindex less than or equal
@@ -2610,7 +2629,64 @@ Like @code{gt} and @code{gtu} but test f
 @cindex unsigned less than
 @item (le:@var{m} @var{x} @var{y})
 @itemx (leu:@var{m} @var{x} @var{y})
-Like @code{gt} and @code{gtu} but test for ``less than or equal''.
+Like @code{gt} and @code{gtu} but test for ``less than or equal''.  If
+the operands are floating point, @code{le} is a signaling comparison
+and corresponds t

Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-19 Thread Kenneth Zadeck

On 12/19/2014 06:26 AM, Richard Biener wrote:

On Fri, Dec 19, 2014 at 11:28 AM, Jiong Wang
 wrote:

2014-12-19 3:51 GMT+00:00 Bin.Cheng :

On Fri, Dec 19, 2014 at 6:09 AM, Segher Boessenkool
 wrote:

On Thu, Dec 18, 2014 at 05:00:01PM +, Jiong Wang wrote:

On 17/12/14 15:54, Richard Biener wrote:

ick.  I realize we don't have SSA form on RTL but doesn't DF provide
at least some help in looking up definition statements for pseudos?
In fact we want to restrict the transform to single-use pseudos, thus
hopefully it can at least tell us that... (maybe not and this is what
LOG_LINKS are for in combine...?)  At least loop-invariant alreadly
computes df_chain with DF_UD_CHAIN which seems exactly what
is needed (apart from maybe getting at single-use info).

thanks very much for these inspiring questions.

yes, we want to restrict the transformation on single-use pseudo only,
and it's better the transformation could re-use existed info and helper
function to avoid increase compile time. but I haven't found anything I
can reuse at the stage the transformation happen.

the info similar as LOG_LINKS is what I want, but maybe simpler. I'd study
the code about build LOG_LINKS, and try to see if we can do some factor out.

LOG_LINKs in combine are just historical.  combine should be converted
to use DF fully.

As noted loop-invariant already computes DF use-def chains.  The question
is whether it does compute single-use info when doing that (should be trivially
possible).  Kenny?
In the US (where i live), there is something called the "statute of
limitations" which says that after 7 years you cannot be held
accountable for anything except the most heinous of crimes.  It has
been more than 7 years since i did this.


however, since i am a nice person 

loop-invariant solves the DF_UD_CHAIN which builds a data structure that
connects each use with all of the defs that reach it.  I believe that
this is the opposite of what you want here.


if you really need this, you need to also turn on the DF_DU_CHAIN which
builds the opposite structure.  Both structures can be space hogs, but
they are both turned on in other places in the compiler so it is
unlikely to be an issue.
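
In code, requesting both chain directions is a one-liner at
problem-setup time -- a hedged sketch against the df API:

/* Ask for def->use chains in addition to the use->def chains that
   loop-invariant already computes, then solve.  */
df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
df_analyze ();
/* Afterwards, DF_REF_CHAIN on a def walks the uses it reaches.  */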






LOG_LINKs have nothing to do with single use; they point from the _first_
use to its corresponding def.

You might want to look at what fwprop does instead.

The rtl fwprop pass uses df information in a single-definition way; it
doesn't really take into consideration whether a register has a single
use.  This often corrupts other optimizations like post-increment and
load/store pair.  For example:

   add r2, r1, r0
   ldr rx, [r2]
   add r2, r2, #4
is transformed into the form below:
   add r2, r1, r0
   ldr rx, [r1, r0]
   add r2, r2, #4

As a result, the post-increment opportunity is lost, and the definition
of r2 can't be deleted because it's not a single use.

Thanks,
bin

thanks for all these suggestions.

I have looked at the LOG_LINK build function: a simple reverse scan,
though it needs to allocate a big map array for all pseudo regs.  I
haven't looked at the similar code in fwprop.

actually, when the first matched insn pattern is found, I just want to
scan the next several insns, then abort quickly if nothing meets the
requirement.  there is no need to build full single-use information.

still, can anyone confirm that it is safe to re-use REG_DEAD info there
without calling df_note_add_problem and df_analyze first?  or am I
using info passed down from the previous pass which calculated it and
may be broken?

It's not safe to use REG_DEAD info without re-computing it.
not sure that REG_DEAD is the right answer even if you did re-compute
it.  I believe you can have two parallel uses (on both sides of an
if-then-else) for a single def (above the if-then-else) and have two
REG_DEAD notes.



Richard.




[PATCH]: fix pr61111 Fixed width of mask.

2014-05-09 Thread Kenneth Zadeck

2014-05-09  Kenneth Zadeck  

PR middle-end/61111
* fold-const.c (fold_binary_loc): Changed width of mask.


committed as revision 210274.

kenny
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 210253)
+++ gcc/fold-const.c	(working copy)
@@ -11358,7 +11358,7 @@ fold_binary_loc (location_t loc,
 	  wide_int c3 = c1.and_not (c2);
 	  for (w = BITS_PER_UNIT; w <= width; w <<= 1)
 	{
-	  wide_int mask = wi::mask (width - w, false,
+	  wide_int mask = wi::mask (w, false,
 	TYPE_PRECISION (type));
 	  if (((c1 | c2) & mask) == mask && c1.and_not (mask) == 0)
 		{
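
For context, wi::mask (w, false, prec) produces a mask with the low w
bits set within a prec-bit value, so the fixed loop tests masks of
growing width; the old code passed width - w, which shrank the mask as
w grew.  A hedged illustration:

wide_int m1 = wi::mask (8, false, 32);   /* low 8 bits set:  0x000000ff */
wide_int m2 = wi::mask (16, false, 32);  /* low 16 bits set: 0x0000ffff */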


Re: [wide-int] Add fast path for hosts with HWI widening multiplication

2014-05-08 Thread Kenneth Zadeck
everyone who has a private port will hate you forever.   note that i 
have 2 of them.

On 05/08/2014 02:31 PM, Richard Sandiford wrote:

"Joseph S. Myers"  writes:

On Thu, 8 May 2014, Ramana Radhakrishnan wrote:


  Ramana Radhakrishnan  

* wide-int.cc (UTItype): Define.
(UDWtype): Define for appropriate W_TYPE_SIZE.

This breaks builds for 32-bit hosts, where TImode isn't supported.  You
can only use TImode on the host if it's 64-bit.

wide-int.cc:37:56: error: unable to emulate 'TI'

The longlong.h interface seems to be designed to be as difficult to use
as possible :-(  So maybe we really do need to limit it to hosts that are
known to work and benefit from it.

How about the following?  I tested that it produces identical
wide-int.o .text for x86_64.

I think additions to or removals from the list should be treated as
pre-approved.

Thanks,
Richard


gcc/
* wide-int.cc: Only include longlong.h for certain targets.

Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-05-08 19:13:15.782158808 +0100
+++ gcc/wide-int.cc 2014-05-08 19:28:52.880742385 +0100
@@ -27,19 +27,20 @@ along with GCC; see the file COPYING3.
  #include "tree.h"
  #include "dumpfile.h"
  
-#if GCC_VERSION >= 3000

+#if (GCC_VERSION >= 3000 \
+ && (defined __aarch64 \
+ || defined __alpha \
+ || defined __ia64 \
+ || defined __powerpc64__ \
+ || defined __sparcv9 \
+ || defined __x86_64__))
  #define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT
-typedef unsigned HOST_HALF_WIDE_INT UHWtype;
-typedef unsigned HOST_WIDE_INT UWtype;
  typedef unsigned int UQItype __attribute__ ((mode (QI)));
  typedef unsigned int USItype __attribute__ ((mode (SI)));
  typedef unsigned int UDItype __attribute__ ((mode (DI)));
-typedef unsigned int UTItype __attribute__ ((mode (TI)));
-#if W_TYPE_SIZE == 32
-# define UDWtype   UDItype
-#elif W_TYPE_SIZE == 64
-# define UDWtype   UTItype
-#endif
+typedef unsigned HOST_HALF_WIDE_INT UHWtype;
+typedef unsigned HOST_WIDE_INT UWtype;
+typedef unsigned int UDWtype __attribute__ ((mode (TI)));
  #include "longlong.h"
  #endif
  




fallout on x86-64 from the wide-int merge.

2014-05-06 Thread Kenneth Zadeck
here is a comparison.  The two areas were built using configure with no
options at all on x86-64.  The comparison is between revisions 210112 and
210113.  Tsan is very unhappy but everything else looks ok.  I know
that this worked a couple of days before the merge.  I know that there
was some tsan fiddling before the merge.


Kenny




# Comparing directories
## Dir1=gbBaseline/: 11 sum files
## Dir2=gbNew: 11 sum files

# Comparing 11 common sum files
## /bin/sh gccBaseline/contrib/compare_tests  /tmp/gxx-sum1.26585 
/tmp/gxx-sum2.26585
Tests that now fail, but worked before:

c-c++-common/tsan/thread_leak.c  -O0  execution test
c-c++-common/tsan/thread_leak.c  -O0  execution test
c-c++-common/tsan/thread_leak.c  -O2  execution test
c-c++-common/tsan/thread_leak.c  -O2  execution test
g++.dg/tsan/aligned_vs_unaligned_race.C  -O0  execution test
g++.dg/tsan/aligned_vs_unaligned_race.C  -O2  execution test
g++.dg/tsan/benign_race.C  -O0  execution test
g++.dg/tsan/benign_race.C  -O2  execution test
g++.dg/tsan/default_options.C  -O0  execution test
g++.dg/tsan/default_options.C  -O2  execution test
g++.dg/tsan/fd_close_norace.C  -O0  execution test
g++.dg/tsan/fd_close_norace.C  -O2  execution test
g++.dg/tsan/fd_close_norace2.C  -O0  execution test
g++.dg/tsan/fd_close_norace2.C  -O2  execution test

New tests that FAIL:

c-c++-common/tsan/atomic_stack.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/atomic_stack.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/atomic_stack.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/atomic_stack.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/fd_pipe_race.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/fd_pipe_race.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/fd_pipe_race.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/fd_pipe_race.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race2.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race2.c  -O0  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race2.c  -O2  output pattern test, is FATAL: 
ThreadSanitizer CHECK failed: 
../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + 
size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0)
c-c++-common/tsan/free_race2.c  -O2  output pattern 

we are starting the wide int merge

2014-05-06 Thread Kenneth Zadeck
please hold off on committing patches for the next couple of hours as we 
have a very large merge to do.

thanks.

kenny


Re: [wide-int] Handle zero-precision INTEGER_CSTs again

2014-05-03 Thread Kenneth Zadeck

Then with a fixed comment, this patch is fine.

kenny

On 05/03/2014 02:59 PM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

The doc at wide-int.h:136 needs work.  The doc currently says that the
precision and length are always greater than 0.  This is now not
correct.  It also says that the bits above the precision are defined
to be the sign extension if the precision does not cover that block.

I agree that the comment needs work.

Reordering slightly:


I do not know if i believe this completely.

This is now covered by the compile-time is_sign_extended property.

Trees aren't extended in that way because we want to be able to access
them quickly in wider precisions, with the extension following the
signedness of the underlying type.  So an 8-bit all-ones INTEGER_CST
will be stored as -1 if the type is signed and as 0xff otherwise.

RTL constants aren't sign-extended because of the case you note:


I worry about this because we have moved back and forth on this issue
many times and i still see code at rtl.h:1440 which assumes that BImode
constants are stored differently on some platforms.  It is possible that
i am reading that code incorrectly, but at first reading it looks
questionable.  so code comparing a BImode 1 with a 1 of a different
precision would not work.

I hope to get rid of the special BImode case after the merge and set
rtl's is_sign_extended back to true.

wide_int itself is always sign-extended.  The situation never occurs
for offset_int and widest_int.


I noticed that in the eq_p_large code, you removed a block of code left
over from the days when this was not true, but there is still some of this
code remaining that does not assume this.

The *_large functions have to be conservative and assume that the inputs
are not sign-extended.  My eq_p_large change doesn't change that,
it just represents it differently.  The idea is that rather than
extending each HWI individually from small_prec and then comparing them,
we can shift the XOR of the two values left by the number of excess bits,
so that all undefined bits disappear from the value.  The shifted-in bits
are all zeros so the shifted XOR value will be zero iff the meaningful
bits are the same and nonzero iff the meaningful bits are different,
regardless of whether the input is sign-extended or not.  The same
technique is already used in the inline version.
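
The trick is easy to see in isolation -- a hedged, self-contained
illustration (names hypothetical):

#include <stdint.h>
#include <stdbool.h>

/* Compare two HWI-sized blocks of which only the low PREC bits are
   meaningful; the bits above PREC may be arbitrary garbage.  Shifting
   the XOR left by the excess pushes the undefined bits out, so neither
   operand needs to be sign- or zero-extended first.  */
static bool
low_bits_equal (uint64_t a, uint64_t b, unsigned int prec)
{
  unsigned int excess = 64 - prec;      /* assumes 0 < prec <= 64 */
  return ((a ^ b) << excess) == 0;
}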

Thanks,
Richard




Re: [wide-int] Handle zero-precision INTEGER_CSTs again

2014-05-03 Thread Kenneth Zadeck
The doc at wide-int.h:136 needs work.  The doc currently says that the
precision and length are always greater than 0.  This is now not
correct.  It also says that the bits above the precision are defined
to be the sign extension if the precision does not cover that block.


I do not know if i believe this completely.

I noticed that in the eq_p_large code, you removed a block of code left
over from the days when this was not true, but there is still some of this
code remaining that does not assume this.


I worry about this because we have moved back and forth on this issue
many times and i still see code at rtl.h:1440 which assumes that BImode
constants are stored differently on some platforms.  It is possible that
i am reading that code incorrectly, but at first reading it looks
questionable.  so code comparing a BImode 1 with a 1 of a different
precision would not work.


aside from that the patch is fine.

kenny

On 05/02/2014 03:19 PM, Richard Sandiford wrote:

I'd hoped the days of zero-precision INTEGER_CSTs were behind us after
Richard's patch to remove min and max values from zero-width bitfields,
but a bootstrap-ubsan showed otherwise.  One source is in:

   null_pointer_node = build_int_cst (build_pointer_type (void_type_node), 0);

if no_target, since the precision of the type comes from ptr_mode,
which remains the default VOIDmode.  Maybe that's a bug, but setting
it to an arbitrary nonzero value would also be dangerous.

The other use is in the C/C++ void_zero_node, which is specifically
defined as a VOID_TYPE, zero-precision INTEGER_CST.  This is used by the
ubsan code in some ?: tests, as well as by other places in the frontend
proper.  At least the ubsan use persists until gimple.

This patch therefore restores the wide-int handling for zero precision,
for which the length must be 1 and the single HWI must be zero.
I've tried to wrap up most of the dependencies in two new functions,
wi::blocks_needed (a function version of the .cc BLOCKS_NEEDED, now also
used in the header) and wi::excess_bits, so that it'll be easier to
remove the handling again if zero precision ever goes away.
There are some remaining, easily-greppable cases that check directly
for a precision of 0 though.

The combination of this and the other patches allows bootstrap-ubsan to
complete.  There are a lot of extra testsuite failures compared to a
normal bootstrap, but they don't look related to wide-int.

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-05-02 16:28:07.657847935 +0100
+++ gcc/wide-int.cc 2014-05-02 16:28:09.560842845 +0100
@@ -48,8 +48,6 @@ static const HOST_WIDE_INT zeros[WIDE_IN
  #define HALF_INT_MASK (((HOST_WIDE_INT) 1 << HOST_BITS_PER_HALF_WIDE_INT) - 1)
  
  #define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT)

-#define BLOCKS_NEEDED(PREC) \
-  (PREC ? (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) : 1)
  #define SIGN_MASK(X) ((HOST_WIDE_INT) (X) < 0 ? -1 : 0)
  
  /* Return the value a VAL[I] if I < LEN, otherwise, return 0 or -1

@@ -69,20 +67,20 @@ safe_uhwi (const HOST_WIDE_INT *val, uns
  static unsigned int
  canonize (HOST_WIDE_INT *val, unsigned int len, unsigned int precision)
  {
-  unsigned int blocks_needed = BLOCKS_NEEDED (precision);
+  unsigned int blocks_needed = wi::blocks_needed (precision);
HOST_WIDE_INT top;
int i;
  
if (len > blocks_needed)

  len = blocks_needed;
  
+  if (wi::excess_bits (len, precision) > 0)

+val[len - 1] = sext_hwi (val[len - 1], precision % HOST_BITS_PER_WIDE_INT);
if (len == 1)
  return len;
  
top = val[len - 1];

-  if (len * HOST_BITS_PER_WIDE_INT > precision)
-val[len - 1] = top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT);
-  if (top != 0 && top != (HOST_WIDE_INT)-1)
+  if (top != 0 && top != (HOST_WIDE_INT) -1)
  return len;
  
/* At this point we know that the top is either 0 or -1.  Find the

@@ -134,7 +132,7 @@ wi::from_buffer (const unsigned char *bu
  
/* We have to clear all the bits ourself, as we merely or in values

   below.  */
-  unsigned int len = BLOCKS_NEEDED (precision);
+  unsigned int len = wi::blocks_needed (precision);
HOST_WIDE_INT *val = result.write_val ();
for (unsigned int i = 0; i < len; ++i)
  val[i] = 0;
@@ -180,7 +178,7 @@ wi::to_mpz (const wide_int_ref &x, mpz_t
  {
int len = x.get_len ();
const HOST_WIDE_INT *v = x.get_val ();
-  int excess = len * HOST_BITS_PER_WIDE_INT - x.get_precision ();
+  unsigned int excess = wi::excess_bits (len, x.get_precision ());
  
if (wi::neg_p (x, sgn))

  {
@@ -306,7 +304,8 @@ wi::force_to_size (HOST_WIDE_INT *val, c
   unsigned int xlen, unsigned int xprecision,
   unsigned int precision, signop sgn)
  {
-  unsigned int blocks_needed = BLOCKS_NEEDED (precision);
+  uns

Re: [wide-int] Fix some division cases

2014-05-02 Thread Kenneth Zadeck

this is fine.

On 05/02/2014 03:22 PM, Richard Sandiford wrote:

divmod_internal didn't handle unsigned division in which the inputs
have implicit all-one upper bits.  There were two problems:

- wi_unpack should extend implicit 1s to index blocks_needed - 1
   (possibly with a zext_hwi on the last block for small_prec)
   regardless of signedness

- when calculating m and n, divmod_internal only used the
   *_blocks_needed value if the division was signed.  Neither value
   is really signed in this context, since we've already calculated
   absolute values if sgnop == SIGNED, so the neg_p (and its pre-wi::
   incarnation) would only have held for the minimum integer.

   Using a simple while loop to find the top half-HWI seems more direct.

Also, divmod_internal_2 took n and m as unsigned values but used
HOST_WIDE_INT iterators on m - n.  m can be less than n for
things like 0 / big, and the negative m - n was being treated
as an unsigned int and zero-extended to HOST_WIDE_INT.
The patch fixes both the types of n and m and the types
of the iterators.

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-05-02 16:28:09.560842845 +0100
+++ gcc/wide-int.cc 2014-05-02 16:40:40.002916374 +0100
@@ -1126,8 +1126,7 @@ wi::add_large (HOST_WIDE_INT *val, const
 HOST_HALF_WIDE_INTs of RESULT.  The rest of RESULT is filled by
 uncompressing the top bit of INPUT[IN_LEN - 1].  */
  static void
-wi_unpack (unsigned HOST_HALF_WIDE_INT *result,
-  const unsigned HOST_WIDE_INT *input,
+wi_unpack (unsigned HOST_HALF_WIDE_INT *result, const HOST_WIDE_INT *input,
   unsigned int in_len, unsigned int out_len,
   unsigned int prec, signop sgn)
  {
@@ -1145,20 +1144,24 @@ wi_unpack (unsigned HOST_HALF_WIDE_INT *
else
  mask = 0;
  
-  for (i = 0; i < in_len; i++)

+  for (i = 0; i < blocks_needed - 1; i++)
  {
-  HOST_WIDE_INT x = input[i];
-  if (i == blocks_needed - 1 && small_prec)
-   {
- if (sgn == SIGNED)
-   x = sext_hwi (x, small_prec);
- else
-   x = zext_hwi (x, small_prec);
-   }
+  HOST_WIDE_INT x = safe_uhwi (input, in_len, i);
result[j++] = x;
result[j++] = x >> HOST_BITS_PER_HALF_WIDE_INT;
  }
  
+  HOST_WIDE_INT x = safe_uhwi (input, in_len, i);

+  if (small_prec)
+{
+  if (sgn == SIGNED)
+   x = sext_hwi (x, small_prec);
+  else
+   x = zext_hwi (x, small_prec);
+}
+  result[j++] = x;
+  result[j++] = x >> HOST_BITS_PER_HALF_WIDE_INT;
+
/* Smear the sign bit.  */
while (j < out_len)
  result[j++] = mask;
@@ -1317,10 +1320,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co
  }
  
/* We do unsigned mul and then correct it.  */

-  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
-half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
-half_blocks_needed, prec, SIGNED);
+  wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED);
+  wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED);
  
/* The 2 is for a full mult.  */

memset (r, 0, half_blocks_needed * 2
@@ -1519,7 +1520,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI
   unsigned HOST_HALF_WIDE_INT *b_remainder,
   unsigned HOST_HALF_WIDE_INT *b_dividend,
   unsigned HOST_HALF_WIDE_INT *b_divisor,
-  unsigned int m, unsigned int n)
+  int m, int n)
  {
/* The "digits" are a HOST_HALF_WIDE_INT which the size of half of a
   HOST_WIDE_INT and stored in the lower bits of each word.  This
@@ -1530,7 +1531,8 @@ divmod_internal_2 (unsigned HOST_HALF_WI
unsigned HOST_WIDE_INT qhat;   /* Estimate of quotient digit.  */
unsigned HOST_WIDE_INT rhat;   /* A remainder.  */
unsigned HOST_WIDE_INT p;  /* Product of two digits.  */
-  HOST_WIDE_INT s, i, j, t, k;
+  HOST_WIDE_INT t, k;
+  int i, j, s;
  
/* Single digit divisor.  */

if (n == 1)
@@ -1757,26 +1759,17 @@ wi::divmod_internal (HOST_WIDE_INT *quot
}
  }
  
-  wi_unpack (b_dividend, (const unsigned HOST_WIDE_INT *) dividend.get_val (),

-dividend.get_len (), dividend_blocks_needed, dividend_prec, sgn);
-  wi_unpack (b_divisor, (const unsigned HOST_WIDE_INT *) divisor.get_val (),
-divisor.get_len (), divisor_blocks_needed, divisor_prec, sgn);
-
-  if (wi::neg_p (dividend, sgn))
-m = dividend_blocks_needed;
-  else
-m = 2 * dividend.get_len ();
+  wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
+dividend_blocks_needed, dividend_prec, sgn);
+  wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (),
+divisor_blocks_needed, divisor_prec, sgn);
+
+  m = dividend_blocks_needed;
+  while (m > 1 && b_divide

Re: [wide-int] Add more assertions

2014-05-02 Thread Kenneth Zadeck

These are fine.

On 05/02/2014 03:20 PM, Richard Sandiford wrote:

This patch adds some assertions against sext (.., 0) and zext (..., 0).
The former is undefined at the sext_hwi level and the latter is disallowed
for consistency with the former.

Also, set_bit (rightly IMO) can't handle bit >= precision.  For
precision <= HOST_BITS_PER_WIDE_INT it would invoke undefined
behaviour while for other precisions I think it would crash.
A case with precision <= HOST_BITS_PER_WIDE_INT showed up in java
(fix posted separately).
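
For the small-precision case, the undefined behaviour is the usual C
over-wide shift; a sketch with plain types (illustrative helper, not
the wide-int code):

  unsigned long long
  set_bit_64 (unsigned long long x, unsigned int bit)
  {
    /* Undefined when bit >= 64; hence the new bit < precision assert.  */
    return x | (1ULL << bit);
  }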

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2014-05-02 16:28:09.561842842 +0100
+++ gcc/wide-int.h  2014-05-02 16:44:04.015463718 +0100
@@ -2046,6 +2046,8 @@ wi::sext (const T &x, unsigned int offse
unsigned int precision = get_precision (result);
WIDE_INT_REF_FOR (T) xi (x, precision);
  
+  gcc_checking_assert (offset != 0);

+
if (offset <= HOST_BITS_PER_WIDE_INT)
  {
val[0] = sext_hwi (xi.ulow (), offset);
@@ -2065,6 +2067,8 @@ wi::zext (const T &x, unsigned int offse
unsigned int precision = get_precision (result);
WIDE_INT_REF_FOR (T) xi (x, precision);
  
+  gcc_checking_assert (offset != 0);

+
/* This is not just an optimization, it is actually required to
   maintain canonization.  */
if (offset >= precision)
@@ -2102,6 +2106,9 @@ wi::set_bit (const T &x, unsigned int bi
WI_UNARY_RESULT_VAR (result, val, T, x);
unsigned int precision = get_precision (result);
WIDE_INT_REF_FOR (T) xi (x, precision);
+
+  gcc_checking_assert (bit < precision);
+
if (precision <= HOST_BITS_PER_WIDE_INT)
  {
val[0] = xi.ulow () | ((unsigned HOST_WIDE_INT) 1 << bit);




Re: [wide-int] Stricter type checking in wide_int constructor

2014-04-28 Thread Kenneth Zadeck

On 04/28/2014 12:25 PM, Mike Stump wrote:

On Apr 28, 2014, at 2:36 AM, Richard Sandiford  
wrote:

Ping.  FWIW this is the last patch I have lined up before the merge.
I repeated the asm comparison test I did a few months ago on one target
per config/ architecture and there were no unexpected changes.

Nice, thanks.

by "nice thanks", he meant "accepted".


Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE

2014-04-28 Thread Kenneth Zadeck

ok to commit.

kenny
On 04/28/2014 11:42 AM, Richard Sandiford wrote:

Kyrill Tkachov  writes:

With that patch bootstrap now still fails at dwarf2out.c with the same
message. I'm attaching a gzipped dwarf2out.ii

Thanks.  This is a nice proof of why clz_zero and ctz_zero were as bogus
as claimed.  It meant that the behaviour of floor_log2 depended on the
target and would return the wrong value if clz (0) was anything other
than the precision.

This patch makes the wide-int functions behave like the double_int
ones and pushes the target dependency back to the callers that care,
which is where it belongs.  The "new" *_DEFINED_VALUE_AT_ZERO checks
are really reinstating what's already on trunk.  There are other tree
uses of ctz that I think relied on the double_int behaviour.

Tests still ongoing, but could you check what the arm results are like
with this?

Thanks,
Richard


Index: gcc/builtins.c
===
--- gcc/builtins.c  2014-04-28 16:30:59.939239843 +0100
+++ gcc/builtins.c  2014-04-28 16:31:00.252238996 +0100
@@ -8080,6 +8080,7 @@ fold_builtin_bitop (tree fndecl, tree ar
/* Optimize for constant argument.  */
if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
  {
+  tree type = TREE_TYPE (arg);
int result;
  
switch (DECL_FUNCTION_CODE (fndecl))

@@ -8089,11 +8090,17 @@ fold_builtin_bitop (tree fndecl, tree ar
  break;
  
  	CASE_INT_FN (BUILT_IN_CLZ):

- result = wi::clz (arg);
+ if (wi::ne_p (arg, 0))
+   result = wi::clz (arg);
+ else if (! CLZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
+   result = TYPE_PRECISION (type);
  break;
  
  	CASE_INT_FN (BUILT_IN_CTZ):

- result = wi::ctz (arg);
+ if (wi::ne_p (arg, 0))
+   result = wi::ctz (arg);
+ else if (! CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
+   result = TYPE_PRECISION (type);
  break;
  
  	CASE_INT_FN (BUILT_IN_CLRSB):

Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2014-04-28 16:30:59.941239838 +0100
+++ gcc/simplify-rtx.c  2014-04-28 16:31:00.254238990 +0100
@@ -1656,6 +1656,7 @@ simplify_const_unary_operation (enum rtx
wide_int result;
enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode;
rtx_mode_t op0 = std::make_pair (op, imode);
+  int int_value;
  
  #if TARGET_SUPPORTS_WIDE_INT == 0

/* This assert keeps the simplification from producing a result
@@ -1686,7 +1687,11 @@ simplify_const_unary_operation (enum rtx
  break;
  
  	case CLZ:

- result = wi::shwi (wi::clz (op0), mode);
+ if (wi::ne_p (op0, 0))
+   int_value = wi::clz (op0);
+ else if (! CLZ_DEFINED_VALUE_AT_ZERO (mode, int_value))
+   int_value = GET_MODE_PRECISION (mode);
+ result = wi::shwi (int_value, mode);
  break;
  
  	case CLRSB:

@@ -1694,7 +1699,11 @@ simplify_const_unary_operation (enum rtx
  break;
  
  	case CTZ:

- result = wi::shwi (wi::ctz (op0), mode);
+ if (wi::ne_p (op0, 0))
+   int_value = wi::ctz (op0);
+ else if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, int_value))
+   int_value = GET_MODE_PRECISION (mode);
+ result = wi::shwi (int_value, mode);
  break;
  
  	case POPCOUNT:

Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-28 16:30:59.941239838 +0100
+++ gcc/wide-int.cc 2014-04-28 16:31:00.254238990 +0100
@@ -1137,46 +1137,6 @@ wi::add_large (HOST_WIDE_INT *val, const
return canonize (val, len, prec);
  }
  
-/* This is bogus.  We should always return the precision and leave the

-   caller to handle target dependencies.  */
-static int
-clz_zero (unsigned int precision)
-{
-  unsigned int count;
-
-  enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
-  if (mode == BLKmode)
-mode_for_size (precision, MODE_PARTIAL_INT, 0);
-
-  /* Even if the value at zero is undefined, we have to come up
- with some replacement.  Seems good enough.  */
-  if (mode == BLKmode)
-count = precision;
-  else if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, count))
-count = precision;
-  return count;
-}
-
-/* This is bogus.  We should always return the precision and leave the
-   caller to handle target dependencies.  */
-static int
-ctz_zero (unsigned int precision)
-{
-  unsigned int count;
-
-  enum machine_mode mode = mode_for_size (precision, MODE_INT, 0);
-  if (mode == BLKmode)
-mode_for_size (precision, MODE_PARTIAL_INT, 0);
-
-  /* Even if the value at zero is undefined, we have to come up
- with some replacement.  Seems good enough.  */
-  if (mode == BLKmode)
-count = precision;
-  else if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, count))
-count = precision;
-  return count;
-}
-
  /* Subrout

Re: [wide-int 4/5] Fix canonize handling of "small_prec" case

2014-04-26 Thread Kenneth Zadeck

this is fine.

kenny
On 04/25/2014 09:44 AM, Richard Sandiford wrote:

We should write back the sign-extended value.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 09:15:14.297359380 +0100
+++ gcc/wide-int.cc 2014-04-25 09:15:50.755637670 +0100
@@ -81,7 +81,7 @@ canonize (HOST_WIDE_INT *val, unsigned i
  
top = val[len - 1];

if (len * HOST_BITS_PER_WIDE_INT > precision)
-top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT);
+val[len - 1] = top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT);
if (top != 0 && top != (HOST_WIDE_INT)-1)
  return len;
  




Re: [wide-int 5/5] Add dump () method for gdb debugging

2014-04-26 Thread Kenneth Zadeck
i am sorry, i missed the fact that the loop counts up but you were 
reversing the order in the indexes.


kenny
On 04/26/2014 04:26 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

don't you think that it would be easier to understand the number if you
printed it largest index first, as in the routines in wide-int-print.cc?

Yeah, that's what the patch does.  E.g. (for 32-bit HWI):

   [...,0x3,0x80000000]

is 7 << 31.

Thanks,
Richard


kenny
On 04/25/2014 09:58 AM, Richard Sandiford wrote:

This patch adds a dump () method so that it's easier to read the
contents of the various wide-int types in gdb.  I've deliberately not
done any extension for "small_prec" cases because I think here
we want to see the raw values as much as possible.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 14:48:03.263213403 +0100
+++ gcc/wide-int.cc 2014-04-25 14:48:25.186333842 +0100
@@ -2130,3 +2130,9 @@ wi::only_sign_bit_p (const wide_int_ref
   void gt_ggc_mx (widest_int *) { }
   void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { }
   void gt_pch_nx (widest_int *) { }
+
+template void wide_int::dump () const;
+template void generic_wide_int <wide_int_ref_storage <false> >::dump () const;
+template void generic_wide_int <wide_int_ref_storage <true> >::dump () const;
+template void offset_int::dump () const;
+template void widest_int::dump () const;
Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2014-04-25 14:34:54.823652619 +0100
+++ gcc/wide-int.h  2014-04-25 14:48:06.218229309 +0100
@@ -709,6 +709,9 @@ #define INCDEC_OPERATOR(OP, DELTA) \
   #undef ASSIGNMENT_OPERATOR
   #undef INCDEC_OPERATOR
   
+  /* Debugging functions.  */

+  void dump () const;
+
 static const bool is_sign_extended
    = wi::int_traits <generic_wide_int <storage> >::is_sign_extended;
   };
@@ -859,6 +862,23 @@ generic_wide_int <storage>::operator = (
 return *this;
   }
   
+/* Dump the contents of the integer to stderr, for debugging.  */

+template <typename storage>
+void
+generic_wide_int <storage>::dump () const
+{
+  unsigned int len = this->get_len ();
+  const HOST_WIDE_INT *val = this->get_val ();
+  unsigned int precision = this->get_precision ();
+  fprintf (stderr, "[");
+  if (len * HOST_BITS_PER_WIDE_INT < precision)
+fprintf (stderr, "...,");
+  for (unsigned int i = 0; i < len - 1; ++i)
+fprintf (stderr, HOST_WIDE_INT_PRINT_HEX ",", val[len - 1 - i]);
+  fprintf (stderr, HOST_WIDE_INT_PRINT_HEX "], precision = %d\n",
+  val[0], precision);
+}
+
   namespace wi
   {
 template <>




Re: [wide-int 3/5] Fix large widths in shifted_mask

2014-04-25 Thread Kenneth Zadeck

approved
On 04/25/2014 09:40 AM, Richard Sandiford wrote:

Very minor, but since shifted_mask copes with out-of-range widths,
I think mask should too.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 09:26:57.025944460 +0100
+++ gcc/wide-int.cc 2014-04-25 09:37:16.873811137 +0100
@@ -716,7 +716,7 @@ wi::mask (HOST_WIDE_INT *val, unsigned i
gcc_assert (width < 4 * MAX_BITSIZE_MODE_ANY_INT);
gcc_assert (prec <= 4 * MAX_BITSIZE_MODE_ANY_INT);
  
-  if (width == prec)

+  if (width >= prec)
  {
val[0] = negate ? 0 : -1;
return 1;




Re: [wide-int 2/5] Fix large widths in shifted_mask

2014-04-25 Thread Kenneth Zadeck

approved.
On 04/25/2014 09:39 AM, Richard Sandiford wrote:

shifted_mask would mishandle cases where the start bit is in the middle
of a HWI and the end bit is in a different HWI.  The "000111000" case
needs to check that the start and end are in the same block.

In the changed lines, "shift" is the position of the lowest mask bit
in its containing HWI.  The end is in the same HWI if shift + width
< HOST_BITS_PER_WIDE_INT.
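
For concreteness, the same-block mask can be built as follows (a sketch
with plain C types, not the patched code itself):

  /* Bits [start, start + width) of one HWI: the 000111000 case.
     Valid only when start + width < HOST_BITS_PER_WIDE_INT.  */
  unsigned long long
  one_block_mask (unsigned int start, unsigned int width)
  {
    return ((1ULL << (start + width)) - 1) & ~((1ULL << start) - 1);
  }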

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 09:50:37.850073188 +0100
+++ gcc/wide-int.cc 2014-04-25 09:51:05.992292099 +0100
@@ -768,8 +768,8 @@ wi::shifted_mask (HOST_WIDE_INT *val, un
if (shift)
  {
HOST_WIDE_INT block = ((unsigned HOST_WIDE_INT) 1 << shift) - 1;
-  shift = end & (HOST_BITS_PER_WIDE_INT - 1);
-  if (shift)
+  shift += width;
+  if (shift < HOST_BITS_PER_WIDE_INT)
{
  /* case 000111000 */
  block = ((unsigned HOST_WIDE_INT) 1 << shift) - block - 1;




Re: [wide-int 5/5] Add dump () method for gdb debugging

2014-04-25 Thread Kenneth Zadeck
don't you think that it would be easier to understand the number if you 
printed it largest index first, as in the routines in wide-int-print.cc?


kenny
On 04/25/2014 09:58 AM, Richard Sandiford wrote:

This patch adds a dump () method so that it's easier to read the
contents of the various wide-int types in gdb.  I've deliberately not
done any extension for "small_prec" cases because I think here
we want to see the raw values as much as possible.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 14:48:03.263213403 +0100
+++ gcc/wide-int.cc 2014-04-25 14:48:25.186333842 +0100
@@ -2130,3 +2130,9 @@ wi::only_sign_bit_p (const wide_int_ref
  void gt_ggc_mx (widest_int *) { }
  void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { }
  void gt_pch_nx (widest_int *) { }
+
+template void wide_int::dump () const;
+template void generic_wide_int <wide_int_ref_storage <false> >::dump () const;
+template void generic_wide_int <wide_int_ref_storage <true> >::dump () const;
+template void offset_int::dump () const;
+template void widest_int::dump () const;
Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2014-04-25 14:34:54.823652619 +0100
+++ gcc/wide-int.h  2014-04-25 14:48:06.218229309 +0100
@@ -709,6 +709,9 @@ #define INCDEC_OPERATOR(OP, DELTA) \
  #undef ASSIGNMENT_OPERATOR
  #undef INCDEC_OPERATOR
  
+  /* Debugging functions.  */

+  void dump () const;
+
static const bool is_sign_extended
  = wi::int_traits <generic_wide_int <storage> >::is_sign_extended;
  };
@@ -859,6 +862,23 @@ generic_wide_int <storage>::operator = (
return *this;
  }
  
+/* Dump the contents of the integer to stderr, for debugging.  */

+template <typename storage>
+void
+generic_wide_int <storage>::dump () const
+{
+  unsigned int len = this->get_len ();
+  const HOST_WIDE_INT *val = this->get_val ();
+  unsigned int precision = this->get_precision ();
+  fprintf (stderr, "[");
+  if (len * HOST_BITS_PER_WIDE_INT < precision)
+fprintf (stderr, "...,");
+  for (unsigned int i = 0; i < len - 1; ++i)
+fprintf (stderr, HOST_WIDE_INT_PRINT_HEX ",", val[len - 1 - i]);
+  fprintf (stderr, HOST_WIDE_INT_PRINT_HEX "], precision = %d\n",
+  val[0], precision);
+}
+
  namespace wi
  {
template <>




Re: [wide-int 3/5] Fix large widths in shifted_mask

2014-04-25 Thread Kenneth Zadeck

richard,

I think that this patch is fine as is.  But in looking at the 
surrounding code, i saw something that appears to be somewhat troubling.


I am worried about the two asserts.   Given that we now require that 
some users write code similar to the code in tree-vrp.c:2628, it seems 
that these asserts are only latent land mines.


kenny
On 04/25/2014 09:40 AM, Richard Sandiford wrote:

Very minor, but since shifted_mask copes with out-of-range widths,
I think mask should too.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 09:26:57.025944460 +0100
+++ gcc/wide-int.cc 2014-04-25 09:37:16.873811137 +0100
@@ -716,7 +716,7 @@ wi::mask (HOST_WIDE_INT *val, unsigned i
gcc_assert (width < 4 * MAX_BITSIZE_MODE_ANY_INT);
gcc_assert (prec <= 4 * MAX_BITSIZE_MODE_ANY_INT);
  
-  if (width == prec)

+  if (width >= prec)
  {
val[0] = negate ? 0 : -1;
return 1;




Re: [wide-int 1/5] Cosmetic fixes to wide-int.{cc,h}

2014-04-25 Thread Kenneth Zadeck

i see nothing in this patch that requires a review.
On 04/25/2014 09:35 AM, Richard Sandiford wrote:

This series of patches is from a read-through of wide-int.h and wide-int.cc.
(The series from earlier in the week was from a diff of preexisting files.)

This first patch fixes some comments, typos and formatting.  It also
removes some dead code.  I'll follow up with a gdb-friendly replacement
for the old dump function.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-25 11:33:57.809153807 +0100
+++ gcc/wide-int.cc 2014-04-25 12:33:59.202021025 +0100
@@ -37,12 +37,6 @@ typedef unsigned int UDItype __attribute
  #include "longlong.h"
  #endif
  
-/* This is the maximal size of the buffer needed for dump.  */

-const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4
-   + (MAX_BITSIZE_MODE_ANY_INT
-  / HOST_BITS_PER_WIDE_INT)
-   + 32));
-
  static const HOST_WIDE_INT zeros[WIDE_INT_MAX_ELTS] = {};
  
  /*

@@ -51,7 +45,7 @@ static const HOST_WIDE_INT zeros[WIDE_IN
  
  /* Quantities to deal with values that hold half of a wide int.  Used

 in multiply and divide.  */
-#define HALF_INT_MASK (((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1)
+#define HALF_INT_MASK (((HOST_WIDE_INT) 1 << HOST_BITS_PER_HALF_WIDE_INT) - 1)
  
  #define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT)

  #define BLOCKS_NEEDED(PREC) \
@@ -129,8 +123,8 @@ wi::from_array (HOST_WIDE_INT *val, cons
  
  /* Construct a wide int from a buffer of length LEN.  BUFFER will be

 read according to byte endianess and word endianess of the target.
-   Only the lower LEN bytes of the result are set; the remaining high
-   bytes are cleared.  */
+   Only the lower BUFFER_LEN bytes of the result are set; the remaining
+   high bytes are cleared.  */
  wide_int
  wi::from_buffer (const unsigned char *buffer, unsigned int buffer_len)
  {
@@ -212,7 +206,7 @@ wi::to_mpz (const wide_int_ref &x, mpz_t
  mpz_import (result, len, -1, sizeof (HOST_WIDE_INT), 0, 0, v);
  }
  
-/* Returns VAL converted to TYPE.  If WRAP is true, then out-of-range

+/* Returns X converted to TYPE.  If WRAP is true, then out-of-range
 values of VAL will be wrapped; otherwise, they will be set to the
 appropriate minimum or maximum TYPE bound.  */
  wide_int
@@ -243,8 +237,8 @@ wi::from_mpz (const_tree type, mpz_t x,
   for representing the value.  The code to calculate count is
   extracted from the GMP manual, section "Integer Import and Export":
   http://gmplib.org/manual/Integer-Import-and-Export.html  */
-  numb = 8*sizeof(HOST_WIDE_INT);
-  count = (mpz_sizeinbase (x, 2) + numb-1) / numb;
+  numb = 8 * sizeof(HOST_WIDE_INT);
+  count = (mpz_sizeinbase (x, 2) + numb - 1) / numb;
HOST_WIDE_INT *val = res.write_val ();
mpz_export (val, &count, -1, sizeof (HOST_WIDE_INT), 0, 0, x);
if (count < 1)
@@ -352,9 +346,8 @@ wi::force_to_size (HOST_WIDE_INT *val, c
 where we do allow comparisons of values of different precisions.  */
  static inline HOST_WIDE_INT
  selt (const HOST_WIDE_INT *a, unsigned int len,
- unsigned int blocks_needed,
- unsigned int small_prec,
- unsigned int index, signop sgn)
+  unsigned int blocks_needed, unsigned int small_prec,
+  unsigned int index, signop sgn)
  {
HOST_WIDE_INT val;
if (index < len)
@@ -388,7 +381,7 @@ top_bit_of (const HOST_WIDE_INT *a, unsi
  
  /*

   * Comparisons, note that only equality is an operator.  The other
- * comparisons cannot be operators since they are inherently singed or
+ * comparisons cannot be operators since they are inherently signed or
   * unsigned and C++ has no such operators.
   */
  
@@ -401,7 +394,7 @@ wi::eq_p_large (const HOST_WIDE_INT *op0

int l0 = op0len - 1;
unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
  
-  while (op0len != op1len)

+  if (op0len != op1len)
  return false;
  
if (op0len == BLOCKS_NEEDED (prec) && small_prec)

@@ -741,7 +734,7 @@ wi::mask (HOST_WIDE_INT *val, unsigned i
unsigned int shift = width & (HOST_BITS_PER_WIDE_INT - 1);
if (shift != 0)
  {
-  HOST_WIDE_INT last = (((unsigned HOST_WIDE_INT) 1) << shift) - 1;
+  HOST_WIDE_INT last = ((unsigned HOST_WIDE_INT) 1 << shift) - 1;
val[i++] = negate ? ~last : last;
  }
else
@@ -774,12 +767,12 @@ wi::shifted_mask (HOST_WIDE_INT *val, un
unsigned int shift = start & (HOST_BITS_PER_WIDE_INT - 1);
if (shift)
  {
-  HOST_WIDE_INT block = (((unsigned HOST_WIDE_INT) 1) << shift) - 1;
+  HOST_WIDE_INT block = ((unsigned HOST_WIDE_INT) 1 << shift) - 1;
shift = end & (HOST_BITS_PER_WIDE_INT - 1);
if (shift)
{
  /* case 000111000 */
- block = (((unsigned HOST_WIDE_INT) 1) << shift) - bl

Re: [wide-int] Fix signed min / -1 quotient

2014-04-24 Thread Kenneth Zadeck

This is fine with me.

kenny

On 04/24/2014 10:34 AM, Richard Sandiford wrote:

For signed min / -1 we set the overflow flag (good) but also returned a
quotient of 0.  It should be 0x80...0 instead.  Since that's also the
value of the original dividend, we can just copy the representation over.

The value for division by 0 is probably pretty arbitrary.  double-int.c
seems to treat it as division by one:

   if (hden == 0 && lden == 0)
 overflow = 1, lden = 1;

and while not important, it has the nice feature that here too we could
just copy the original dividend, rather than treat it differently from
signed min / -1.
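
For concreteness, take 8-bit arithmetic: the signed minimum is -128, and
-128 / -1 = 128, which does not fit; 128 mod 256 is 0x80, which is exactly
the representation of the -128 dividend, so returning the dividend's
representation gives the wrapped quotient.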

This fixes libjava's Divide_1, which was the last remaining regression
on x86_64-linux-gnu.  I have a couple of other patches queued up, but
I'm still running tests to compare the asm output for the gcc and g++
testsuites on a range of targets.  (FWIW, the first round of these
tests picked up the same bug as Divide_1, in the output for
gcc.c-torture/compile/20010404-1.c.)

OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2014-04-24 09:51:08.056774754 +0100
+++ gcc/wide-int.cc 2014-04-24 11:52:35.996541888 +0100
@@ -1726,8 +1726,10 @@ wi::divmod_internal (HOST_WIDE_INT *quot
&& wi::only_sign_bit_p (dividend))
  overflow = true;
  
-  /* If overflow is set, just get out.  There will only be grief by

- continuing.  */
+  /* Handle the overflow cases.  Viewed as unsigned value, the quotient of
+ (signed min / -1) has the same representation as the orignal dividend.
+ We have traditionally made division by zero act as division by one,
+ so there too we use the original dividend.  */
if (overflow)
  {
if (remainder)
@@ -1738,8 +1741,9 @@ wi::divmod_internal (HOST_WIDE_INT *quot
if (oflow != 0)
*oflow = true;
if (quotient)
-   quotient[0] = 0;
-  return 1;
+   for (unsigned int i = 0; i < dividend_len; ++i)
+ quotient[i] = dividend_val[i];
+  return dividend_len;
  }
  
if (oflow)






Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE

2014-04-23 Thread Kenneth Zadeck

On 04/23/2014 10:36 AM, Richard Biener wrote:

On Wed, Apr 23, 2014 at 4:29 PM, Kenneth Zadeck
 wrote:

On 04/23/2014 05:47 AM, Richard Biener wrote:

On Tue, Apr 22, 2014 at 6:04 PM, Mike Stump  wrote:

On Apr 22, 2014, at 8:33 AM, Richard Sandiford
 wrote:

Kyrill Tkachov  writes:

Ping.
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00769.html
Any ideas? I recall chatter on IRC that we want to merge wide-int into
trunk
soon. Bootstrap failure on arm would prevent that...

Sorry for the late reply.  I hadn't forgotten, but I wanted to wait
until I had chance to look into the ICE before replying, which I haven't
had chance to do yet.

They are separable issues, so, I checked in the change.


It's a shame we can't use C++ style casts,
but I suppose that's the price to pay for being able to write
"unsigned HOST_WIDE_INT”.

unsigned_HOST_WIDE_INT isn’t horrible, but, yeah, my fingers were
expecting a typedef or better.  I slightly prefer the int (1) style, but I
think we should go the direction of the patch.

Well, on my list of things to try for 4.10 is to kill off HOST_WIDE_* and
require a 64bit integer type on the host and force all targets to use
a 64bit 'hwi'.  Thus, s/HOST_WIDE_INT/int64_t/ (and the appropriate
related changes).

Richard.

I should point out that there is a community that wants to go in the
opposite direction here.   They are the people with real 32 bit hosts who
want to go back to a world where they are allowed to make hwi a 32 bit
value.  They have been waiting for wide-int to be committed because they see
this as a way to get back to a world where most of the math is done natively.

I am not part of this community but they feel that if the math that has the
potential to be big is done in wide-ints, then they can go back to
using a 32 bit hwi for everything else.  For them, a wide-int built on
32 bit hwi's would be a win.

That wide-int builds on HWI is an implementation detail.  It can easily
be changed to build on int32_t.

Btw, what important target still supports a 32bit HWI?  None for what
I know.  Look at config.gcc and what does _not_ set need_64bit_hwint.
Even plain arm needs it.
I think that originally, hwi was supposed to be a natural integer on the 
host machine and it was corrupted to always be a 64 bit integer.


Right now, wide-int is built on hwis which are always 64 bits.  On a 
32 bit machine, this means that there are two levels of abstraction to 
get to the hardware: one to get from wide-int to 64 bits and one to get 
from 64 bits to 32 bits.


The easy part of converting wide-int to run natively on a 32 bit machine 
is going to be the internals of wide-int.  Of course until you test it 
you never know, but we did try very hard not to care about the internal 
size of the rep. The hard part will be the large number of places 
where wide-int converts to or from hwi. Some of those callers expect 
things to really be 64 bits and some of them deal with numbers that are 
always small enough to be implemented in the efficient native 
representation.  I think that the push against you is that the latter 
case should not be converted to int64_t.



Richard.


kenny




Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE

2014-04-23 Thread Kenneth Zadeck

On 04/23/2014 05:47 AM, Richard Biener wrote:

On Tue, Apr 22, 2014 at 6:04 PM, Mike Stump  wrote:

On Apr 22, 2014, at 8:33 AM, Richard Sandiford  
wrote:

Kyrill Tkachov  writes:

Ping.
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00769.html
Any ideas? I recall chatter on IRC that we want to merge wide-int into trunk
soon. Bootstrap failure on arm would prevent that...

Sorry for the late reply.  I hadn't forgotten, but I wanted to wait
until I had chance to look into the ICE before replying, which I haven't
had chance to do yet.

They are separable issues, so, I checked in the change.


It's a shame we can't use C++ style casts,
but I suppose that's the price to pay for being able to write
"unsigned HOST_WIDE_INT”.

unsigned_HOST_WIDE_INT isn’t horrible, but, yeah, my fingers were expecting a 
typedef or better.  I slightly prefer the int (1) style, but I think we should 
go the direction of the patch.

Well, on my list of things to try for 4.10 is to kill off HOST_WIDE_* and
require a 64bit integer type on the host and force all targets to use
a 64bit 'hwi'.  Thus, s/HOST_WIDE_INT/int64_t/ (and the appropriate
related changes).

Richard.
I should point out that there is a community that wants to go in the 
opposite direction here.   They are the people with real 32 bit hosts 
who want to go back to a world where they are allowed to make hwi a 32 
bit value.  They have been waiting for wide-int to be committed because 
they see this as a way to get back to a world where most of the math is 
done natively.


I am not part of this community but they feel that if the math that has 
the potential to be big is done in wide-ints, then they can go back to 
using a 32 bit hwi for everything else.  For them, a wide-int built on 
32 bit hwi's would be a win.


kenny


Re: [wide-int 5/8] Use LOG2_BITS_PER_UNIT

2014-04-22 Thread Kenneth Zadeck



On 04/22/2014 04:02 PM, Richard Sandiford wrote:

Looks like a few uses of the old idiom:

   BITS_PER_UNIT == 8 ? 3 : exact_log2 (BITS_PER_UNIT)

have crept in.  This patch replaces them with LOG2_BITS_PER_UNIT.

I do not think that these crept in as much as they were never squished out.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard
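
As a generic sketch of what the macro enables (assumed values; on targets
with BITS_PER_UNIT == 8, LOG2_BITS_PER_UNIT is 3), converting a byte
offset to a bit offset becomes a single shift:

  long long
  bytes_to_bits (long long byte_off)
  {
    return byte_off << 3;   /* LOG2_BITS_PER_UNIT, assuming 8-bit bytes */
  }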


Index: gcc/expr.c
===
--- gcc/expr.c  2014-04-22 20:58:26.969683484 +0100
+++ gcc/expr.c  2014-04-22 21:00:26.377614881 +0100
@@ -6801,8 +6801,7 @@ get_inner_reference (tree exp, HOST_WIDE
  if (!integer_zerop (off))
{
  offset_int boff, coff = mem_ref_offset (exp);
- boff = wi::lshift (coff, (BITS_PER_UNIT == 8
-   ? 3 : exact_log2 (BITS_PER_UNIT)));
+ boff = wi::lshift (coff, LOG2_BITS_PER_UNIT);
  bit_offset += boff;
}
  exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
@@ -6828,8 +6827,7 @@ get_inner_reference (tree exp, HOST_WIDE
  {
offset_int tem = wi::sext (wi::to_offset (offset),
 TYPE_PRECISION (sizetype));
-  tem = wi::lshift (tem, (BITS_PER_UNIT == 8
- ? 3 : exact_log2 (BITS_PER_UNIT)));
+  tem = wi::lshift (tem, LOG2_BITS_PER_UNIT);
tem += bit_offset;
if (wi::fits_shwi_p (tem))
{
@@ -6844,16 +6842,12 @@ get_inner_reference (tree exp, HOST_WIDE
/* Avoid returning a negative bitpos as this may wreak havoc later.  */
if (wi::neg_p (bit_offset))
  {
- offset_int mask
-   = wi::mask <offset_int> (BITS_PER_UNIT == 8
-? 3 : exact_log2 (BITS_PER_UNIT),
-false);
+ offset_int mask = wi::mask <offset_int> (LOG2_BITS_PER_UNIT, false);
  offset_int tem = bit_offset.and_not (mask);
  /* TEM is the bitpos rounded to BITS_PER_UNIT towards -Inf.
 Subtract it to BIT_OFFSET and add it (scaled) to OFFSET.  */
  bit_offset -= tem;
- tem = wi::arshift (tem, (BITS_PER_UNIT == 8
-  ? 3 : exact_log2 (BITS_PER_UNIT)));
+ tem = wi::arshift (tem, LOG2_BITS_PER_UNIT);
  offset = size_binop (PLUS_EXPR, offset,
   wide_int_to_tree (sizetype, tem));
}
Index: gcc/tree-dfa.c
===
--- gcc/tree-dfa.c  2014-04-22 20:58:27.020683881 +0100
+++ gcc/tree-dfa.c  2014-04-22 21:00:26.378614888 +0100
@@ -463,10 +463,7 @@ get_ref_base_and_extent (tree exp, HOST_
  {
offset_int tem = (wi::to_offset (ssize)
  - wi::to_offset (fsize));
-   if (BITS_PER_UNIT == 8)
- tem = wi::lshift (tem, 3);
-   else
- tem *= BITS_PER_UNIT;
+   tem = wi::lshift (tem, LOG2_BITS_PER_UNIT);
tem -= woffset;
maxsize += tem;
  }
@@ -583,8 +580,7 @@ get_ref_base_and_extent (tree exp, HOST_
  else
{
  offset_int off = mem_ref_offset (exp);
- off = wi::lshift (off, (BITS_PER_UNIT == 8
- ? 3 : exact_log2 (BITS_PER_UNIT)));
+ off = wi::lshift (off, LOG2_BITS_PER_UNIT);
  off += bit_offset;
  if (wi::fits_shwi_p (off))
{
Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c2014-04-22 20:58:26.969683484 +0100
+++ gcc/tree-ssa-alias.c2014-04-22 21:00:26.378614888 +0100
@@ -1041,8 +1041,7 @@ indirect_ref_may_alias_decl_p (tree ref1
/* The offset embedded in MEM_REFs can be negative.  Bias them
   so that the resulting offset adjustment is positive.  */
offset_int moff = mem_ref_offset (base1);
-  moff = wi::lshift (moff, (BITS_PER_UNIT == 8
-   ? 3 : exact_log2 (BITS_PER_UNIT)));
+  moff = wi::lshift (moff, LOG2_BITS_PER_UNIT);
if (wi::neg_p (moff))
  offset2p += (-moff).to_short_addr ();
else
@@ -1118,8 +1117,7 @@ indirect_ref_may_alias_decl_p (tree ref1
|| TREE_CODE (dbase2) == TARGET_MEM_REF)
  {
offset_int moff = mem_ref_offset (dbase2);
-  moff = wi::lshift (moff, (BITS_PER_UNIT == 8
-   ? 3 : exact_log2 (BITS_PER_UNIT)));
+  moff = wi::lshift (moff, LOG2_BITS_PER_UNIT);
if (wi::neg_p (moff))
doffset1 -= (-moff).to_short_addr ();
else
@@ -1217,15 +1215,13 @@ indirect_refs_may_alias_p (tree ref1 ATT
/* The offset embedded in MEM_REF

[wide-int] fixed vector testcases.

2014-01-16 Thread Kenneth Zadeck
several test cases started failing as a result of making the size of the 
wide-int buffer smaller.
this patch fixes them.   This failure was unrelated to the wide-int 
buffer size directly, but a hard constant in the trunk code was replaced 
by MAX_BITSIZE_MODE_ANY_INT when it should have been replaced by 
MAX_BITSIZE_MODE_ANY_MODE.


committed as revision 206689.

kenny
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c	(revision 206688)
+++ gcc/simplify-rtx.c	(working copy)
@@ -5396,7 +5396,7 @@ simplify_immed_subreg (enum machine_mode
 	case MODE_DECIMAL_FLOAT:
 	  {
 	REAL_VALUE_TYPE r;
-	long tmp[MAX_BITSIZE_MODE_ANY_INT / 32];
+	long tmp[MAX_BITSIZE_MODE_ANY_MODE / 32];
 
 	/* real_from_target wants its input in words affected by
 	   FLOAT_WORDS_BIG_ENDIAN.  However, we ignore this,


[wide-int] fixed several regressions in branch.

2014-01-14 Thread Kenneth Zadeck
This patch fixes what appears to have been a long standing failure in 
the conversion of tree-vect-generic.c:build_replicated_const. This 
failure caused several regressions on the branch.


Committed as revision 206616
Index: gcc/tree-vect-generic.c
===
--- gcc/tree-vect-generic.c	(revision 206609)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -57,7 +57,8 @@ static tree
 build_replicated_const (tree type, tree inner_type, HOST_WIDE_INT value)
 {
   int width = tree_to_uhwi (TYPE_SIZE (inner_type));
-  int n = TYPE_PRECISION (type) / width;
+  int n = (TYPE_PRECISION (type) + HOST_BITS_PER_WIDE_INT - 1) 
+/ HOST_BITS_PER_WIDE_INT;
   unsigned HOST_WIDE_INT low, mask;
   HOST_WIDE_INT a[WIDE_INT_MAX_ELTS];
   int i;
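
The replacement is the standard ceiling-division idiom for "how many
b-bit blocks are needed to hold p bits"; generically (a sketch, not the
committed code):

  /* Number of b-bit blocks needed to hold p bits; assumes p > 0.  */
  static inline int
  blocks_needed (int p, int b)
  {
    return (p + b - 1) / b;
  }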


Re: wide-int, tree-ssa

2014-01-03 Thread Kenneth Zadeck

On 11/26/2013 07:34 AM, Richard Biener wrote:

--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -98,6 +98,15 @@ along with GCC; see the file COPYING3.  If not see
 array CONST_VAL[i].VALUE.  That is fed into substitute_and_fold for
 final substitution and folding.

+   This algorithm uses wide-ints at the max precision of the target.
+   This means that, with one uninteresting exception, variables with
+   UNSIGNED types never go to VARYING because the bits above the
+   precision of the type of the variable are always zero.  The
+   uninteresting case is a variable of UNSIGNED type that has the
+   maximum precision of the target.  Such variables can go to VARYING,
+   but this causes no loss of infomation since these variables will
+   never be extended.
+

I don't understand this.  In CCP a SSA name drops to varying when
it's not constant.  How is that related to signedness or precision?!


Richi,

This is a bogosity that is inherited from the double-int code. Consider 
an unsigned int in double-int (or for that matter in widest-int).   It 
has a rep of a bunch of leading zeros followed by 32 bits of real 
stuff.Even if you know nothing about the value, it still has the 
upper bits of zero that you know are constant zeros.   Of course this is 
not true for signed numbers because the upper bits are smeared with the 
sign bits.
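
Concretely (a standalone illustration with an invented helper):

  extern unsigned int get_x (void);  /* value unknown at compile time */

  int
  upper_bits_known_zero (void)
  {
    unsigned long long w = get_x ();  /* zero-extends on conversion */
    /* Whatever get_x returns, bits 32..63 of w are constant zeros, so
       a bitwise lattice can track them as known even though the value
       itself is unknown; a signed source would smear the sign bit
       into those positions instead.  */
    return (w >> 32) == 0;            /* always true */
  }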


This was what i thought you were talking about when you argued many 
months ago that infinite precision was more natural and that i would 
have trouble with a fixed precision based on the size of the type or 
mode; that is the reason that tree-ssa-ccp uses widest int rather than 
wide-int.   I do actually believe that wide-int is more natural 
here.   However this bogosity does buy you a few constants.


My first cut at this pass used wide-int rather than widest int, but 
there are in fact constants that the pass finds that depend on this 
being done this way to satisfy the bit-for-bit same code rule for 
values smaller than TImode, so i left it the way that it was but 
decided to document it.


The proper way to do this is in fact to use wide-int, not widest-int but 
that would require major surgery to the code that converts values from 
one type to another.  I plan to do this during the next stage 1.   At 
that time, i will also enhance the code to expand the sign bit if it is 
known for signed types.


Kenny


Re: wide-int, rtl

2014-01-02 Thread Kenneth Zadeck

On 01/02/2014 05:26 PM, Eric Botcazou wrote:

So, I'd like to ping the original patch and Kenny's patch to resolve the
issues you found.  If you have any other concerns or thoughts, let us
know.

Almost OK, but remove the strange quotes in the comment for the INTEGER_CST
case of expand_expr_real_1 (and optionally add the fast path for the common
case written down by Richard).


eric,

the fast path is not any faster.   that was why we did not do it. And by 
the time you add the tests it would be slower.  The thing is that on 
the inside of the so-called fast path, you are still converting the tree 
to wide-int first and that conversion still is checking the rest of 
these things.


the quotes will be removed before committing.

kenny


Re: wide-int more performance fixes for wide multiplication.

2013-12-16 Thread Kenneth Zadeck

On 12/16/2013 01:37 PM, Richard Biener wrote:

Kenneth Zadeck  wrote:

On 12/16/2013 09:37 AM, Richard Biener wrote:

Kenneth Zadeck  wrote:

On 12/16/2013 06:19 AM, Richard Biener wrote:

On 12/15/13 7:48 PM, Kenneth Zadeck wrote:

On 12/15/2013 11:40 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

it is certainly true that in order to do an unbounded set of operations,
you would have to check on every operation.   so my suggestion that we
should remove the checking from the infinite precision would not support
this. but the reality is that there are currently no places in the
compiler that do this.

Currently all of the uses of widest-int are one or two operations, and
the style of code writing is that you do these and then you deal with
the overflow at the time that you convert the widest-int to a tree.   I
think that it is important to maintain the style of programming where
for a small finite number of computations do not need to check until
they convert back.

The problem with making the buffer size so tight is that we do not have
an adequate reserves to allow this style for any supportable type.
I personally think that 2x + some small n is what we need to have.


i am not as familiar with how this is used (or to be used when all of
the offset math is converted to use wide-int), but there appear to be
two uses of multiply.  one is the "harmless" mult by 3" and the other
is where people are trying to compute the size of arrays.  These last
operations do need to be checked for overflow.  The question here is
do you want to force those operations to overflow individually or do you
want to check when you convert out.  Again, i think 2x + some small
number is what we might want to consider.

It's a fair question, but personally I think checking for overflow
on the operation is much more robust.  Checking on conversion doesn't
allow you to stop thinking about overflow, it just changes the way you
think about it: rather than handling explicit overflow flags, you have
to remember to ask "is the range of the unconverted result within the
range of widest_int", which I bet it is something that would be easily
forgotten once widest_int & co. are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
    {
      tree e0 = VECTOR_CST_ELT (arg0, i);
      tree e1 = VECTOR_CST_ELT (arg1, i);

      bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

      tmp = wi::neg (e1, &neg1_ovf);
      tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
      if (wi::neg_p (tmp))
        tmp = wi::neg (tmp, &neg2_ovf);
      else
        neg2_ovf = false;
      result = wi::add (result, tmp, SIGNED, &add2_ovf);
      overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
    }

  gcc_assert (!overflow);

  return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int
chain without overflow then it would be less obviously correct.

Thanks,
Richard

Let us for the sake of argument assume that this was common code rather
than code in a particular port, because code in a particular port can
know more about the environment than common code is allowed to.

My main point is that this code is in wide-int not widest-int because at
this level the writer of this code actually wants to model what the
target wants to do.   So doing the adds in precision and testing
overflow is perfectly fine at every step.  But this loop CANNOT be
written in a style where you tested the overflow at the end because if
this is common code you cannot make any assumptions about the largest
mode on the machine. If the buffer was 2x + n in size, then it would
be reasonably safe to assume that the number of elements in the vector
could be represented in an integer and so you could wait till the end.

I think that my point was that (and i feel a little uncomfortable
putting words in richi's mouth but i believe that this was his point
early on) was that he thinks of the widest int as an infinite precision
representation.  he was the one who was pushing for the entire rep to
be done with a large internal (or perhaps unbounded) rep because he felt
that this was more natural to not have to think about overflow.  He
wanted you to be able to chain a mult and a divide and not see the
product get truncated before the divide was done.  The rep that we
have now really sucks with respect to this because widest int truncates
if you are close to the largest precision on the machine and does not if
you are small with respect to that.

My other point is that while you think that the example above is nice,
the experience with double-int is contrary to this.   people will say (

Re: wide-int more performance fixes for wide multiplication.

2013-12-16 Thread Kenneth Zadeck

On 12/16/2013 09:37 AM, Richard Biener wrote:

Kenneth Zadeck  wrote:

On 12/16/2013 06:19 AM, Richard Biener wrote:

On 12/15/13 7:48 PM, Kenneth Zadeck wrote:

On 12/15/2013 11:40 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

it is certainly true that in order to do an unbounded set of operations,
you would have to check on every operation.   so my suggestion that we
should remove the checking from the infinite precision would not support
this. but the reality is that there are currently no places in the
compiler that do this.

Currently all of the uses of widest-int are one or two operations, and
the style of code writing is that you do these and then you deal with
the overflow at the time that you convert the widest-int to a tree.   I
think that it is important to maintain the style of programming where
for a small finite number of computations do not need to check until
they convert back.

The problem with making the buffer size so tight is that we do not have
an adequate reserves to allow this style for any supportable type.
I personally think that 2x + some small n is what we need to have.


i am not as familiar with how this is used (or to be used when all of
the offset math is converted to use wide-int), but there appear to be
two uses of multiply.  one is the "harmless" mult by 3" and the other
is where people are trying to compute the size of arrays.  These last
operations do need to be checked for overflow.  The question here is
do you want to force those operations to overflow individually or do you
want to check when you convert out.  Again, i think 2x + some small
number is what we might want to consider.

It's a fair question, but personally I think checking for overflow
on the operation is much more robust.  Checking on conversion doesn't
allow you to stop thinking about overflow, it just changes the way you
think about it: rather than handling explicit overflow flags, you have
to remember to ask "is the range of the unconverted result within the
range of widest_int", which I bet it is something that would be easily
forgotten once widest_int & co. are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
    {
      tree e0 = VECTOR_CST_ELT (arg0, i);
      tree e1 = VECTOR_CST_ELT (arg1, i);

      bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

      tmp = wi::neg (e1, &neg1_ovf);
      tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
      if (wi::neg_p (tmp))
        tmp = wi::neg (tmp, &neg2_ovf);
      else
        neg2_ovf = false;
      result = wi::add (result, tmp, SIGNED, &add2_ovf);
      overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
    }

  gcc_assert (!overflow);

  return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int
chain without overflow then it would be less obviously correct.

Thanks,
Richard

Let us for the sake of argument assume that this was common code rather
than code in a particular port, because code in a particular port can
know more about the environment than common code is allowed to.

My main point is that this code is in wide-int not widest-int because at
this level the writer of this code actually wants to model what the
target wants to do.   So doing the adds in precision and testing
overflow is perfectly fine at every step.  But this loop CANNOT be
written in a style where you tested the overflow at the end because if
this is common code you cannot make any assumptions about the largest
mode on the machine. If the buffer was 2x + n in size, then it would
be reasonably safe to assume that the number of elements in the vector
could be represented in an integer and so you could wait till the end.

I think that my point was that (and i feel a little uncomfortable
putting words in richi's mouth but i believe that this was his point
early on) was that he thinks of the widest int as an infinite precision
representation.  he was the one who was pushing for the entire rep to
be done with a large internal (or perhaps unbounded) rep because he felt
that this was more natural to not have to think about overflow.  He
wanted you to be able to chain a mult and a divide and not see the
product get truncated before the divide was done.  The rep that we
have now really sucks with respect to this because widest int truncates
if you are close to the largest precision on the machine and does not if
you are small with respect to that.

My other point is that while you think that the example above is nice,
the experience with double-int is contrary to this.   people will say
(and test) the normal modes and anyone trying to use large modes will
die a terrible death of a thousand cuts.

Re: wide-int more performance fixes for wide multiplication.

2013-12-16 Thread Kenneth Zadeck

On 12/16/2013 06:19 AM, Richard Biener wrote:

On 12/15/13 7:48 PM, Kenneth Zadeck wrote:

On 12/15/2013 11:40 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

it is certainly true that in order to do an unbounded set of operations,
you would have to check on every operation.   so my suggestion that we
should remove the checking from the infinite precision would not support
this. but the reality is that there are currently no places in the
compiler that do this.

Currently all of the uses of widest-int are one or two operations, and
the style of code writing is that you do these and then you deal with
the overflow at the time that you convert the widest-int to a tree.   I
think that it is important to maintain the style of programming where
for a small finite number of computations do not need to check until
they convert back.

The problem with making the buffer size so tight is that we do not have
an adequate reserves to allow this style for any supportable type.
I personally think that 2x + some small n is what we need to have.


i am not as familiar with how this is used (or to be used when all of
the offset math is converted to use wide-int), but there appear to be
two uses of multiply.  one is the "harmless" mult by 3" and the other
is where people are trying to compute the size of arrays.  These last
operations do need to be checked for overflow.  The question here is
do you want to force those operations to overflow individually or do you
want to check when you convert out.  Again, i think 2x + some small
number is what we might want to consider.

It's a fair question, but personally I think checking for overflow
on the operation is much more robust.  Checking on conversion doesn't
allow you to stop thinking about overflow, it just changes the way you
think about it: rather than handling explicit overflow flags, you have
to remember to ask "is the range of the unconverted result within the
range of widest_int", which I bet it is something that would be easily
forgotten once widest_int & co. are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

   for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
 {
   tree e0 = VECTOR_CST_ELT (arg0, i);
   tree e1 = VECTOR_CST_ELT (arg1, i);

   bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

   tmp = wi::neg (e1, &neg1_ovf);
   tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
   if (wi::neg_p (tmp))
 tmp = wi::neg (tmp, &neg2_ovf);
   else
 neg2_ovf = false;
   result = wi::add (result, tmp, SIGNED, &add2_ovf);
   overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
 }

   gcc_assert (!overflow);

   return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int
chain without overflow then it would be less obviously correct.

Thanks,
Richard

Let us for the sake of argument assume that this was common code rather
than code in a particular port, because code in a particular port can
know more about the environment than common code is allowed to.

My main point is that this code is in wide-int not widest-int because at
this level the writer of this code actually wants to model what the
target wants to do.   So doing the adds in precision and testing
overflow is perfectly fine at every step.  But this loop CANNOT be
written in a style where you tested the overflow at the end because if
this is common code you cannot make any assumptions about the largest
mode on the machine. If the buffer was 2x + n in size, then it would
be reasonably safe to assume that the number of elements in the vector
could be represented in an integer and so you could wait till the end.
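
For concreteness on why 2x matters: an N-bit by N-bit multiply needs up
to 2N bits for the full product (0xff * 0xff = 0xfe01 takes 8-bit inputs
to a 16-bit result), so anything short of a doubled buffer can truncate
even a single unchecked product.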

I think that my point was that (and i feel a little uncomfortable
putting words in richi's mouth but i believe that this was his point
early on) was that he thinks of the widest int as an infinite precision
representation.  he was the one who was pushing for the entire rep to
be done with a large internal (or perhaps unbounded) rep because he felt
that this was more natural to not have to think about overflow. He
wanted you to be able to chain a mult and a divide and not see the
product get truncated before the divide was done.  The rep that we
have now really sucks with respect to this because widest int truncates
if you are close to the largest precision on the machine and does not if
you are small with respect to that.

My other point is that while you think that the example above is nice,
the experience with double-int is contrary to this.   people will say
(and test) the normal modes and anyone trying to use large modes will
die a terrible death of a thousand cuts.

Well - the cases that matter in practice are

1) the things we have offset_int for - code that does bit vs. byte
quantity calculations on addresses or address of

Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Kenneth Zadeck


On 12/15/2013 11:40 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

it is certainly true that in order to do an unbounded set of operations,
you would have to check on every operation.   so my suggestion that we
should remove the checking from the infinite precision would not support
this. but the reality is that there are currently no places in the
compiler that do this.

Currently all of the uses of widest-int are one or two operations, and
the style of code writing is that you do these and then you deal with
the overflow at the time that you convert the widest-int to a tree.   I
think that it is important to maintain the style of programming where
for a small finite number of computations do not need to check until
they convert back.

The problem with making the buffer size so tight is that we do not have
an adequate reserves to allow this style for any supportable type.
I personally think that 2x + some small n is what we need to have.


i am not as familiar with how this is used (or to be used when all of
the offset math is converted to use wide-int), but there appear to be
two uses of multiply.  one is the "harmless" mult by 3" and the other
is where people are trying to compute the size of arrays.  These last
operations do need to be checked for overflow.  The question here is
do you want to force those operations to overflow individually or do you
want to check when you convert out.  Again, i think 2x + some small
number is what we might want to consider.

It's a fair question, but personally I think checking for overflow
on the operation is much more robust.  Checking on conversion doesn't
allow you to stop thinking about overflow, it just changes the way you
think about it: rather than handling explicit overflow flags, you have
to remember to ask "is the range of the unconverted result within the
range of widest_int", which I bet it is something that would be easily
forgotten once widest_int & co. are part of the furniture.

E.g. the SPARC operation (picked only because I remember it):

  for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i)
{
  tree e0 = VECTOR_CST_ELT (arg0, i);
  tree e1 = VECTOR_CST_ELT (arg1, i);

  bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf;

  tmp = wi::neg (e1, &neg1_ovf);
  tmp = wi::add (e0, tmp, SIGNED, &add1_ovf);
  if (wi::neg_p (tmp))
tmp = wi::neg (tmp, &neg2_ovf);
  else
neg2_ovf = false;
  result = wi::add (result, tmp, SIGNED, &add2_ovf);
  overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf;
}

  gcc_assert (!overflow);

  return wide_int_to_tree (rtype, result);

seems pretty natural.  If instead it was modelled as a widest_int
chain without overflow then it would be less obviously correct.

Thanks,
Richard
Let us for the sake of argument assume that this was common code rather 
than code in a particular port, because code in a particular port can 
know more about the environment than common code is allowed to.


My main point is that this code is in wide-int not widest-int because at 
this level the writer of this code actually wants to model what the 
target wants to do.   So doing the adds in precision and testing 
overflow is perfectly fine at every step.  But this loop CANNOT be 
written in a style where you tested the overflow at the end because if 
this is common code you cannot make any assumptions about the largest 
mode on the machine. If the buffer was 2x + n in size, then it would 
be reasonably safe to assume that the number of elements in the vector 
could be represented in an integer and so you could wait till the end.


I think that my point was that (and i feel a little uncomfortable 
putting words in richi's mouth but i believe that this was his point 
early on) was that he thinks of the widest int as an infinite precision 
representation.  he was the one who was pushing for the entire rep to 
be done with a large internal (or perhaps unbounded) rep because he felt 
that this was more natural to not have to think about overflow. He 
wanted you to be able to chain a mult and a divide and not see the 
product get truncated before the divide was done.  The rep that we 
have now really sucks with respect to this because widest int truncates 
if you are close to the largest precision on the machine and does not if 
you are small with respect to that.


My other point is that while you think that the example above is nice, 
the experience with double-int is contrary to this.   people will say 
(and test) the normal modes and anyone trying to use large modes will 
die a terrible death of a thousand cuts.






Re: wide-int more performance fixes for wide multiplication.

2013-12-15 Thread Kenneth Zadeck


On 12/15/2013 03:54 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

The current world
is actually structured so that we never ask about overflow for the two
larger classes because the reason that you used those classes was that
you never wanted to have this discussion. So if you never ask about
overflow, then it really does not matter because we are not going to
return enough bits for you to care what happened on the inside.  Of
course that could change and someone could say that they wanted overflow
on widest-int.   Then the comment makes sense, with revisions, unless
your review of the code that wants overflow on widest int suggests that
they are just being stupid.

But widest_int is now supposed to be at least 1 bit wider than widest
input type (unlike previously where it was double the widest input type).
So I can definitely see cases where we'd want to know whether a
widest_int * widest_int result overflows.

My point is that the widest_int * widest_int would normally be a signed
multiplication rather than an unsigned multiplication, since the extra
1 bit of precision allows every operation to be signed.  So it isn't
a case of whether the top bit of a widest_int will be set, but whether
we ever reach here for widest_int in the first place.  We should be
going down the sgn == SIGNED path rather than the sgn == UNSIGNED path.

widest_int can represent an all-1s value, usually interpreted as -1.
If we do go down this sgn == UNSIGNED path for widest_int then we will
instead treat the all-1s value as the maximum unsigned number, just like
for any other kind of wide int.

As far as this function goes there really is no difference between
wide_int, offset_int and widest_int.  Which is good, because offset_int
and widest_int should just be wide_ints that are optimised for a specific
and fixed precision.

Thanks,
Richard

I am now seriously regretting letting richi talk me into changing the
size of the wide int buffer from being 2x of the largest mode on the
machine.  It was a terrible mistake, AND I would guess making it
smaller does not provide any real benefit.

The problem is that when you use widest-int (and by analogy offset int)
it should NEVER EVER overflow.  Furthermore, we need to change the
interfaces for these two so that you cannot even ask!!  (I do not
believe that anyone does ask, so the change would be small.)

offset_int * offset_int could overflow too, at least in the sense that
there are combinations of valid offset_ints whose product can't be
represented in an offset_int.  E.g. (1ULL << 67) * (1ULL << 67).
I think that was always the case.


see answer below.



There is a huge set of bugs on the trunk that are "fixed" with wide-int
because people wrote code for double-int thinking that it was infinite
precision.  So they never tested the cases where the value needed two
HWIs.  Most of those cases were resolved by making passes like tree-vrp
use wide-int and then being explicit about the overflow on every
operation, because with wide-int the issue is in your face: things
overflow even for 32-bit numbers.  However, with the current
widest-int, we will only be safe for add and subtract by adding the
extra bit.  In multiply we are exposed.  The perception is that
widest-int is as good as infinite precision, and no one will ever write
the code to check whether it overflowed, because it only rarely
happens.

All operations can overflow.  We would need 2 extra bits rather than 1
extra bit to stop addition overflowing, because the 1 extra bit we already
have is to allow unsigned values to be treated as signed.  But 2 extra bits
is only good for one addition, not a chain of two additions.
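
A small worked example of that point, scaled down for readability:
treat 4-bit unsigned inputs as signed values with 2 extra bits, i.e.
6-bit signed arithmetic with range -32 .. 31.  Then:

  15 + 15 = 30    /* one addition fits: 30 <= 31 */
  30 + 15 = 45    /* a second does not: 45 > 31, silent wraparound */

so the 2 extra bits protect a single addition but not a chain of two.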

That's why ignoring overflow seems dangerous to me.  The old wide-int
way might have allowed any x * y to be represented, but if nothing
checked whether x * y was bigger than expected then x * y + z could
overflow.

Thanks,
Richard


It is certainly true that in order to do an unbounded set of
operations, you would have to check on every operation.  So my
suggestion that we remove the checking from the infinite-precision
classes would not support this.  But the reality is that there are
currently no places in the compiler that do this.


Currently all of the uses of widest-int are one or two operations, and
the coding style is that you do these and then deal with the overflow
at the time you convert the widest-int to a tree.  I think it is
important to maintain the style of programming where a small, finite
number of computations does not need to be checked until you convert
back.
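
A minimal sketch of that pattern (illustrative only; it assumes the
usual helpers wi::to_widest, wi::fits_to_tree_p and wide_int_to_tree,
and hypothetical trees a, b, c):

  /* One or two widest_int operations, then a single range check at
     the point of conversion back to a tree.  */
  widest_int w = wi::to_widest (a) * wi::to_widest (b) + wi::to_widest (c);
  if (wi::fits_to_tree_p (w, type))
    return wide_int_to_tree (type, w);
  /* Otherwise give up on the transformation.  */
  return NULL_TREE;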


The problem with making the buffer size so tight is that we do not have
adequate reserve to allow this style for every supportable type.  I
personally think that 2x + some small n is what we need.



I am not as familiar with how this is used (or to be used when all of
the offset math is converte

Re: wide-int more performance fixes for wide multiplication.

2013-12-14 Thread Kenneth Zadeck



+  vallen = canonize (val, (uvlen + 1) >> 1, prec);
+
+  /* Shift is not always safe to write over one of the
+operands, so we must copy.  */
+  HOST_WIDE_INT tval[2 * WIDE_INT_MAX_ELTS];
+  memcpy (tval, val, vallen * CHAR_BIT / HOST_BITS_PER_WIDE_INT);




vallen * sizeof (HOST_WIDE_INT) would be more typical.
But why not unpack into tval directly and avoid the copy?
I could special case this, but the old code was not correct for odd 
precisions.





Re: wide-int more performance fixes for wide multiplication.

2013-12-14 Thread Kenneth Zadeck


On 12/14/2013 09:30 AM, Richard Sandiford wrote:

+  /* True if the result needs to be negated.  */
+  bool is_neg = false;
   
 /* If the top level routine did not really pass in an overflow, then

just make sure that we never attempt to set it.  */
 bool needs_overflow = (overflow != 0);
+  const HOST_WIDE_INT zero = 0;
 if (needs_overflow)
   *overflow = false;
   
@@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co

 return 1;
   }
   
-  /* We do unsigned mul and then correct it.  */

-  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
-half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
-half_blocks_needed, prec, SIGNED);
-
-  /* The 2 is for a full mult.  */
-  memset (r, 0, half_blocks_needed * 2
- * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
+  ulen = op1len * 2;
+  vlen = op2len * 2;
+
+  if (sgn == SIGNED)
+{
+  if (wi::neg_p (op1))
+   {
+ HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
+   

E.g. following on from the above:

  /* Add an extra HWI so that we can represent the negative of
 the minimum.  */
  HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1);


+ int nop1len
+ = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED,
0);

Not sure about the "prec + 1" here, since it could cause nop1len to be
greater than the maximum length for "prec".  And you go on to unpack
nop1 in "prec" rather than "prec + 1".

the unpacking with prec is the wrong part.  That should have been prec + 1

Hmm, I still think it's the opposite, because...


AIUI, once you've taken the absolutes, you've got two unsigned numbers
of precision "prec" and create a "prec * 2" result, so naybe:
sub_large (..., prec, UNSIGNED, 0) instead?

The signed vs. unsigned does not matter here.  The prec + 1 means we
never overflow.

...my point is that after this we treat the number as unsigned,
so it isn't a question of whether the absolute value overflows
the precision as a signed number but whether it overflows as an
unsigned number.  And it never will.  It might need 1 extra HWI
than the original input if op1len < blocks_needed, but it shouldn't
need extra precision.

E.g. say for 8-bit HWIs we start out with a signed -0x80 * -0x80.
We convert that into an unsigned 0x80 * 0x80, which is still
precision 8 * precision 8 -> precision 16.  And we'd need to be
able to handle those precision-8 values correctly (with no extra
zero HWI) if we had an unsigned 0x80 * 0x80 from the outset.
It will not work like that.  You are thinking that I can really do an
8-bit x 8-bit multiply.  I cannot.  The case I am worried about is 115
bits x 115 bits.  In this case I have to make sure that the top bits of
the top block are provisioned properly.  If you think about your case
above where bit 114 is set and the bits below it are zeros, then I need
to invert it so that bits 115 and above are zero; otherwise the top of
the block gets polluted with 1 bits, which would mess up the top block
multiplied by any of the lower blocks.


However, you are at a deeper level correct.  If I change the parms to
wi_unpack to UNSIGNED, then it is likely you are correct.  I will think
this through today and see if I can find a counterexample, but it may
be true that if I invert the number, I always want the top bits of the
block to be 0, which is what the sign parameter controls.
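
A worked example of the pollution, using hypothetical 8-bit half-blocks
and a 12-bit precision.  Take x = 0x800 (bit 11 set, lower bits clear):

  unpacked with the excess top-block bits cleared
  (what the unsigned block multiply needs):       { 0x00, 0x08 }
  unpacked sign-extended through the top block:   { 0x00, 0xF8 }

With the sign-extended form, every partial product involving the top
block is too large by (0xF8 - 0x08) * other_block * 2^8, so the bits
above the precision must be zero before the multiply loop runs.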



+ is_neg = !is_neg;
+ ulen = nop1len * 2;
+ wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
+ulen, prec, SIGNED);

Back on the alloca theme, we'd have:

  u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);

before this and other unpacks.

Note that there's supposed to be space after ")" in casts.  Lots of
instances in the patch :-)

I actually did not know this.  I will fix it.



+  /* UNSIGNED mul. */
+  /* For large positive integers inside regular wide-ints, we must
+explictly expand out the 1s to full width for the mul to be
+correct. Note that the top bit will never be set for a
+unsigned number in offset_int of widest-int.  */

s/of/or/.  But I think the comment is confusing because the unsignedness
is a property of the multiplication rather than the inputs.  We should never
be doing an unsigned operation on widest_int to begin with.  And if
offset_ints are being treated as having a sign then the same is true there.

If we ever do have an unsigned multiplication on those types for some reason
then -1 will (rightly) be treated as the maximum unsigned value.

The real question here is "do you want to do an assert or do you want to
explain what happens if the input comes in that way?"






Re: wide-int more performance fixes for wide multiplication.

2013-12-14 Thread Kenneth Zadeck



The current world
is actually structured so that we never ask about overflow for the two
larger classes because the reason that you used those classes was that
you never wanted to have this discussion. So if you never ask about
overflow, then it really does not matter because we are not going to
return enough bits for you to care what happened on the inside.  Of
course that could change and someone could say that they wanted overflow
on widest-int.   Then the comment makes sense, with revisions, unless
your review of the code that wants overflow on widest int suggests that
they are just being stupid.

But widest_int is now supposed to be at least 1 bit wider than widest
input type (unlike previously where it was double the widest input type).
So I can definitely see cases where we'd want to know whether a
widest_int * widest_int result overflows.

My point is that the widest_int * widest_int would normally be a signed
multiplication rather than an unsigned multiplication, since the extra
1 bit of precision allows every operation to be signed.  So it isn't
a case of whether the top bit of a widest_int will be set, but whether
we ever reach here for widest_int in the first place.  We should be
going down the sgn == SIGNED path rather than the sgn == UNSIGNED path.

widest_int can represent an all-1s value, usually interpreted as -1.
If we do go down this sgn == UNSIGNED path for widest_int then we will
instead treat the all-1s value as the maximum unsigned number, just like
for any other kind of wide int.

As far as this function goes there really is no difference between
wide_int, offset_int and widest_int.  Which is good, because offset_int
and widest_int should just be wide_ints that are optimised for a specific
and fixed precision.

Thanks,
Richard
I am now seriously regretting letting richi talk me into changing the
size of the wide int buffer from being 2x of the largest mode on the
machine.  It was a terrible mistake, AND I would guess making it
smaller does not provide any real benefit.

The problem is that when you use widest-int (and by analogy offset int)
it should NEVER EVER overflow.  Furthermore, we need to change the
interfaces for these two so that you cannot even ask!!  (I do not
believe that anyone does ask, so the change would be small.)


There is a huge set of bugs on the trunk that are "fixed" with wide-int
because people wrote code for double-int thinking that it was infinite
precision.  So they never tested the cases where the value needed two
HWIs.  Most of those cases were resolved by making passes like tree-vrp
use wide-int and then being explicit about the overflow on every
operation, because with wide-int the issue is in your face: things
overflow even for 32-bit numbers.  However, with the current
widest-int, we will only be safe for add and subtract by adding the
extra bit.  In multiply we are exposed.  The perception is that
widest-int is as good as infinite precision, and no one will ever write
the code to check whether it overflowed, because it only rarely
happens.






Re: wide-int more performance fixes for wide multiplication.

2013-12-14 Thread Kenneth Zadeck


On 12/14/2013 06:40 AM, Richard Sandiford wrote:

Hi Kenny,

Sorry for the slow response.

Kenneth Zadeck  writes:

Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc (revision 205765)
+++ gcc/wide-int.cc (working copy)
@@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co
unsigned int j;
unsigned int blocks_needed = BLOCKS_NEEDED (prec);
unsigned int half_blocks_needed = blocks_needed * 2;
-  /* The sizes here are scaled to support a 2x largest mode by 2x
- largest mode yielding a 4x largest mode result.  This is what is
- needed by vpn.  */
  
+  /* The sizes here are scaled to support a 2*largest mode + a little

+ by 2*largest mode + a little yielding a 4*largest mode + a
+ little.  This is what is needed by vpn.  */
unsigned HOST_HALF_WIDE_INT
-u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned int ulen;
unsigned HOST_HALF_WIDE_INT
-v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
-  /* The '2' in 'R' is because we are internally doing a full
+v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned int vlen;
+  unsigned int uvlen;
+  unsigned int vallen;
+
+  /* The '4' in 'R' is because we are internally doing a wide
   multiply.  */
unsigned HOST_HALF_WIDE_INT
-r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
-  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
+r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];

While we're here I think we should convert these arrays into allocas,
so that we're not hard-coding the specialness of vpn in wide-int.cc.
I.e. rather than having arrays of maximum size, use XALLOCAVEC to
allocate a specific number of stack HWIs, once we know how many
we actually need.  That includes the new arrays added in the patch.
I had thought of doing this, but part of me has always considered it
more expensive; in the scheme of things it is just noise here, so I
will do it.



+  /* True if the result needs to be negated.  */
+  bool is_neg = false;
  
/* If the top level routine did not really pass in an overflow, then

   just make sure that we never attempt to set it.  */
bool needs_overflow = (overflow != 0);
+  const HOST_WIDE_INT zero = 0;
if (needs_overflow)
  *overflow = false;
  
@@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co

return 1;
  }
  
-  /* We do unsigned mul and then correct it.  */

-  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
-half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
-half_blocks_needed, prec, SIGNED);
-
-  /* The 2 is for a full mult.  */
-  memset (r, 0, half_blocks_needed * 2
- * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
+  ulen = op1len * 2;
+  vlen = op2len * 2;
+
+  if (sgn == SIGNED)
+{
+  if (wi::neg_p (op1))
+   {
+ HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
+   

E.g. following on from the above:

  /* Add an extra HWI so that we can represent the negative of
 the minimum.  */
  HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1);


+ int nop1len
+   = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 
0);

Not sure about the "prec + 1" here, since it could cause nop1len to be
greater than the maximum length for "prec".  And you go on to unpack
nop1 in "prec" rather than "prec + 1".

the unpacking with prec is the wrong part.  That should have been prec + 1


AIUI, once you've taken the absolutes, you've got two unsigned numbers
of precision "prec" and create a "prec * 2" result, so naybe:
sub_large (..., prec, UNSIGNED, 0) instead?


The signed vs. unsigned does not matter here.  The prec + 1 means we
never overflow.

+ is_neg = !is_neg;
+ ulen = nop1len * 2;
+ wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
+ulen, prec, SIGNED);

Back on the alloca theme, we'd have:

  u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);

before this and other unpacks.

Note that there's supposed to be space after ")" in casts.  Lots of
instances in the patch :-)


+  /* UNSIGNED mul. */
+  /* For large positive integers inside regular wide-ints, we must
+explictly expand out the 1s to full width for the mul to be
+correct. Note that the top bit will never be set for a
+unsigned number in offset_int of widest-int.  */

s/of/or/.  But I think the comment is confusing because the unsignedness
is a property of the multiplication 

[trunk] added missing parenthesis to #defines generated by gen-modes.c

2013-12-13 Thread Kenneth Zadeck

committed as revision 205970 as obvious.

kenny
Index: gcc/genmodes.c
===
--- gcc/genmodes.c	(revision 205967)
+++ gcc/genmodes.c	(working copy)
@@ -891,7 +891,7 @@ emit_max_int (void)
 	  max = i->bytesize;
   if (max > mmax)
 	mmax = max;
-  printf ("#define MAX_BITSIZE_MODE_ANY_INT %d*BITS_PER_UNIT\n", mmax);
+  printf ("#define MAX_BITSIZE_MODE_ANY_INT (%d*BITS_PER_UNIT)\n", mmax);
 }
   else
 printf ("#define MAX_BITSIZE_MODE_ANY_INT %d\n", max_bitsize_mode_any_int);
@@ -901,7 +901,7 @@ emit_max_int (void)
 for (i = modes[j]; i; i = i->next)
   if (mmax < i->bytesize)
 	mmax = i->bytesize;
-  printf ("#define MAX_BITSIZE_MODE_ANY_MODE %d*BITS_PER_UNIT\n", mmax);
+  printf ("#define MAX_BITSIZE_MODE_ANY_MODE (%d*BITS_PER_UNIT)\n", mmax);
 }
 
 static void


Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c

2013-12-13 Thread Kenneth Zadeck

committed as revision 205964 with updated comment.

kenny

2013-12-13  Kenneth Zadeck  

* config/arc/arc.h (BITS_PER_UNIT): Removed.
* config/bfin/bfin.h (BITS_PER_UNIT): Removed.
* config/lm32/lm32.h (BITS_PER_UNIT): Removed.
* config/m32c/m32c.h (BITS_PER_UNIT): Removed.
* config/microblaze/microblaze.h (BITS_PER_UNIT): Removed.
* config/picochip/picochip.h (BITS_PER_UNIT): Removed.
* config/spu/spu.h (BITS_PER_UNIT): Removed.
* defaults.h (BITS_PER_UNIT): Removed.
* config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): New.
* doc/rtl (BITS_PER_UNIT): Moved from tm.texi.
(MAX_BITSIZE_MODE_ANY_INT): Updated.
* doc/tm.texi (BITS_PER_UNIT): Removed.
* doc/tm.texi.in (BITS_PER_UNIT): Removed.
* genmodes.c (bits_per_unit, max_bitsize_mode_any_int): New.
(create_modes): Added code to set bits_per_unit and
max_bitsize_mode_any_int.
(emit_max_int): Changed code generation.
* mkconfig.sh: Added insn-modes.h.


On 12/13/2013 06:11 AM, Richard Biener wrote:

On Fri, 13 Dec 2013, Uros Bizjak wrote:


Hello!


In addition, this patch also changes the way that
MAX_BITSIZE_MODE_ANY_INT is calculated.  The value is heavily used on
the wide-int branch to allocate buffers that are used to hold large
integer values.  The change in the way it is computed was motivated by
the i386 port, but there may be other ports that have the same problem.
The i386 port defines two very large integer modes that are only used
as containers for large vectors; they are never used for large
integers.  The new way of computing this allows a port to say (very
early) that some of these integer modes are never used to hold numbers,
so smaller buffers can be used for integer calculations.  Other ports
that play the same game should follow suit.

Index: gcc/config/i386/i386-modes.def
===
--- gcc/config/i386/i386-modes.def (revision 205895)
+++ gcc/config/i386/i386-modes.def (working copy)
@@ -90,5 +90,10 @@ VECTOR_MODE (INT, QI, 2); /*
  INT_MODE (OI, 32);
  INT_MODE (XI, 64);

+/* Keep the OI and XI modes from confusing the compiler into thinking
+   that these modes could actually be used for computation.  They are
+   only holders for vectors during data movement.  */
+#define MAX_BITSIZE_MODE_ANY_INT (128)
+
  /* The symbol Pmode stands for one of the above machine modes (usually 
SImode).
 The tm.h file specifies which one.  It is not a distinct mode.  */

__int128 is available only for x86_64 - 64bit targets, so:

#define MAX_BITSIZE_MODE_ANY_INT (TARGET_64BIT ? 128 : 64)

It needs to be a compile-time constant.
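
(The value sizes statically allocated buffers; for example, wide-int.cc
declares scratch arrays along the lines of

  unsigned HOST_HALF_WIDE_INT
    u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];

so the expression must fold to a constant at compile time, and
TARGET_64BIT is an option test, not a constant.)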

Richard.


Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 205962)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,24 @@
+2013-12-13  Kenneth Zadeck  
+
+	* config/arc/arc.h (BITS_PER_UNIT): Removed.
+	* config/bfin/bfin.h (BITS_PER_UNIT): Removed.
+	* config/lm32/lm32.h (BITS_PER_UNIT): Removed.
+	* config/m32c/m32c.h (BITS_PER_UNIT): Removed.
+	* config/microblaze/microblaze.h (BITS_PER_UNIT): Removed.
+	* config/picochip/picochip.h (BITS_PER_UNIT): Removed.
+	* config/spu/spu.h (BITS_PER_UNIT): Removed.
+	* defaults.h (BITS_PER_UNIT): Removed.
+	* config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): New.
+	* doc/rtl (BITS_PER_UNIT): Moved from tm.texi.
+	(MAX_BITSIZE_MODE_ANY_INT): Updated.
+	* doc/tm.texi (BITS_PER_UNIT): Removed.
+	* doc/tm.texi.in (BITS_PER_UNIT): Removed.
+	* genmodes.c (bits_per_unit, max_bitsize_mode_any_int): New.
+	(create_modes): Added code to set bits_per_unit and
+	max_bitsize_mode_any_int.
+	(emit_max_int): Changed code generation.
+	* mkconfig.sh: Added insn-modes.h.
+
 2013-12-13  Yuri Rumyantsev  
 
 	* config/i386/i386.c (slm_cost): Fix imul cost for HI.
Index: gcc/config/arc/arc.h
===
--- gcc/config/arc/arc.h	(revision 205962)
+++ gcc/config/arc/arc.h	(working copy)
@@ -303,9 +303,6 @@ along with GCC; see the file COPYING3.
numbered.  */
 #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN)
 
-/* Number of bits in an addressable storage unit.  */
-#define BITS_PER_UNIT 8
-
 /* Width in bits of a "word", which is the contents of a machine register.
Note that this is not necessarily the width of data type `int';
if using 16-bit ints on a 68000, this would still be 32.
Index: gcc/config/bfin/bfin.h
===
--- gcc/config/bfin/bfin.h	(revision 205962)
+++ gcc/config/bfin/bfin.h	(working copy)
@@ -859,9 +859,6 @@ typedef struct {
 /* Define this if most significant word of a multiword number is numbered. */
 #define WORDS_BIG_ENDIAN 0
 
-/* number of bits in an addressable storage unit */
-#define BITS_PER_UNIT 8
-
 /* Width in bits of a "word", which is the contents of a machine register.
Note that this is not necessarily th

Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c

2013-12-13 Thread Kenneth Zadeck




On Dec 13, 2013, at 5:11 AM, Richard Biener  wrote:

> On Fri, 13 Dec 2013, Uros Bizjak wrote:
> 
>> Hello!
>> 
>>> In addition, this patch also changes the way that MAX_BITSIZE_MODE_ANY_INT 
>>> is calculated.
>>> The value is heavily used on the wide-int branch to allocate buffers that 
>>> are used to hold large
>>> integer values. The change in the way it is computed was motivated by the 
>>> i386 port, but there
>>> may be other ports that have the same problem. The i386 port defines two 
>>> very large integer
>>> modes that are only used as containers for large vectors. They are never 
>>> used for large integers.
>>> The new way of computing this allows a port to say (very early) that some 
>>> of these integer modes
>>> are never used to hold numbers and so smaller buffers can be used for 
>>> integer calculations. Other
>>> ports that play the same game should follow suit.
>> 
>> Index: gcc/config/i386/i386-modes.def
>> ===
>> --- gcc/config/i386/i386-modes.def (revision 205895)
>> +++ gcc/config/i386/i386-modes.def (working copy)
>> @@ -90,5 +90,10 @@ VECTOR_MODE (INT, QI, 2); /*
>> INT_MODE (OI, 32);
>> INT_MODE (XI, 64);
>> 
>> +/* Keep the OI and XI modes from confusing the compiler into thinking
>> +   that these modes could actually be used for computation.  They are
>> +   only holders for vectors during data movement.  */
>> +#define MAX_BITSIZE_MODE_ANY_INT (128)
>> +
>> /* The symbol Pmode stands for one of the above machine modes (usually 
>> SImode).
>>The tm.h file specifies which one.  It is not a distinct mode.  */
>> 
>> __int128 is available only for x86_64 - 64bit targets, so:
>> 
>> #define MAX_BITSIZE_MODE_ANY_INT (TARGET_64BIT ? 128 : 64)
> 
> It needs to be a compile-time constant.
> 
> Richard.
I will add a line to that effect to the doc before checking it in.



Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c

2013-12-11 Thread Kenneth Zadeck

On 12/11/2013 08:42 PM, DJ Delorie wrote:

The m32c part is OK.

thanks for the fast reply.

kenny


[trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c

2013-12-11 Thread Kenneth Zadeck



This patch is for the trunk, but it solves a problem that comes up for
wide-int: for wide-int we need to have BITS_PER_UNIT available earlier.
So this patch sets the default value (8) in genmodes.c so that it is
available to anyone who includes insn-modes.h.  The generator for tm.h
was modified to include insn-modes.h.  The value for BITS_PER_UNIT can
be overridden by any port by placing a define for it in their target
modes file.


This patch removes the definition of BITS_PER_UNIT from 7 platform .h 
files.   All of those platforms initialized it to the default value so 
there was no need for additions to their target modes file.


In addition, this patch also changes the way that
MAX_BITSIZE_MODE_ANY_INT is calculated.  The value is heavily used on
the wide-int branch to allocate buffers that are used to hold large
integer values.  The change in the way it is computed was motivated by
the i386 port, but there may be other ports that have the same problem.
The i386 port defines two very large integer modes that are only used
as containers for large vectors; they are never used for large
integers.  The new way of computing this allows a port to say (very
early) that some of these integer modes are never used to hold numbers,
so smaller buffers can be used for integer calculations.  Other ports
that play the same game should follow suit.
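
For example (a sketch; the i386 version of this is quoted earlier in
the thread), a port whose wider integer modes are vector-only
containers would add something like this to its <port>-modes.def:

/* OImode and wider are only containers for vectors; they never hold
   integer computations, so cap the integer buffers at 128 bits.  */
#define MAX_BITSIZE_MODE_ANY_INT (128)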


This patch has been bootstrapped and regression tested on x86-64. Ok to 
commit?


Kenny
Index: gcc/config/arc/arc.h
===
--- gcc/config/arc/arc.h	(revision 205895)
+++ gcc/config/arc/arc.h	(working copy)
@@ -303,9 +303,6 @@ along with GCC; see the file COPYING3.
numbered.  */
 #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN)
 
-/* Number of bits in an addressable storage unit.  */
-#define BITS_PER_UNIT 8
-
 /* Width in bits of a "word", which is the contents of a machine register.
Note that this is not necessarily the width of data type `int';
if using 16-bit ints on a 68000, this would still be 32.
Index: gcc/config/bfin/bfin.h
===
--- gcc/config/bfin/bfin.h	(revision 205895)
+++ gcc/config/bfin/bfin.h	(working copy)
@@ -859,9 +859,6 @@ typedef struct {
 /* Define this if most significant word of a multiword number is numbered. */
 #define WORDS_BIG_ENDIAN 0
 
-/* number of bits in an addressable storage unit */
-#define BITS_PER_UNIT 8
-
 /* Width in bits of a "word", which is the contents of a machine register.
Note that this is not necessarily the width of data type `int';
if using 16-bit ints on a 68000, this would still be 32.
Index: gcc/config/lm32/lm32.h
===
--- gcc/config/lm32/lm32.h	(revision 205895)
+++ gcc/config/lm32/lm32.h	(working copy)
@@ -73,7 +73,6 @@
 #define BYTES_BIG_ENDIAN 1
 #define WORDS_BIG_ENDIAN 1
 
-#define BITS_PER_UNIT 8
 #define BITS_PER_WORD 32
 #define UNITS_PER_WORD 4
 
Index: gcc/config/m32c/m32c.h
===
--- gcc/config/m32c/m32c.h	(revision 205895)
+++ gcc/config/m32c/m32c.h	(working copy)
@@ -140,7 +140,6 @@ machine_function;
matches "int".  Pointers are 16 bits for R8C/M16C (when TARGET_A16
is true) and 24 bits for M32CM/M32C (when TARGET_A24 is true), but
24-bit pointers are stored in 32-bit words.  */
-#define BITS_PER_UNIT 8
 #define UNITS_PER_WORD 2
 #define POINTER_SIZE (TARGET_A16 ? 16 : 32)
 #define POINTERS_EXTEND_UNSIGNED 1
Index: gcc/config/microblaze/microblaze.h
===
--- gcc/config/microblaze/microblaze.h	(revision 205895)
+++ gcc/config/microblaze/microblaze.h	(working copy)
@@ -193,7 +193,6 @@ extern enum pipeline_type microblaze_pip
 #define BITS_BIG_ENDIAN 0
 #define BYTES_BIG_ENDIAN (TARGET_LITTLE_ENDIAN == 0)
 #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN)
-#define BITS_PER_UNIT   8
 #define BITS_PER_WORD   32
 #define UNITS_PER_WORD  4
 #define MIN_UNITS_PER_WORD  4
Index: gcc/config/picochip/picochip.h
===
--- gcc/config/picochip/picochip.h	(revision 205895)
+++ gcc/config/picochip/picochip.h	(working copy)
@@ -92,8 +92,6 @@ extern enum picochip_dfa_type picochip_s
 #define BYTES_BIG_ENDIAN 0
 #define WORDS_BIG_ENDIAN 0
 
-#define BITS_PER_UNIT 8
-
 #define BITS_PER_WORD 16
 #define UNITS_PER_WORD (BITS_PER_WORD / BITS_PER_UNIT)
 
Index: gcc/config/spu/spu.h
===
--- gcc/config/spu/spu.h	(revision 205895)
+++ gcc/config/spu/spu.h	(working copy)
@@ -54,8 +54,6 @@ extern GTY(()) int spu_tune;
 
 #define WORDS_BIG_ENDIAN 1
 
-#define BITS_PER_UNIT 8
-
 /* GCC uses word_mode in many places, assuming that it is the fastest
integer mode.  That is not the case for SPU though.  We can't us

Re: [wide-int] small cleanup in wide-int.*

2013-12-11 Thread Kenneth Zadeck

On 12/09/2013 10:01 AM, Richard Biener wrote:

On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck  wrote:

On 12/08/2013 05:35 AM, Richard Biener wrote:

Richard Sandiford  wrote:

Kenneth Zadeck  writes:

 #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

  (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact

multiple

of HOST_BITS_PER_WIDE_INT.

we will do this later when some other issues that Eric B raised are

settled.

I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as
MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the
change above.  IMO it doesn't make sense to both round up the

division

and also add 1 to the result.  So I think we should make this change
regardless of whatever follows.

Looks good to me otherwise, thanks.

Richard


so this one works the way you want.  While it is true that the problems
are disjoint, the solution will likely involve changing the same lines
of source in two different ways.

Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we
are)
I think we should change it to something that makes sense.  All we want
is
1 extra bit.  We don't need to round up the size and then add a whole
HWI on top.  Doing that will potentially pessimise targets that don't
need anything beyond SImode.

I agree.

Richard.

Done, ok to commit?

+   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
+   can accommodate at least 1 more bit so that unsigned numbers of that
+   mode can be represented.  Note that it is still possible to create
+   fixed_wide_ints that have precisions greater than
+   MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
+   double-width multiplication result, for example.  */
  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT))
+

that doesn't reserve 1 more bit, it just rounds up.  For one more bit you
need to add 1, so

  #define WIDE_INT_MAX_ELTS \
+  (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT))

no?

Otherwise ok.

Thanks,
Richard.


kenny



It's not like I can approve the patch anyway though, so I'll leave it
there.

Thanks,
Richard



committed as revision 205900 after rebootstrap.

kenny
Index: gcc/tree.h
===
--- gcc/tree.h	(revision 205897)
+++ gcc/tree.h	(working copy)
@@ -4545,7 +4545,7 @@ namespace wi
 static const unsigned int precision = N;
   };
 
-  generic_wide_int <wi::extended_tree <MAX_BITSIZE_MODE_ANY_INT> >
+  generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION> >
   to_widest (const_tree);
 
  generic_wide_int <wi::extended_tree <ADDR_MAX_PRECISION> > to_offset (const_tree);
@@ -4566,7 +4566,7 @@ wi::int_traits <const_tree>::decompose (
 			  precision);
 }
 
-inline generic_wide_int <wi::extended_tree <MAX_BITSIZE_MODE_ANY_INT> >
+inline generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION> >
 wi::to_widest (const_tree t)
 {
   return t;
@@ -4605,7 +4605,7 @@ wi::extended_tree <N>::get_len () const
 {
   if (N == ADDR_MAX_PRECISION)
 return TREE_INT_CST_OFFSET_NUNITS (m_t);
-  else if (N == MAX_BITSIZE_MODE_ANY_INT)
+  else if (N >= WIDE_INT_MAX_PRECISION)
 return TREE_INT_CST_EXT_NUNITS (m_t);
   else
 /* This class is designed to be used for specific output precisions
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 205897)
+++ gcc/tree-vrp.c	(working copy)
@@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_
 
   signop sign = TYPE_SIGN (expr_type);
   unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
 
   if (range_int_cst_p (&vr0)
 	  && range_int_cst_p (&vr1)
 	  && TYPE_OVERFLOW_WRAPS (expr_type))
 	{
-	  wide_int sizem1 = wi::mask (prec, false, prec2);
-	  wide_int size = sizem1 + 1;
+	  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int;
+	  typedef generic_wide_int
+	    <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;
+	  vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+	  vrp_int size = sizem1 + 1;
 
 	  /* Extend the values using the sign of the result to PREC2.
 	 From here on out, everthing is just signed math no matter
 	 what the input types were.  */
-	  wide_int min0 = wide_int::from (vr0.min, prec2, sign);
-	  wide_int max0 = wide_int::from (vr0.max, prec2, sign);
-	  wide_int min1 = wide_int::from (vr1.min, prec2, sign);
-	  wide_int max1 = wide_int::from (vr1.max, prec2, sign);
-
+  vrp_int min0 = vrp_int_cst (vr0.min);
+  vrp_int max0 = vrp_int_cst (vr0.max);
+  vrp_int min1 = vrp_int_cst (vr1.min);
+  vrp_in

wide-int more performance fixes for wide multiplication.

2013-12-09 Thread Kenneth Zadeck

This patch is the last performance patch that i have for wide-int.

This patch changes large multiplication from taking precision/hbpwi *
precision/hbpwi multiplies to taking #significant_bits1/hbpwi *
#significant_bits2/hbpwi multiplies.  That was a significant number of
multiplies on machines that have a large largest integer mode.


This patch was mostly tested before Richard Sandiford's patch, which
uses the longlong stuff.  That patch gets many of the cases that this
one would get, but if either of the operands does not fit into an HWI,
things got really expensive without this patch.


This patch works a lot differently from the version on the branch.  On
the branch, the inputs were assumed to be unsigned numbers and the
multiply happened with whatever came in.  After the multiply happened,
if a signed multiply had been requested, the product was adjusted.

In this version, the adjustment happens first if we are doing a signed
multiply.  Thus the operands are always unsigned numbers by the time
they reach the actual multiply loops.  We may need an extra bit for
this, but we have it.  The advantage is that negative numbers have a
lot of leading 1s, and compensating for these is complex.  The easy
solution is to convert to a positive number and then count how many
HWIs are needed to represent that.  Of course, if exactly one of the
operands was negative, we end up negating the product at the end.
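
In outline, the flow is (a sketch with hypothetical helper names, not
the real half-block loops):

  bool is_neg = false;
  if (sgn == SIGNED)
    {
      /* Make both operands non-negative, remembering the sign of the
         product; the extra bit of precision makes the negation of the
         minimum representable.  */
      if (negative_p (op1)) { op1 = negate (op1); is_neg = !is_neg; }
      if (negative_p (op2)) { op2 = negate (op2); is_neg = !is_neg; }
    }
  product = unsigned_block_multiply (op1, op2);  /* magnitudes only */
  if (is_neg)
    product = negate (product);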


This patch was tested on x86-64 with Richard's longlong shortcut
commented out so that this code would be hit a lot of times, and in
that setup it makes a big difference in the number of multiplies
performed.  Without commenting that out, the number of hits is small
when one looks only at the bootstrap and the regression tests.
However, this patch will be hit more often on ports that decide to
revert to having a HWI be 32 bits, because any time one of the operands
is larger than 32 bits, this patch takes control.  It also hits
whenever one of the operands really does not fit in an HWI.


This patch only works after that patch to change tree-vrp has been applied.

http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00877.html

kenny
Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc	(revision 205765)
+++ gcc/wide-int.cc	(working copy)
@@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   unsigned int j;
   unsigned int blocks_needed = BLOCKS_NEEDED (prec);
   unsigned int half_blocks_needed = blocks_needed * 2;
-  /* The sizes here are scaled to support a 2x largest mode by 2x
- largest mode yielding a 4x largest mode result.  This is what is
- needed by vpn.  */
 
+  /* The sizes here are scaled to support a 2*largest mode + a little
+ by 2*largest mode + a little yielding a 4*largest mode + a
+ little.  This is what is needed by vpn.  */
   unsigned HOST_HALF_WIDE_INT
-u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned int ulen;
   unsigned HOST_HALF_WIDE_INT
-v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
-  /* The '2' in 'R' is because we are internally doing a full
+v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned int vlen;
+  unsigned int uvlen;
+  unsigned int vallen;
+
+  /* The '4' in 'R' is because we are internally doing a wide
  multiply.  */
   unsigned HOST_HALF_WIDE_INT
-r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
-  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
+r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+
+  /* True if the result needs to be negated.  */
+  bool is_neg = false;
 
   /* If the top level routine did not really pass in an overflow, then
  just make sure that we never attempt to set it.  */
   bool needs_overflow = (overflow != 0);
+  const HOST_WIDE_INT zero = 0; 
   if (needs_overflow)
 *overflow = false;
 
@@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   return 1;
 }
 
-  /* We do unsigned mul and then correct it.  */
-  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
-	 half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
-	 half_blocks_needed, prec, SIGNED);
-
-  /* The 2 is for a full mult.  */
-  memset (r, 0, half_blocks_needed * 2
-	  * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
+  ulen = op1len * 2;
+  vlen = op2len * 2;
+
+  if (sgn == SIGNED)
+{
+  if (wi::neg_p (op1))
+	{
+	  HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
+	  
+	  int nop1len 
+	= wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0);
+	  is_neg = !is_neg;
+	  ulen = nop1len * 2;
+	  wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
+		 ulen, prec, SIGNED);
+	}
+  else
+	wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1val, o

Re: [wide-int] small cleanup in wide-int.*

2013-12-09 Thread Kenneth Zadeck

On 12/09/2013 10:12 AM, Richard Sandiford wrote:

Richard Biener  writes:

On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck  wrote:

On 12/08/2013 05:35 AM, Richard Biener wrote:

Richard Sandiford  wrote:

Kenneth Zadeck  writes:

 #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

  (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact

multiple

of HOST_BITS_PER_WIDE_INT.

we will do this later when some other issues that Eric B raised are

settled.

I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as
MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the
change above.  IMO it doesn't make sense to both round up the

division

and also add 1 to the result.  So I think we should make this change
regardless of whatever follows.

Looks good to me otherwise, thanks.

Richard


so this one works the way you want.  While it is true that the problems
are disjoint, the solution will likely involve changing the same lines
of source in two different ways.

Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we
are)
I think we should change it to something that makes sense.  All we want
is
1 extra bit.  We don't need to round up the size and then add a whole
HWI on top.  Doing that will potentially pessimise targets that don't
need anything beyond SImode.

I agree.

Richard.

Done, ok to commit?

+   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
+   can accommodate at least 1 more bit so that unsigned numbers of that
+   mode can be represented.  Note that it is still possible to create
+   fixed_wide_ints that have precisions greater than
+   MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
+   double-width multiplication result, for example.  */
  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT))
+

that doesn't reserve 1 more bit, it just rounds up.  For one more bit you
need to add 1, so

  #define WIDE_INT_MAX_ELTS \
+  (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT))

no?

Or just:

(MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

as above.
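
Worked through with HOST_BITS_PER_WIDE_INT == 64:

  MAX_BITSIZE_MODE_ANY_INT = 128 (exact multiple):
    (128 + 63) / 64       = 2    /* no room for the extra bit */
    (128 + 63) / 64 + 1   = 3
    128 / 64 + 1          = 3    /* same result, simpler */

  MAX_BITSIZE_MODE_ANY_INT = 127 (not a multiple):
    (127 + 63) / 64 + 1   = 3    /* wastes a HWI */
    127 / 64 + 1          = 2    /* the rounding slack already
                                    holds the extra bit */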

Richard


This time for sure.   Tested on x86-64

kenny
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 205726)
+++ gcc/tree-vrp.c	(working copy)
@@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_
 
   signop sign = TYPE_SIGN (expr_type);
   unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
 
   if (range_int_cst_p (&vr0)
 	  && range_int_cst_p (&vr1)
 	  && TYPE_OVERFLOW_WRAPS (expr_type))
 	{
-	  wide_int sizem1 = wi::mask (prec, false, prec2);
-	  wide_int size = sizem1 + 1;
+	  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int;
+	  typedef generic_wide_int
+	    <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;
+	  vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+	  vrp_int size = sizem1 + 1;
 
 	  /* Extend the values using the sign of the result to PREC2.
 	 From here on out, everthing is just signed math no matter
 	 what the input types were.  */
-	  wide_int min0 = wide_int::from (vr0.min, prec2, sign);
-	  wide_int max0 = wide_int::from (vr0.max, prec2, sign);
-	  wide_int min1 = wide_int::from (vr1.min, prec2, sign);
-	  wide_int max1 = wide_int::from (vr1.max, prec2, sign);
-
+  vrp_int min0 = vrp_int_cst (vr0.min);
+  vrp_int max0 = vrp_int_cst (vr0.max);
+  vrp_int min1 = vrp_int_cst (vr1.min);
+  vrp_int max1 = vrp_int_cst (vr1.max);
 	  /* Canonicalize the intervals.  */
 	  if (sign == UNSIGNED)
 	{
@@ -2653,17 +2654,17 @@ extract_range_from_binary_expr_1 (value_
 		}
 	}
 
-	  wide_int prod0 = min0 * min1;
-	  wide_int prod1 = min0 * max1;
-	  wide_int prod2 = max0 * min1;
-	  wide_int prod3 = max0 * max1;
+	  vrp_int prod0 = min0 * min1;
+	  vrp_int prod1 = min0 * max1;
+	  vrp_int prod2 = max0 * min1;
+	  vrp_int prod3 = max0 * max1;
 
 	  /* Sort the 4 products so that min is in prod0 and max is in
 	 prod3.  */
 	  /* min0min1 > max0max1 */
 	  if (wi::gts_p (prod0, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod0;
 	  prod0 = tmp;
 	}
@@ -2671,21 +2672,21 @@ extract_range_from_binary_expr_1 (value_
 	  /* min0max1 > max0min1 */
 	  if (wi::gts_p (prod1, prod2))
 	{
-	  wide_int tmp = prod2;
+	  vrp_int tmp = prod2;
 	  prod

Re: [wide-int] small cleanup in wide-int.*

2013-12-09 Thread Kenneth Zadeck

On 12/09/2013 10:01 AM, Richard Biener wrote:

On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck  wrote:

On 12/08/2013 05:35 AM, Richard Biener wrote:

Richard Sandiford  wrote:

Kenneth Zadeck  writes:

 #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

  (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact

multiple

of HOST_BITS_PER_WIDE_INT.

we will do this later when some other issues that Eric B raised are

settled.

I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as
MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the
change above.  IMO it doesn't make sense to both round up the

division

and also add 1 to the result.  So I think we should make this change
regardless of whatever follows.

Looks good to me otherwise, thanks.

Richard


so this one works the way you want.  While it is true that the problems
are disjoint, the solution will likely involve changing the same lines
of source in two different ways.

Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we
are)
I think we should change it to something that makes sense.  All we want
is
1 extra bit.  We don't need to round up the size and then add a whole
HWI on top.  Doing that will potentially pessimise targets that don't
need anything beyond SImode.

I agree.

Richard.

Done, ok to commit?

+   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
+   can accommodate at least 1 more bit so that unsigned numbers of that
+   mode can be represented.  Note that it is still possible to create
+   fixed_wide_ints that have precisions greater than
+   MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
+   double-width multiplication result, for example.  */
  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT))
+

that doesn't reserve 1 more bit, it just rounds up.  For one more bit you
need to add 1, so

  #define WIDE_INT_MAX_ELTS \
+  (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT))

no?


Oops.  I am testing it correctly this time; will report back in an hour.


kenny


Otherwise ok.

Thanks,
Richard.


kenny



It's not like I can approve the patch anyway though, so I'll leave it
there.

Thanks,
Richard






Re: [wide-int] small cleanup in wide-int.*

2013-12-09 Thread Kenneth Zadeck

On 12/08/2013 05:35 AM, Richard Biener wrote:

Richard Sandiford  wrote:

Kenneth Zadeck  writes:

#define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

 (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact

multiple

of HOST_BITS_PER_WIDE_INT.

we will do this later when some other issues that Eric B raised are

settled.

I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as
MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the
change above.  IMO it doesn't make sense to both round up the

division

and also add 1 to the result.  So I think we should make this change
regardless of whatever follows.

Looks good to me otherwise, thanks.

Richard


so this one works the way you want.  While it is true that the problems
are disjoint, the solution will likely involve changing the same lines
of source in two different ways.

Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we
are)
I think we should change it to something that makes sense.  All we want
is
1 extra bit.  We don't need to round up the size and then add a whole
HWI on top.  Doing that will potentially pessimise targets that don't
need anything beyond SImode.

I agree.

Richard.

Done, ok to commit?

kenny


It's not like I can approve the patch anyway though, so I'll leave it
there.

Thanks,
Richard




Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 205726)
+++ gcc/tree-vrp.c	(working copy)
@@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_
 
   signop sign = TYPE_SIGN (expr_type);
   unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
 
   if (range_int_cst_p (&vr0)
 	  && range_int_cst_p (&vr1)
 	  && TYPE_OVERFLOW_WRAPS (expr_type))
 	{
-	  wide_int sizem1 = wi::mask (prec, false, prec2);
-	  wide_int size = sizem1 + 1;
+	  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int;
+	  typedef generic_wide_int
+	    <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;
+	  vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+	  vrp_int size = sizem1 + 1;
 
 	  /* Extend the values using the sign of the result to PREC2.
 	 From here on out, everthing is just signed math no matter
 	 what the input types were.  */
-	  wide_int min0 = wide_int::from (vr0.min, prec2, sign);
-	  wide_int max0 = wide_int::from (vr0.max, prec2, sign);
-	  wide_int min1 = wide_int::from (vr1.min, prec2, sign);
-	  wide_int max1 = wide_int::from (vr1.max, prec2, sign);
-
+  vrp_int min0 = vrp_int_cst (vr0.min);
+  vrp_int max0 = vrp_int_cst (vr0.max);
+  vrp_int min1 = vrp_int_cst (vr1.min);
+  vrp_int max1 = vrp_int_cst (vr1.max);
 	  /* Canonicalize the intervals.  */
 	  if (sign == UNSIGNED)
 	{
@@ -2653,17 +2654,17 @@ extract_range_from_binary_expr_1 (value_
 		}
 	}
 
-	  wide_int prod0 = min0 * min1;
-	  wide_int prod1 = min0 * max1;
-	  wide_int prod2 = max0 * min1;
-	  wide_int prod3 = max0 * max1;
+	  vrp_int prod0 = min0 * min1;
+	  vrp_int prod1 = min0 * max1;
+	  vrp_int prod2 = max0 * min1;
+	  vrp_int prod3 = max0 * max1;
 
 	  /* Sort the 4 products so that min is in prod0 and max is in
 	 prod3.  */
 	  /* min0min1 > max0max1 */
 	  if (wi::gts_p (prod0, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod0;
 	  prod0 = tmp;
 	}
@@ -2671,21 +2672,21 @@ extract_range_from_binary_expr_1 (value_
 	  /* min0max1 > max0min1 */
 	  if (wi::gts_p (prod1, prod2))
 	{
-	  wide_int tmp = prod2;
+	  vrp_int tmp = prod2;
 	  prod2 = prod1;
 	  prod1 = tmp;
 	}
 
 	  if (wi::gts_p (prod0, prod1))
 	{
-	  wide_int tmp = prod1;
+	  vrp_int tmp = prod1;
 	  prod1 = prod0;
 	  prod0 = tmp;
 	}
 
 	  if (wi::gts_p (prod2, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod2;
 	  prod2 = tmp;
 	}
Index: gcc/tree.h
===
--- gcc/tree.h	(revision 205726)
+++ gcc/tree.h	(working copy)
@@ -4549,7 +4549,7 @@ namespace wi
 static const unsigned int precision = N;
   };
 
-  generic_wide_int <wi::extended_tree <MAX_BITSIZE_MODE_ANY_INT> >
+  generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION> >
   to_widest (const_tree);
 
  generic_wide_int <wi::extended_tree <ADDR_MAX_PRECISION> > to_offset (const_tree);
@@ -4570,7 +4570,7 @@ wi::int_traits <const_tree>::decompose (
 			  precision);
 }
 
-inline generic_wide_int <wi::extended_tree <MAX_BITSIZE_MODE_ANY_INT> >
+inline generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION> >
 wi::to_widest (const_tree t)
 {
   return t;
@@ -4609,7 +4609,7 @@ wi::extended_tree <N>::get_len () const
 {
   if (N == ADDR_MAX_PRECISION)
 return TR

Re: [wide-int] small cleanup in wide-int.*

2013-12-06 Thread Kenneth Zadeck

On 12/06/2013 01:32 PM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

On 12/03/2013 11:52 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

Index: tree-vrp.c
===
--- tree-vrp.c  (revision 205597)
+++ tree-vrp.c  (working copy)
@@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_
   
 signop sign = TYPE_SIGN (expr_type);

 unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
   
 if (range_int_cst_p (&vr0)

  && range_int_cst_p (&vr1)
  && TYPE_OVERFLOW_WRAPS (expr_type))
{
- wide_int sizem1 = wi::mask (prec, false, prec2);
- wide_int size = sizem1 + 1;
+ /* vrp_int is twice as wide as anything that the target
+supports so it can support a full width multiply.  No
+need to add any more padding for an extra sign bit
+because that comes with the way that WIDE_INT_MAX_ELTS is
+defined.  */
+ typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2)
+   vrp_int;
+ vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+ vrp_int size = sizem1 + 1;
   
   	  /* Extend the values using the sign of the result to PREC2.

 From here on out, everthing is just signed math no matter
 what the input types were.  */
- wide_int min0 = wide_int::from (vr0.min, prec2, sign);
- wide_int max0 = wide_int::from (vr0.max, prec2, sign);
- wide_int min1 = wide_int::from (vr1.min, prec2, sign);
- wide_int max1 = wide_int::from (vr1.max, prec2, sign);
+ vrp_int min0 = wi::to_vrp (vr0.min);
+ vrp_int max0 = wi::to_vrp (vr0.max);
+ vrp_int min1 = wi::to_vrp (vr1.min);
+ vrp_int max1 = wi::to_vrp (vr1.max);

I think we should avoid putting to_vrp in tree.h if vrp_int is only
local to this block.  Instead you could have:

  typedef generic_wide_int
    <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;
...
vrp_int_cst min0 = vr0.min;
vrp_int_cst max0 = vr0.max;
vrp_int_cst min1 = vr1.min;
vrp_int_cst max1 = vr1.max;


I did this in a different way because I had trouble doing it as you
suggested.  The short answer is that all of the vrp_int code is now
local to tree-vrp.c, which I think was your primary goal.

Ah, so we later assign to these variables:

  /* Canonicalize the intervals.  */
  if (sign == UNSIGNED)
{
  if (wi::ltu_p (size, min0 + max0))
{
  min0 -= size;
  max0 -= size;
}

  if (wi::ltu_p (size, min1 + max1))
{
  min1 -= size;
  max1 -= size;
}
}

OK, in that case I suppose a temporary is needed.  But I'd prefer
not to put local stuff in the wi:: namespace.  You could just have:

  typedef generic_wide_int
    <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;

   vrp_int min0 = vrp_int_cst (vr0.min);
   vrp_int max0 = vrp_int_cst (vr0.max);
   vrp_int min1 = vrp_int_cst (vr1.min);
   vrp_int max1 = vrp_int_cst (vr1.max);

which removes the need for:

+/* vrp_int is twice as wide as anything that the target supports so it
+   can support a full width multiply.  No need to add any more padding
+   for an extra sign bit because that comes with the way that
+   WIDE_INT_MAX_ELTS is defined.  */
+typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int;
+namespace wi
+{
+  generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > to_vrp (const_tree);
+}
+
+inline generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> >
+wi::to_vrp (const_tree t)
+{
+  return t;
+}
+


   #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

(MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple
of HOST_BITS_PER_WIDE_INT.

we will do this later when some other issues that Eric B raised are settled.

I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as
MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the
change above.  IMO it doesn't make sense to both round up the division
and also add 1 to the result.  So I think we should make this change
regardless of whatever follows.

Looks good to me otherwise, thanks.

Richard

so this one works the way you want.  While it is true that the problems
are disjoint, the solution will likely involve changing the same lines
of source in two different ways.


OK to commit?

kenny
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 205726

Re: wide-int, rtl

2013-12-06 Thread Kenneth Zadeck

On 11/27/2013 11:24 AM, Eric Botcazou wrote:

Richi has asked that we break up the wide-int patch so that the
individual port and front end maintainers can review their parts
without having to go through the entire patch.  This patch covers the
first half of the rtl code.

--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -2336,15 +2336,23 @@ hash_rtx_cb (const_rtx x, enum machine_mode mode,
 + (unsigned int) INTVAL (x));
return hash;
  
+case CONST_WIDE_INT:

+  {
+   int i;
+   for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++)
+ hash += CONST_WIDE_INT_ELT (x, i);
+  }
+  return hash;

You can write "for (int i = 0; ..." now and remove the braces.


done

--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -1121,15 +1120,23 @@ cselib_hash_rtx (rtx x, int create, enum machine_mode
memmode)
hash += ((unsigned) CONST_INT << 7) + INTVAL (x);
return hash ? hash : (unsigned int) CONST_INT;
  
+case CONST_WIDE_INT:

+  {
+   int i;
+   for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++)
+ hash += CONST_WIDE_INT_ELT (x, i);
+  }
+  return hash;
+

Likewise.

done

--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3298,23 +3268,13 @@ choose_multiplier (unsigned HOST_WIDE_INT d, int n,
int precision,
pow = n + lgup;
pow2 = n + lgup - precision;
  
-  /* We could handle this with some effort, but this case is much

- better handled directly with a scc insn, so rely on caller using
- that.  */
-  gcc_assert (pow != HOST_BITS_PER_DOUBLE_INT);

Why removing it?

the code no longer has restrictions on the size/mode of the variables.

--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -722,64 +722,33 @@ convert_modes (enum machine_mode mode, enum machine_mode
oldmode, rtx x, int uns
if (mode == oldmode)
  return x;
  
-  /* There is one case that we must handle specially: If we are converting

- a CONST_INT into a mode whose size is twice HOST_BITS_PER_WIDE_INT and
- we are to interpret the constant as unsigned, gen_lowpart will do
- the wrong if the constant appears negative.  What we want to do is
- make the high-order word of the constant zero, not all ones.  */
-
-  if (unsignedp && GET_MODE_CLASS (mode) == MODE_INT
-  && GET_MODE_BITSIZE (mode) == HOST_BITS_PER_DOUBLE_INT
-  && CONST_INT_P (x) && INTVAL (x) < 0)
+  if (CONST_SCALAR_INT_P (x)
+  && GET_MODE_CLASS (mode) == MODE_INT)

On a single line.

done


  {
-  double_int val = double_int::from_uhwi (INTVAL (x));
-
-  /* We need to zero extend VAL.  */
-  if (oldmode != VOIDmode)
-   val = val.zext (GET_MODE_BITSIZE (oldmode));
-
-  return immed_double_int_const (val, mode);
+  /* If the caller did not tell us the old mode, then there is
+not much to do with respect to canonization.  We have to assume
+that all the bits are significant.  */
+  if (GET_MODE_CLASS (oldmode) != MODE_INT)
+   oldmode = MAX_MODE_INT;
+  wide_int w = wide_int::from (std::make_pair (x, oldmode),
+  GET_MODE_PRECISION (mode),
+  unsignedp ? UNSIGNED : SIGNED);
+  return immed_wide_int_const (w, mode);

canonicalization?

done


@@ -5301,10 +5271,10 @@ store_expr (tree exp, rtx target, int call_param_p,
bool nontemporal)
   &alt_rtl);
  }
  
-  /* If TEMP is a VOIDmode constant and the mode of the type of EXP is not

- the same as that of TARGET, adjust the constant.  This is needed, for
- example, in case it is a CONST_DOUBLE and we want only a word-sized
- value.  */
+  /* If TEMP is a VOIDmode constant and the mode of the type of EXP is
+ not the same as that of TARGET, adjust the constant.  This is
+ needed, for example, in case it is a CONST_DOUBLE or
+ CONST_WIDE_INT and we want only a word-sized value.  */
if (CONSTANT_P (temp) && GET_MODE (temp) == VOIDmode
&& TREE_CODE (exp) != ERROR_MARK
&& GET_MODE (target) != TYPE_MODE (TREE_TYPE (exp)))

Why reformatting the whole comment?

it is what emacs did.  I have formatted it back.

@@ -9481,11 +9459,19 @@ expand_expr_real_1 (tree exp, rtx target, enum
machine_mode tmode,
return decl_rtl;
  
  case INTEGER_CST:

-  temp = immed_double_const (TREE_INT_CST_LOW (exp),
-TREE_INT_CST_HIGH (exp), mode);
-
-  return temp;
-
+  {
+   tree type = TREE_TYPE (exp);

Redundant, see the beginning of the function.

fixed

+   /* One could argue that GET_MODE_PRECISION (TYPE_MODE (type))
+  should always be the same as TYPE_PRECISION (type).
+  However, it is not.  Since we are converting from tree to
+  rtl, we have to expose this ugly truth here.  */
+   temp = immed_wide_int_const (wide_int::from
+  (exp,
+   GET_MODE_PRECISION (TYPE_MODE (type)),
+   

Re: [wide-int] small cleanup in wide-int.*

2013-12-06 Thread Kenneth Zadeck
Richard asked to see the patch where i did the changes the way that he 
asked in his email. Doing it the way he wants potentially has advantages 
over the way that i did it, but his technique fails for non-obvious 
reasons.  The failure in building is:


g++ -c -g -DIN_GCC -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
-Wno-format -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H 
-I. -I. -I../../gccBadMulVrp/gcc -I../../gccBadMulVrp/gcc/. 
-I../../gccBadMulVrp/gcc/../include 
-I../../gccBadMulVrp/gcc/../libcpp/include 
-I../../gccBadMulVrp/gcc/../libdecnumber 
-I../../gccBadMulVrp/gcc/../libdecnumber/bid -I../libdecnumber 
-I../../gccBadMulVrp/gcc/../libbacktrace -o tree-vrp.o -MT tree-vrp.o 
-MMD -MP -MF ./.deps/tree-vrp.TPo ../../gccBadMulVrp/gcc/tree-vrp.c

In file included from ../../gccBadMulVrp/gcc/double-int.h:23:0,
from ../../gccBadMulVrp/gcc/tree-core.h:28,
from ../../gccBadMulVrp/gcc/tree.h:23,
from ../../gccBadMulVrp/gcc/tree-vrp.c:26:
../../gccBadMulVrp/gcc/wide-int.h: In member function 
‘generic_wide_int<storage>& generic_wide_int<storage>::operator=(const T&) 
[with T = generic_wide_int<wi::extended_tree<1152> >, storage = 
wi::extended_tree<1152>]’:
../../gccBadMulVrp/gcc/wide-int.h:701:3: instantiated from 
‘generic_wide_int<storage>& generic_wide_int<storage>::operator-=(const T&) [with T 
= generic_wide_int<wi::extended_tree<1152> >, storage = 
wi::extended_tree<1152>, generic_wide_int<storage> = 
generic_wide_int<wi::extended_tree<1152> >]’

../../gccBadMulVrp/gcc/tree-vrp.c:2646:13: instantiated from here
../../gccBadMulVrp/gcc/wide-int.h:860:3: error: no matching function for 
call to ‘generic_wide_int<wi::extended_tree<1152> >::operator=(const 
generic_wide_int<wi::extended_tree<1152> >&)’

../../gccBadMulVrp/gcc/wide-int.h:860:3: note: candidate is:
../../gccBadMulVrp/gcc/tree.h:4529:9: note: wi::extended_tree<1152>& 
wi::extended_tree<1152>::operator=(const wi::extended_tree<1152>&)
../../gccBadMulVrp/gcc/tree.h:4529:9: note: no known conversion for 
argument 1 from ‘const generic_wide_int<wi::extended_tree<1152> >’ 
to ‘const wi::extended_tree<1152>&’

make[3]: *** [tree-vrp.o] Error 1
make[3]: Leaving directory `/home/zadeck/gcc/gbbBadMulVrp/gcc'
make[2]: *** [all-stage1-gcc] Error 2
make[2]: Leaving directory `/home/zadeck/gcc/gbbBadMulVrp'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/home/zadeck/gcc/gbbBadMulVrp'
make: *** [all] Error 2
heracles:~/gcc/gbbBadMulVrp(9) cd ../gccBadMulVrp/



On 12/06/2013 11:45 AM, Kenneth Zadeck wrote:

On 12/03/2013 11:52 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

Index: tree-vrp.c
===
--- tree-vrp.c (revision 205597)
+++ tree-vrp.c (working copy)
@@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_
signop sign = TYPE_SIGN (expr_type);
unsigned int prec = TYPE_PRECISION (expr_type);
- unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
if (range_int_cst_p (&vr0)
&& range_int_cst_p (&vr1)
&& TYPE_OVERFLOW_WRAPS (expr_type))
{
- wide_int sizem1 = wi::mask (prec, false, prec2);
- wide_int size = sizem1 + 1;
+ /* vrp_int is twice as wide as anything that the target
+ supports so it can support a full width multiply. No
+ need to add any more padding for an extra sign bit
+ because that comes with the way that WIDE_INT_MAX_ELTS is
+ defined. */
+ typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2)
+ vrp_int;
+ vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+ vrp_int size = sizem1 + 1;
/* Extend the values using the sign of the result to PREC2.
From here on out, everthing is just signed math no matter
what the input types were. */
- wide_int min0 = wide_int::from (vr0.min, prec2, sign);
- wide_int max0 = wide_int::from (vr0.max, prec2, sign);
- wide_int min1 = wide_int::from (vr1.min, prec2, sign);
- wide_int max1 = wide_int::from (vr1.max, prec2, sign);
+ vrp_int min0 = wi::to_vrp (vr0.min);
+ vrp_int max0 = wi::to_vrp (vr0.max);
+ vrp_int min1 = wi::to_vrp (vr1.min);
+ vrp_int max1 = wi::to_vrp (vr1.max);

I think we should avoid putting to_vrp in tree.h if vrp_int is only
local to this block. Instead you could have:

typedef generic_wide_int
 <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;
...
vrp_int_cst min0 = vr0.min;
vrp_int_cst max0 = vr0.max;
vrp_int_cst min1 = vr1.min;
vrp_int_cst max1 = vr1.max;

i did this in a different way because i had trouble doing it as you 
suggested.  The short answer is that all of the vrp_int code is now 
local to tree-vrp.c, which i think was your primary goal.

@@ -228,15 +228,16 @@ along with GCC; see the file COPYING3.
#endif
/* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
- early examination of the target's mode file. Thus it is safe that
- some small multiple of this number is easily larger than any number
- that that target could compute. The place in the compiler that
- currently needs the widest ints is the code that determines th

Re: [wide-int] small cleanup in wide-int.*

2013-12-06 Thread Kenneth Zadeck

On 12/03/2013 11:52 AM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

Index: tree-vrp.c
===
--- tree-vrp.c  (revision 205597)
+++ tree-vrp.c  (working copy)
@@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_
  
signop sign = TYPE_SIGN (expr_type);

unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
  
if (range_int_cst_p (&vr0)

  && range_int_cst_p (&vr1)
  && TYPE_OVERFLOW_WRAPS (expr_type))
{
- wide_int sizem1 = wi::mask (prec, false, prec2);
- wide_int size = sizem1 + 1;
+ /* vrp_int is twice as wide as anything that the target
+supports so it can support a full width multiply.  No
+need to add any more padding for an extra sign bit
+because that comes with the way that WIDE_INT_MAX_ELTS is
+defined.  */
+ typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2)
+   vrp_int;
+ vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+ vrp_int size = sizem1 + 1;
  
  	  /* Extend the values using the sign of the result to PREC2.

 From here on out, everthing is just signed math no matter
 what the input types were.  */
- wide_int min0 = wide_int::from (vr0.min, prec2, sign);
- wide_int max0 = wide_int::from (vr0.max, prec2, sign);
- wide_int min1 = wide_int::from (vr1.min, prec2, sign);
- wide_int max1 = wide_int::from (vr1.max, prec2, sign);
+ vrp_int min0 = wi::to_vrp (vr0.min);
+ vrp_int max0 = wi::to_vrp (vr0.max);
+ vrp_int min1 = wi::to_vrp (vr1.min);
+ vrp_int max1 = wi::to_vrp (vr1.max);

I think we should avoid putting to_vrp in tree.h if vrp_int is only
local to this block.  Instead you could have:

  typedef generic_wide_int
   <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > vrp_int_cst;
   ...
   vrp_int_cst min0 = vr0.min;
   vrp_int_cst max0 = vr0.max;
   vrp_int_cst min1 = vr1.min;
   vrp_int_cst max1 = vr1.max;

i did this in a different way because i had trouble doing it as you 
suggested.  The short answer is that all of the vrp_int code is now 
local to tree-vrp.c, which i think was your primary goal.

@@ -228,15 +228,16 @@ along with GCC; see the file COPYING3.
  #endif
  
  /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very

-   early examination of the target's mode file.  Thus it is safe that
-   some small multiple of this number is easily larger than any number
-   that that target could compute.  The place in the compiler that
-   currently needs the widest ints is the code that determines the
-   range of a multiply.  This code needs 2n + 2 bits.  */
-
+   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
+   can accomodate at least 1 more bit so that unsigned numbers of that
+   mode can be represented.  This will accomodate every place in the
+   compiler except for a multiply routine in tree-vrp.  That function
+   makes its own arrangements for larger wide-ints.  */

I think we should drop the "This will accomodate..." bit, since it'll soon
get out of date.  Maybe something like:

 Note that it is still possible to create fixed_wide_ints that have
 precisions greater than MAX_BITSIZE_MODE_ANY_INT.  This can be useful
 when representing a double-width multiplication result, for example.  */

done

  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\
+/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

   (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple
of HOST_BITS_PER_WIDE_INT.

we will do this later when some other issues that Eric B raised are settled.

ok to commit to the branch?

kenny

Looks good to me otherwise FWIW.

You probably already realise this, but for avoidance of doubt, Richard
was also asking that we reduce MAX_BITSIZE_MODE_ANY_INT to 128 on x86_64,
since that's the largest scalar_mode_supported_p mode.

Thanks,
Richard



Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 205726)
+++ gcc/tree-vrp.c	(working copy)
@@ -2213,6 +2213,22 @@ extract_range_from_multiplicative_op_1 (
 set_value_range (vr, type, min, max, NULL);
 }
 
+/* vrp_int is twice as wide as anything that the target supports so it
+   can support a full width multiply.  No need to add any more padding
+   for an extra sign bit because that comes with the way that
+   WIDE_INT_MAX_ELTS is defined.  */ 
+typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int;
+namespace wi
+{
+  generic_wide_int  > to_vr

Re: [patch] combine ICE fix

2013-12-04 Thread Kenneth Zadeck

On 12/03/2013 02:38 PM, Jeff Law wrote:

On 12/03/13 12:25, Kenneth Zadeck wrote:

On 12/03/2013 01:52 PM, Mike Stump wrote:

On Dec 2, 2013, at 10:26 PM, Jeff Law  wrote:

On 11/27/13 17:13, Cesar Philippidis wrote:
I looked into adding support for incremental DF scanning from 
df*.[ch]

in combine but there are a couple of problems. First of all, combine
does its own DF analysis. It does so because its usage falls under 
this

category (df-core.c):

c) If the pass modifies insns several times, this incremental
   updating may be expensive.

Furthermore, combine's DF relies on the DF scanning to be 
deferred, so
the DF_REF_DEF_COUNT values would be off. Eg, calls to 
SET_INSN_DELETED

take place before it updates the notes for those insns. Also, combine
has a tendency to undo its changes occasionally.

I think at this stage of the release cycle, converting combine to
incremental DF is probably a no-go.  However, we should keep it in
mind for the future -- while hairy I'd really like to see that happen
in the long term.

I think Kenny has some thoughts in this area.  I'll cc him to ensure
he sees it.

it is the tendency to undo things (i would use the word "frequently"
rather than "occasionally") that kept me from doing this years ago.
Shove a bunch of things together, simplify, then try to recognize the 
result.  If that fails, undo everything.


In theory, this could be replaced by making a copy of the original, 
doing the combination/simplification, then recognition.  If successful, 
then update DF and remove the original I3; if not successful, drop the 
copy.  That avoids the undo nonsense.


jeff

that could certainly work.


Re: [wide-int] Add fast path for hosts with HWI widening multiplication

2013-12-04 Thread Kenneth Zadeck

On 12/04/2013 07:56 AM, Richard Sandiford wrote:

Richard Sandiford  writes:

This patch handles multiplications using a single HWIxHWI->2HWI multiplication
on hosts that have one.  This removes all uses of the slow (half-HWI) path
for insn-recog.ii.  The slow path is still used 58 times for cp/parser.ii
and 168 times for fold-const.ii, but at that kind of level it shouldn't
matter much.
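
The shape of that fast path, as a stand-alone sketch that uses GCC's
__int128 in place of longlong.h's umul_ppmm (illustrative only; the
real code uses umul_ppmm precisely so it does not depend on a doubled
host type):

#include <stdint.h>
#include <stdio.h>

/* Multiply two 64-bit values into low/high halves; overflow of the
   64-bit result shows up as a nonzero upper half, which is how the
   umul_ppmm path checks it.  */
static uint64_t
mul_with_high (uint64_t x, uint64_t y, uint64_t *high)
{
  unsigned __int128 r = (unsigned __int128) x * y;
  *high = (uint64_t) (r >> 64);
  return (uint64_t) r;
}

int
main (void)
{
  uint64_t high;
  uint64_t low = mul_with_high ((uint64_t) 1 << 32, (uint64_t) 1 << 32,
				&high);
  printf ("low=%llu high=%llu overflow=%d\n",
	  (unsigned long long) low, (unsigned long long) high, high != 0);
  return 0;
}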

I followed Joseph's suggestion and reused longlong.h.  I copied it from
libgcc rather than glibc since it seemed better for GCC to have a single
version across both gcc/ and libgcc/.  I can put it in include/ if that
seems better.

I've committed the patch to move longlong.h to trunk and merged back
to the branch, so all that's left is the wide-int.cc patch.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2013-12-03 23:59:08.133658567 +
+++ gcc/wide-int.cc 2013-12-04 12:55:28.466895358 +
@@ -27,6 +27,16 @@ along with GCC; see the file COPYING3.
  #include "tree.h"
  #include "dumpfile.h"
  
+#if GCC_VERSION >= 3000

+#define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT
+typedef unsigned HOST_HALF_WIDE_INT UHWtype;
+typedef unsigned HOST_WIDE_INT UWtype;
+typedef unsigned int UQItype __attribute__ ((mode (QI)));
+typedef unsigned int USItype __attribute__ ((mode (SI)));
+typedef unsigned int UDItype __attribute__ ((mode (DI)));
+#include "longlong.h"
+#endif
+
  /* This is the maximal size of the buffer needed for dump.  */
  const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4
+ (MAX_BITSIZE_MODE_ANY_INT
@@ -1255,8 +1265,8 @@ wi_pack (unsigned HOST_WIDE_INT *result,
 record in *OVERFLOW whether the result overflowed.  SGN controls
 the signedness and is used to check overflow or if HIGH is set.  */
  unsigned int
-wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1,
- unsigned int op1len, const HOST_WIDE_INT *op2,
+wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val,
+ unsigned int op1len, const HOST_WIDE_INT *op2val,
  unsigned int op2len, unsigned int prec, signop sgn,
  bool *overflow, bool high)
  {
@@ -1285,24 +1295,53 @@ wi::mul_internal (HOST_WIDE_INT *val, co
if (needs_overflow)
  *overflow = false;
  
+  wide_int_ref op1 = wi::storage_ref (op1val, op1len, prec);

+  wide_int_ref op2 = wi::storage_ref (op2val, op2len, prec);
+
/* This is a surprisingly common case, so do it first.  */
-  if ((op1len == 1 && op1[0] == 0) || (op2len == 1 && op2[0] == 0))
+  if (op1 == 0 || op2 == 0)
  {
val[0] = 0;
return 1;
  }
  
+#ifdef umul_ppmm

+  if (sgn == UNSIGNED)
+{
+  /* If the inputs are single HWIs and the output has room for at
+least two HWIs, we can use umul_ppmm directly.  */
+  if (prec >= HOST_BITS_PER_WIDE_INT * 2
+ && wi::fits_uhwi_p (op1)
+ && wi::fits_uhwi_p (op2))
+   {
+ umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
+ return 1 + (val[1] != 0 || val[0] < 0);
+   }
+  /* Likewise if the output is a full single HWI, except that the
+upper HWI of the result is only used for determining overflow.
+(We handle this case inline when overflow isn't needed.)  */
+  else if (prec == HOST_BITS_PER_WIDE_INT)
+   {
+ unsigned HOST_WIDE_INT upper;
+ umul_ppmm (upper, val[0], op1.ulow (), op2.ulow ());
+ if (needs_overflow)
+   *overflow = (upper != 0);
+ return 1;
+   }
+}
+#endif
+
/* Handle multiplications by 1.  */
-  if (op1len == 1 && op1[0] == 1)
+  if (op1 == 1)
  {
for (i = 0; i < op2len; i++)
-   val[i] = op2[i];
+   val[i] = op2val[i];
return op2len;
  }
-  if (op2len == 1 && op2[0] == 1)
+  if (op2 == 1)
  {
for (i = 0; i < op1len; i++)
-   val[i] = op1[i];
+   val[i] = op1val[i];
return op1len;
  }
  
@@ -1316,13 +1355,13 @@ wi::mul_internal (HOST_WIDE_INT *val, co
  
if (sgn == SIGNED)

{
- o0 = sext_hwi (op1[0], prec);
- o1 = sext_hwi (op2[0], prec);
+ o0 = op1.to_shwi ();
+ o1 = op2.to_shwi ();
}
else
{
- o0 = zext_hwi (op1[0], prec);
- o1 = zext_hwi (op2[0], prec);
+ o0 = op1.to_uhwi ();
+ o1 = op2.to_uhwi ();
}
  
r = o0 * o1;

@@ -1344,9 +1383,9 @@ wi::mul_internal (HOST_WIDE_INT *val, co
  }
  
/* We do unsigned mul and then correct it.  */

-  wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1, op1len,
+  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
 half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2, op2len,
+  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
 half_blocks_needed, prec, SIGNED);
  
 

Re: [patch] combine ICE fix

2013-12-03 Thread Kenneth Zadeck

On 12/03/2013 01:52 PM, Mike Stump wrote:

On Dec 2, 2013, at 10:26 PM, Jeff Law  wrote:

On 11/27/13 17:13, Cesar Philippidis wrote:

I looked into adding support for incremental DF scanning from df*.[ch]
in combine but there are a couple of problems. First of all, combine
does its own DF analysis. It does so because its usage falls under this
category (df-core.c):

c) If the pass modifies insns several times, this incremental
   updating may be expensive.

Furthermore, combine's DF relies on the DF scanning to be deferred, so
the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED
take place before it updates the notes for those insns. Also, combine
has a tendency to undo its changes occasionally.

I think at this stage of the release cycle, converting combine to incremental 
DF is probably a no-go.  However, we should keep it in mind for the future -- 
while hairy I'd really like to see that happen in the long term.

I think Kenny has some thoughts in this area.  I'll cc him to ensure he sees it.
it is the tendency to undo things (i would use the word "frequently" 
rather than "occasionally") that kept me from doing this years ago.


kenny


Re: [wide-int] small cleanup in wide-int.*

2013-12-03 Thread Kenneth Zadeck

On 11/29/2013 05:24 AM, Richard Biener wrote:

On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck
 wrote:

This patch does three things in wide-int:

1) it cleans up some comments.
2) removes a small amount of trash.
3) it changes the max size of the wide int from being 4x of
MAX_BITSIZE_MODE_ANY_INT to 2x +1.   This should improve large muls and divs
as well as perhaps help with some cache behavior.

@@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
 range of a multiply.  This code needs 2n + 2 bits.  */

  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
+/ HOST_BITS_PER_WIDE_INT) + 1)

I always wondered why VRP (if that is the only reason we do 2*n+1)
cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)?
Other widest_int users should not suffer IMHO.  widest_int should
strictly cover all modes that the target can do any arithmetic on
(thus not XImode or OImode on x86_64).

Richard.


ok to commit
enclosed is the code to change the large multiplies in tree-vrp as well 
as sets the size of the WIDE_INT_MAX_ELTS as we have discussed.


Bootstrapped and tested on x86-64.

ok to commit?

kenny
Index: tree-vrp.c
===
--- tree-vrp.c	(revision 205597)
+++ tree-vrp.c	(working copy)
@@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_
 
   signop sign = TYPE_SIGN (expr_type);
   unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
 
   if (range_int_cst_p (&vr0)
 	  && range_int_cst_p (&vr1)
 	  && TYPE_OVERFLOW_WRAPS (expr_type))
 	{
-	  wide_int sizem1 = wi::mask (prec, false, prec2);
-	  wide_int size = sizem1 + 1;
+	  /* vrp_int is twice as wide as anything that the target
+	 supports so it can support a full width multiply.  No
+	 need to add any more padding for an extra sign bit
+	 because that comes with the way that WIDE_INT_MAX_ELTS is
+	 defined.  */ 
+	  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) 
+	vrp_int;
+	  vrp_int sizem1 = wi::mask <vrp_int> (prec, false);
+	  vrp_int size = sizem1 + 1;
 
 	  /* Extend the values using the sign of the result to PREC2.
 	 From here on out, everthing is just signed math no matter
 	 what the input types were.  */
-	  wide_int min0 = wide_int::from (vr0.min, prec2, sign);
-	  wide_int max0 = wide_int::from (vr0.max, prec2, sign);
-	  wide_int min1 = wide_int::from (vr1.min, prec2, sign);
-	  wide_int max1 = wide_int::from (vr1.max, prec2, sign);
+	  vrp_int min0 = wi::to_vrp (vr0.min);
+	  vrp_int max0 = wi::to_vrp (vr0.max);
+	  vrp_int min1 = wi::to_vrp (vr1.min);
+	  vrp_int max1 = wi::to_vrp (vr1.max);
 
 	  /* Canonicalize the intervals.  */
 	  if (sign == UNSIGNED)
@@ -2644,17 +2650,17 @@ extract_range_from_binary_expr_1 (value_
 		}
 	}
 
-	  wide_int prod0 = min0 * min1;
-	  wide_int prod1 = min0 * max1;
-	  wide_int prod2 = max0 * min1;
-	  wide_int prod3 = max0 * max1;
+	  vrp_int prod0 = min0 * min1;
+	  vrp_int prod1 = min0 * max1;
+	  vrp_int prod2 = max0 * min1;
+	  vrp_int prod3 = max0 * max1;
 
 	  /* Sort the 4 products so that min is in prod0 and max is in
 	 prod3.  */
 	  /* min0min1 > max0max1 */
 	  if (wi::gts_p (prod0, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod0;
 	  prod0 = tmp;
 	}
@@ -2662,21 +2668,21 @@ extract_range_from_binary_expr_1 (value_
 	  /* min0max1 > max0min1 */
 	  if (wi::gts_p (prod1, prod2))
 	{
-	  wide_int tmp = prod2;
+	  vrp_int tmp = prod2;
 	  prod2 = prod1;
 	  prod1 = tmp;
 	}
 
 	  if (wi::gts_p (prod0, prod1))
 	{
-	  wide_int tmp = prod1;
+	  vrp_int tmp = prod1;
 	  prod1 = prod0;
 	  prod0 = tmp;
 	}
 
 	  if (wi::gts_p (prod2, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod2;
 	  prod2 = tmp;
 	}
Index: tree.h
===
--- tree.h	(revision 205597)
+++ tree.h	(working copy)
@@ -4565,10 +4565,12 @@ namespace wi
 static const unsigned int precision = N;
   };
 
-  generic_wide_int  >
+  generic_wide_int  >
   to_widest (const_tree);
 
   generic_wide_int <wi::extended_tree <ADDR_MAX_PRECISION> > to_offset (const_tree);
+
+  generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> > to_vrp (const_tree);
 }
 
 inline unsigned int
@@ -4586,7 +4588,7 @@ wi::int_traits <const_tree>::decompose (
 			  precision);
 }
 
-inline generic_wide_int  >
+inline generic_wide_int  >
 wi::to_widest (const_tree t)
 {
   return t;
@@ -4597,6 +4599,12 @@ wi::to_offset (const_tree t)
 {
   return t;
 }
+
+inline generic_wide_int <wi::extended_tree <WIDE_INT_MAX_PRECISION * 2> >
+wi::to_vrp (const_tree t)
+{
+  return t;
+}
 
template <int N>
inline wi::extended_tree <N>::extended_tree (const_tree t)
Index: wide-int.h
===

Re: [wide-int] Drop some lingering uses of precision 0

2013-12-03 Thread Kenneth Zadeck

if i did not already say so, this is fine.

kenny

On 12/02/2013 03:20 PM, Richard Sandiford wrote:

I noticed that there were still a couple of tests for zero precision.
This patch replaces them with asserts when handling separately-supplied
precisions and simply drops them when handling existing wide_ints.
(The idea is that most code would break for zero precision wide_ints
and only asserting in some use sites would be inconsistent.)

Also, to_shwi is called either with a nonzero precision argument or
with no argument.  I think it'd be clearer to split the two cases into
separate (overloaded) functions.  It's also more efficient, since the
compiler doesn't know that a variable-precision argument must be nonzero.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2013-12-02 20:03:50.112581766 +
+++ gcc/wide-int.cc 2013-12-02 20:12:22.178998274 +
@@ -275,9 +275,8 @@ wi::from_mpz (const_tree type, mpz_t x,
  wide_int
  wi::max_value (unsigned int precision, signop sgn)
  {
-  if (precision == 0)
-return shwi (0, precision);
-  else if (sgn == UNSIGNED)
+  gcc_checking_assert (precision != 0);
+  if (sgn == UNSIGNED)
  /* The unsigned max is just all ones.  */
  return shwi (-1, precision);
else
@@ -290,7 +289,8 @@ wi::max_value (unsigned int precision, s
  wide_int
  wi::min_value (unsigned int precision, signop sgn)
  {
-  if (precision == 0 || sgn == UNSIGNED)
+  gcc_checking_assert (precision != 0);
+  if (sgn == UNSIGNED)
  return uhwi (0, precision);
else
  /* The signed min is all zeros except the top bit.  This must be
@@ -1487,9 +1487,6 @@ wi::popcount (const wide_int_ref &x)
unsigned int i;
int count;
  
-  if (x.precision == 0)

-return 0;
-
/* The high order block is special if it is the last block and the
   precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
   have to clear out any ones above the precision before doing
@@ -2082,10 +2079,6 @@ wi::ctz (const wide_int_ref &x)
  int
  wi::exact_log2 (const wide_int_ref &x)
  {
-  /* 0-precision values can only hold 0.  */
-  if (x.precision == 0)
-return -1;
-
/* Reject cases where there are implicit -1 blocks above HIGH.  */
if (x.len * HOST_BITS_PER_WIDE_INT < x.precision && x.sign_mask () < 0)
  return -1;
Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-12-02 19:52:05.424989079 +
+++ gcc/wide-int.h  2013-12-02 20:12:22.179998282 +
@@ -644,8 +644,10 @@ class GTY(()) generic_wide_int : public
generic_wide_int (const T &, unsigned int);
  
/* Conversions.  */

-  HOST_WIDE_INT to_shwi (unsigned int = 0) const;
-  unsigned HOST_WIDE_INT to_uhwi (unsigned int = 0) const;
+  HOST_WIDE_INT to_shwi (unsigned int) const;
+  HOST_WIDE_INT to_shwi () const;
+  unsigned HOST_WIDE_INT to_uhwi (unsigned int) const;
+  unsigned HOST_WIDE_INT to_uhwi () const;
HOST_WIDE_INT to_short_addr () const;
  
/* Public accessors for the interior of a wide int.  */

@@ -735,18 +737,23 @@ inline generic_wide_int ::gener
  inline HOST_WIDE_INT
  generic_wide_int <storage>::to_shwi (unsigned int precision) const
  {
-  if (precision == 0)
-{
-  if (is_sign_extended)
-   return this->get_val ()[0];
-  precision = this->get_precision ();
-}
if (precision < HOST_BITS_PER_WIDE_INT)
  return sext_hwi (this->get_val ()[0], precision);
else
  return this->get_val ()[0];
  }
  
+/* Return THIS as a signed HOST_WIDE_INT, in its natural precision.  */

+template <typename storage>
+inline HOST_WIDE_INT
+generic_wide_int <storage>::to_shwi () const
+{
+  if (is_sign_extended)
+return this->get_val ()[0];
+  else
+return to_shwi (this->get_precision ());
+}
+
  /* Return THIS as an unsigned HOST_WIDE_INT, zero-extending from
 PRECISION.  If THIS does not fit in PRECISION, the information
 is lost.  */
@@ -754,14 +761,20 @@ generic_wide_int ::to_shwi (uns
  inline unsigned HOST_WIDE_INT
  generic_wide_int <storage>::to_uhwi (unsigned int precision) const
  {
-  if (precision == 0)
-precision = this->get_precision ();
if (precision < HOST_BITS_PER_WIDE_INT)
  return zext_hwi (this->get_val ()[0], precision);
else
  return this->get_val ()[0];
  }
  
+/* Return THIS as an unsigned HOST_WIDE_INT, in its natural precision.  */

+template <typename storage>
+inline unsigned HOST_WIDE_INT
+generic_wide_int <storage>::to_uhwi () const
+{
+  return to_uhwi (this->get_precision ());
+}
+
  /* TODO: The compiler is half converted from using HOST_WIDE_INT to
 represent addresses to using offset_int to represent addresses.
 We use to_short_addr at the interface from new code to old,
@@ -2289,9 +2302,7 @@ wi::add (const T1 &x, const T2 &y, signo
unsigned HOST_WIDE_INT xl = xi.ulow ();
unsigned HOST_WIDE_INT yl = yi.ulow ();
unsigned HOST_WIDE_INT resultl

Re: [wide-int] i am concerned about the typedef for widest-int.

2013-12-02 Thread Kenneth Zadeck

On 12/02/2013 03:34 PM, Richard Sandiford wrote:

Kenneth Zadeck  writes:

see wide-int.h around line 290

the MAX_BITSIZE_MODE_ANY_INT is the largest mode on the machine.  however,
if the value coming in is an unsigned number of the type that represents
that mode, don't we lose a bit?

That was the +1 mentioned here:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03745.html

I.e. it should be "widest supported arithmetic input + 1".

Thanks,
Richard

do we want 129 or do we want to round that up to the next hwi?
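
The lost bit is easy to see at a small scale.  With 8-bit values
standing in for the widest mode (illustrative only, and assuming the
usual two's-complement host):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* The unsigned max of an 8-bit mode uses all 8 bits...  */
  uint8_t umax = 0xff;			/* 255 */
  /* ...so a signed container of the same width flips its sign;
     one extra bit preserves the value.  */
  int8_t same_width = (int8_t) umax;	/* -1: value lost */
  int16_t one_more_bit = umax;		/* 255: value kept */
  printf ("%d %d\n", same_width, one_more_bit);
  return 0;
}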


[wide-int] i am concerned about the typedef for widest-int.

2013-12-02 Thread Kenneth Zadeck

see wide-int.h around line 290

the MAX_BITSIZE_MODE_ANY_INT is the largest mode on the machine.  however, 
if the value coming in is an unsigned number of the type that represents 
that mode, don't we lose a bit?


kenny


Re: [wide-int] small cleanup in wide-int.*

2013-12-02 Thread Kenneth Zadeck

committed as revision 205599 to wide-int branch.

kenny

On 12/02/2013 05:50 AM, Richard Biener wrote:

On Sat, Nov 30, 2013 at 1:55 AM, Kenneth Zadeck
 wrote:

Richi,

this is the first of either 2 or 3 patches to fix this.  There are two
places that need to be fixed for us to do 1X + 1, and this patch fixes the
first one.  There was an unnecessary call to mul_full, and this was the only
call to mul_full.  So this patch removes the call and also the function itself.

The other place is tree-vrp, which is discussed here and will be dealt
with in the other patches.

tested on x86-64.

Ok to commit?

Ok.

Thanks,
Richard.


Kenny



On 11/29/2013 05:24 AM, Richard Biener wrote:

On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck
 wrote:

This patch does three things in wide-int:

1) it cleans up some comments.
2) removes a small amount of trash.
3) it changes the max size of the wide int from being 4x of
MAX_BITSIZE_MODE_ANY_INT to 2x +1.   This should improve large muls and
divs
as well as perhaps help with some cache behavior.

@@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
  range of a multiply.  This code needs 2n + 2 bits.  */

   #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
+/ HOST_BITS_PER_WIDE_INT) + 1)

I always wondered why VRP (if that is the only reason we do 2*n+1)
cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)?
Other widest_int users should not suffer IMHO.  widest_int should
strictly cover all modes that the target can do any arithmetic on
(thus not XImode or OImode on x86_64).

Richard.


ok to commit




Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc	(revision 205597)
+++ gcc/wide-int.cc	(working copy)
@@ -1247,22 +1247,18 @@ wi_pack (unsigned HOST_WIDE_INT *result,
 }
 
 /* Multiply Op1 by Op2.  If HIGH is set, only the upper half of the
-   result is returned.  If FULL is set, the entire result is returned
-   in a mode that is twice the width of the inputs.  However, that
-   mode needs to exist if the value is to be usable.  Clients that use
-   FULL need to check for this.
-
-   If HIGH or FULL are not set, throw away the upper half after the
-   check is made to see if it overflows.  Unfortunately there is no
-   better way to check for overflow than to do this.  If OVERFLOW is
-   nonnull, record in *OVERFLOW whether the result overflowed.  SGN
-   controls the signedness and is used to check overflow or if HIGH or
-   FULL is set.  */
+   result is returned.  
+
+   If HIGH is not set, throw away the upper half after the check is
+   made to see if it overflows.  Unfortunately there is no better way
+   to check for overflow than to do this.  If OVERFLOW is nonnull,
+   record in *OVERFLOW whether the result overflowed.  SGN controls
+   the signedness and is used to check overflow or if HIGH is set.  */
 unsigned int
 wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1,
 		  unsigned int op1len, const HOST_WIDE_INT *op2,
 		  unsigned int op2len, unsigned int prec, signop sgn,
-		  bool *overflow, bool high, bool full)
+		  bool *overflow, bool high)
 {
   unsigned HOST_WIDE_INT o0, o1, k, t;
   unsigned int i;
@@ -1313,7 +1309,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   /* If we need to check for overflow, we can only do half wide
  multiplies quickly because we need to look at the top bits to
  check for the overflow.  */
-  if ((high || full || needs_overflow)
+  if ((high || needs_overflow)
   && (prec <= HOST_BITS_PER_HALF_WIDE_INT))
 {
   unsigned HOST_WIDE_INT r;
@@ -1372,7 +1368,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co
 
   /* We did unsigned math above.  For signed we must adjust the
  product (assuming we need to see that).  */
-  if (sgn == SIGNED && (full || high || needs_overflow))
+  if (sgn == SIGNED && (high || needs_overflow))
 {
   unsigned HOST_WIDE_INT b;
   if (op1[op1len-1] < 0)
@@ -1420,13 +1416,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co
 	  *overflow = true;
 }
 
-  if (full)
-{
-  /* compute [2prec] <- [prec] * [prec] */
-  wi_pack ((unsigned HOST_WIDE_INT *) val, r, 2 * half_blocks_needed);
-  return canonize (val, blocks_needed * 2, prec * 2);
-}
-  else if (high)
+  if (high)
 {
   /* compute [prec] <- ([prec] * [prec]) >> [prec] */
   wi_pack ((unsigned HOST_WIDE_INT *) val,
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 205597)
+++ gcc/fold-const.c	(working copy)
@@ -5962,11 +5962,12 @@ extract_muldiv_1 (tree t, tree c, enum t
 	 assuming no overflow.  */
   if (tcode == code)
 	{
-	  bool overflow_p;
+	  bool overflow_p = false;
+	  bool overflow_mul_p;
 	  signop sign = TYPE_SIGN (ctype);
-	 

Re: [wide-int] small cleanup in wide-int.*

2013-11-29 Thread Kenneth Zadeck

Richi,

this is the first of either 2 or 3 patches to fix this.  There are two 
places that need to be fixed for us to do 1X + 1, and this patch fixes 
the first one.  There was an unnecessary call to mul_full, and this was 
the only call to mul_full.  So this patch removes the call and also the 
function itself.


The other place is tree-vrp, which is discussed here and will be dealt 
with in the other patches.


tested on x86-64.

Ok to commit?

Kenny


On 11/29/2013 05:24 AM, Richard Biener wrote:

On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck
 wrote:

This patch does three things in wide-int:

1) it cleans up some comments.
2) removes a small amount of trash.
3) it changes the max size of the wide int from being 4x of
MAX_BITSIZE_MODE_ANY_INT to 2x +1.   This should improve large muls and divs
as well as perhaps help with some cache behavior.

@@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
 range of a multiply.  This code needs 2n + 2 bits.  */

  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
+/ HOST_BITS_PER_WIDE_INT) + 1)

I always wondered why VRP (if that is the only reason we do 2*n+1)
cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)?
Other widest_int users should not suffer IMHO.  widest_int should
strictly cover all modes that the target can do any arithmetic on
(thus not XImode or OImode on x86_64).

Richard.


ok to commit


Index: fold-const.c
===
--- fold-const.c	(revision 205496)
+++ fold-const.c	(working copy)
@@ -5962,11 +5962,12 @@ extract_muldiv_1 (tree t, tree c, enum t
 	 assuming no overflow.  */
   if (tcode == code)
 	{
-	  bool overflow_p;
+	  bool overflow_p = false;
+	  bool overflow_mul_p;
 	  signop sign = TYPE_SIGN (ctype);
-	  wide_int mul = wi::mul_full (op1, c, sign);
+	  wide_int mul = wi::mul (op1, c, sign, &overflow_mul_p);
 	  overflow_p = TREE_OVERFLOW (c) | TREE_OVERFLOW (op1);
-	  if (!wi::fits_to_tree_p (mul, ctype)
+	  if (overflow_mul_p
 	  && ((sign == UNSIGNED && tcode != MULT_EXPR) || sign == SIGNED))
 	overflow_p = true;
 	  if (!overflow_p)
Index: wide-int.cc
===
--- wide-int.cc	(revision 205508)
+++ wide-int.cc	(working copy)
@@ -1247,22 +1247,18 @@ wi_pack (unsigned HOST_WIDE_INT *result,
 }
 
 /* Multiply Op1 by Op2.  If HIGH is set, only the upper half of the
-   result is returned.  If FULL is set, the entire result is returned
-   in a mode that is twice the width of the inputs.  However, that
-   mode needs to exist if the value is to be usable.  Clients that use
-   FULL need to check for this.
-
-   If HIGH or FULL are not set, throw away the upper half after the
-   check is made to see if it overflows.  Unfortunately there is no
-   better way to check for overflow than to do this.  If OVERFLOW is
-   nonnull, record in *OVERFLOW whether the result overflowed.  SGN
-   controls the signedness and is used to check overflow or if HIGH or
-   FULL is set.  */
+   result is returned.  
+
+   If HIGH is not set, throw away the upper half after the check is
+   made to see if it overflows.  Unfortunately there is no better way
+   to check for overflow than to do this.  If OVERFLOW is nonnull,
+   record in *OVERFLOW whether the result overflowed.  SGN controls
+   the signedness and is used to check overflow or if HIGH is set.  */
 unsigned int
 wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1,
 		  unsigned int op1len, const HOST_WIDE_INT *op2,
 		  unsigned int op2len, unsigned int prec, signop sgn,
-		  bool *overflow, bool high, bool full)
+		  bool *overflow, bool high)
 {
   unsigned HOST_WIDE_INT o0, o1, k, t;
   unsigned int i;
@@ -1292,7 +1288,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   /* If we need to check for overflow, we can only do half wide
  multiplies quickly because we need to look at the top bits to
  check for the overflow.  */
-  if ((high || full || needs_overflow)
+  if ((high || needs_overflow)
   && (prec <= HOST_BITS_PER_HALF_WIDE_INT))
 {
   unsigned HOST_WIDE_INT r;
@@ -1351,7 +1347,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co
 
   /* We did unsigned math above.  For signed we must adjust the
  product (assuming we need to see that).  */
-  if (sgn == SIGNED && (full || high || needs_overflow))
+  if (sgn == SIGNED && (high || needs_overflow))
 {
   unsigned HOST_WIDE_INT b;
   if (op1[op1len-1] < 0)
@@ -1399,13 +1395,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co
 	  *overflow = true;
 }
 
-  if (full)
-{
-  /* compute [2prec] <- [prec] * [prec] */
-  wi_pack ((unsigned HOST_WIDE_INT *) val, r, 2 * half_blocks_needed);
-  return canonize (val, blocks_needed

Re: [wide-int] small cleanup in wide-int.*

2013-11-29 Thread Kenneth Zadeck

could be.
On 11/29/2013 06:57 AM, Richard Sandiford wrote:

Looks good to me FWIW, except:

Kenneth Zadeck  writes:

@@ -112,11 +114,11 @@ along with GCC; see the file COPYING3.
 two, the default is the prefered representation.
  
 All three flavors of wide_int are represented as a vector of

-   HOST_WIDE_INTs.  The default and widest_int vectors contain enough elements
-   to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.  offset_int contains only
-   enough elements to hold ADDR_MAX_PRECISION bits.  The values are stored
-   in the vector with the least significant HOST_BITS_PER_WIDE_INT bits
-   in element 0.
+   HOST_WIDE_INTs.  The default and widest_int vectors contain enough
+   elements to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.
+   offset_int contains only enough elements to hold ADDR_MAX_PRECISION
+   bits.  The values are stored in the vector with the least
+   significant HOST_BITS_PER_WIDE_INT bits in element 0.
  
 The default wide_int contains three fields: the vector (VAL),

 the precision and a length (LEN).  The length is the number of HWIs

is this just changing the line breaks?

Thanks,
Richard




Re: [wide-int] small cleanup in wide-int.*

2013-11-29 Thread Kenneth Zadeck




On Nov 29, 2013, at 4:24 AM, Richard Biener  wrote:

> On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck
>  wrote:
>> This patch does three things in wide-int:
>> 
>> 1) it cleans up some comments.
>> 2) removes a small amount of trash.
>> 3) it changes the max size of the wide int from being 4x of
>> MAX_BITSIZE_MODE_ANY_INT to 2x +1.   This should improve large muls and divs
>> as well as perhaps help with some cache behavior.
> 
> @@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
>range of a multiply.  This code needs 2n + 2 bits.  */
> 
> #define WIDE_INT_MAX_ELTS \
> -  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
> -   / HOST_BITS_PER_WIDE_INT)
> +  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
> +/ HOST_BITS_PER_WIDE_INT) + 1)
> 
> I always wondered why VRP (if that is the only reason we do 2*n+1)
> cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)?
> Other widest_int users should not suffer IMHO.  widest_int should
> strictly cover all modes that the target can do any arithmetic on
> (thus not XImode or OImode on x86_64).
> 
> Richard.

We did not use to be able to do that.  This code is much older.  So how do 
you detect if someone has a mode and does not do math on it?  I had thought 
that the only reason that people defined these modes and did not use them 
was because it took a long time to properly understand how broken gcc 
really is.  Anyway, there is a call to mul_full that would need reworking.
> 
>> ok to commit


Re: [wide-int] Handle more cmps and cmpu cases inline

2013-11-28 Thread Kenneth Zadeck
like the add/sub patch, enhance the comment so that it says that it is 
designed to hit the widest_int and offset_int common cases.

kenny
On 11/28/2013 12:34 PM, Richard Sandiford wrote:

As Richi asked, this patch makes cmps use the same shortcuts as lts_p.
It also makes cmpu use the shortcut that I justed added to ltu_p.

On that same fold-const.ii testcase, this reduces the number of cmps_large
calls from 66924 to 916.  It reduces the number of cmpu_large calls from
3462 to 4.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2001-01-01 00:00:00.0 +
+++ gcc/wide-int.h  2013-11-28 16:08:22.527681077 +
@@ -1858,17 +1858,31 @@ wi::cmps (const T1 &x, const T2 &y)
unsigned int precision = get_binary_precision (x, y);
WIDE_INT_REF_FOR (T1) xi (x, precision);
WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  if (wi::fits_shwi_p (yi))
  {
-  HOST_WIDE_INT xl = xi.to_shwi ();
-  HOST_WIDE_INT yl = yi.to_shwi ();
-  if (xl < yl)
+  /* Special case for comparisons with 0.  */
+  if (STATIC_CONSTANT_P (yi.val[0] == 0))
+   return neg_p (xi) ? -1 : !(xi.len == 1 && xi.val[0] == 0);
+  /* If x fits into a signed HWI, we can compare directly.  */
+  if (wi::fits_shwi_p (xi))
+   {
+ HOST_WIDE_INT xl = xi.to_shwi ();
+ HOST_WIDE_INT yl = yi.to_shwi ();
+ return xl < yl ? -1 : xl > yl;
+   }
+  /* If x doesn't fit and is negative, then it must be more
+negative than any signed HWI, and hence smaller than y.  */
+  if (neg_p (xi))
return -1;
-  else if (xl > yl)
-   return 1;
-  else
-   return 0;
+  /* If x is positive, then it must be larger than any signed HWI,
+and hence greater than y.  */
+  return 1;
  }
+  /* Optimize the opposite case, if it can be detected at compile time.  */
+  if (STATIC_CONSTANT_P (xi.len == 1))
+/* If YI is negative it is lower than the least HWI.
+   If YI is positive it is greater than the greatest HWI.  */
+return neg_p (yi) ? 1 : -1;
return cmps_large (xi.val, xi.len, precision, yi.val, yi.len);
  }
  
@@ -1881,16 +1895,35 @@ wi::cmpu (const T1 &x, const T2 &y)

unsigned int precision = get_binary_precision (x, y);
WIDE_INT_REF_FOR (T1) xi (x, precision);
WIDE_INT_REF_FOR (T2) yi (y, precision);
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize comparisons with constants.  */
+  if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
  {
+  /* If XI doesn't fit in a HWI then it must be larger than YI.  */
+  if (xi.len != 1)
+   return 1;
+  /* Otherwise compare directly.  */
unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
-  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
-  if (xl < yl)
+  unsigned HOST_WIDE_INT yl = yi.val[0];
+  return xl < yl ? -1 : xl > yl;
+}
+  if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
+{
+  /* If YI doesn't fit in a HWI then it must be larger than XI.  */
+  if (yi.len != 1)
return -1;
-  else if (xl == yl)
-   return 0;
-  else
-   return 1;
+  /* Otherwise compare directly.  */
+  unsigned HOST_WIDE_INT xl = xi.val[0];
+  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+  return xl < yl ? -1 : xl > yl;
+}
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+ for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both
+ values does not change the result.  */
+  if (xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
+  unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
+  return xl < yl ? -1 : xl > yl;
  }
return cmpu_large (xi.val, xi.len, precision, yi.val, yi.len);
  }





Re: [wide-int] Handle more ltu_p cases inline

2013-11-28 Thread Kenneth Zadeck

this is fine.

kenny
On 11/28/2013 12:29 PM, Richard Sandiford wrote:

The existing ltu_p fast path can handle any pairs of single-HWI inputs,
even for precision > HOST_BITS_PER_WIDE_INT.  In that case both xl and
yl are implicitly sign-extended to the larger precision, but with the
extended values still being compared as unsigned.  The extension doesn't
change the result in that case.
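
That property can be checked stand-alone: sign-extending two 32-bit
values to 64 bits leaves their unsigned ordering unchanged (a sketch,
not part of the patch):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t pairs[3][2]
    = { { 5, 7 }, { 0x80000000u, 3 }, { 0xffffffffu, 0x80000000u } };
  for (int i = 0; i < 3; i++)
    {
      uint32_t x = pairs[i][0], y = pairs[i][1];
      /* Sign-extend both to 64 bits, then compare as unsigned.  */
      uint64_t xs = (uint64_t) (int64_t) (int32_t) x;
      uint64_t ys = (uint64_t) (int64_t) (int32_t) y;
      printf ("narrow %d, extended %d\n", x < y, xs < ys);
    }
  return 0;
}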

When compiling a recent fold-const.ii, this reduces the number of
ltu_p_large calls from 23849 to 697.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/alias.c
===
--- gcc/alias.c 2013-11-20 12:12:49.393055063 +
+++ gcc/alias.c 2013-11-28 12:24:23.307549245 +
@@ -342,7 +342,7 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
  || (DECL_P (ref->base)
  && (DECL_SIZE (ref->base) == NULL_TREE
  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
- || wi::ltu_p (DECL_SIZE (ref->base),
+ || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
ref->offset + ref->size)
  return false;
  
Index: gcc/wide-int.h

===
--- gcc/wide-int.h  2013-11-28 11:44:39.041731636 +
+++ gcc/wide-int.h  2013-11-28 12:48:36.200764215 +
@@ -1740,13 +1740,15 @@ wi::ltu_p (const T1 &x, const T2 &y)
unsigned int precision = get_binary_precision (x, y);
WIDE_INT_REF_FOR (T1) xi (x, precision);
WIDE_INT_REF_FOR (T2) yi (y, precision);
-  /* Optimize comparisons with constants and with sub-HWI unsigned
- integers.  */
+  /* Optimize comparisons with constants.  */
if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0))
  return xi.len == 1 && xi.to_uhwi () < (unsigned HOST_WIDE_INT) yi.val[0];
if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0))
  return yi.len != 1 || yi.to_uhwi () > (unsigned HOST_WIDE_INT) xi.val[0];
-  if (precision <= HOST_BITS_PER_WIDE_INT)
+  /* Optimize the case of two HWIs.  The HWIs are implicitly sign-extended
+ for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both
+ values does not change the result.  */
+  if (xi.len + yi.len == 2)
  {
unsigned HOST_WIDE_INT xl = xi.to_uhwi ();
unsigned HOST_WIDE_INT yl = yi.to_uhwi ();





Re: [wide-int] Handle more add and sub cases inline

2013-11-28 Thread Kenneth Zadeck
I would like to see some comment to the effect that this is to allow 
inlining of the common case for widest_int and offset_int without 
inlining the uncommon case for regular wide_int.





On 11/28/2013 12:38 PM, Richard Sandiford wrote:

Currently add and sub have no fast path for offset_int and widest_int,
they just call the out-of-line version.  This patch handles the
single-HWI cases inline.  At least on x86_64, this only adds one branch
per call; the fast path itself is straight-line code.

On the same fold-const.ii testcase, this reduces the number of
add_large calls from 877507 to 42459.  It reduces the number of
sub_large calls from 25707 to 148.
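
The set_len computation in the patch below relies on a classic trick:
a single-word addition overflows as a signed value exactly when the
result's sign differs from the signs of both operands.  A stand-alone
sketch of that test (not the patch itself):

#include <stdint.h>
#include <stdio.h>

/* Nonzero iff XL + YL overflows when viewed as signed 64-bit values;
   the (resultl ^ xl) & (resultl ^ yl) expression in wi::add performs
   the same test.  */
static int
signed_add_overflows (uint64_t xl, uint64_t yl)
{
  uint64_t resultl = xl + yl;
  return (int) (((resultl ^ xl) & (resultl ^ yl)) >> 63);
}

int
main (void)
{
  printf ("%d\n", signed_add_overflows (INT64_MAX, 1));		/* 1 */
  printf ("%d\n", signed_add_overflows (1, 2));			/* 0 */
  printf ("%d\n", signed_add_overflows ((uint64_t) INT64_MIN,
					(uint64_t) -1));	/* 1 */
  return 0;
}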

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-11-28 13:34:19.596839877 +
+++ gcc/wide-int.h  2013-11-28 16:08:11.387731775 +
@@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
val[0] = xi.ulow () + yi.ulow ();
result.set_len (1);
  }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+  && xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.ulow ();
+  unsigned HOST_WIDE_INT yl = yi.ulow ();
+  unsigned HOST_WIDE_INT resultl = xl + yl;
+  val[0] = resultl;
+  val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+  result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
+  >> (HOST_BITS_PER_WIDE_INT - 1)));
+}
else
  result.set_len (add_large (val, xi.val, xi.len,
   yi.val, yi.len, precision,
@@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
val[0] = xi.ulow () - yi.ulow ();
result.set_len (1);
  }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+  && xi.len + yi.len == 2)
+{
+  unsigned HOST_WIDE_INT xl = xi.ulow ();
+  unsigned HOST_WIDE_INT yl = yi.ulow ();
+  unsigned HOST_WIDE_INT resultl = xl - yl;
+  val[0] = resultl;
+  val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+  result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
+  >> (HOST_BITS_PER_WIDE_INT - 1)));
+}
else
  result.set_len (sub_large (val, xi.val, xi.len,
   yi.val, yi.len, precision,





[wide-int] small cleanup in wide-int.*

2013-11-28 Thread Kenneth Zadeck

This patch does three things in wide-int:

1) it cleans up some comments.
2) removes a small amount of trash.
3) it changes the max size of the wide int from being 4x of 
MAX_BITSIZE_MODE_ANY_INT to 2x +1.   This should improve large muls and 
divs as well as perhaps help with some cache behavior.


ok to commit
Index: gcc/wide-int.h
===
--- gcc/wide-int.h	(revision 205488)
+++ gcc/wide-int.h	(working copy)
@@ -55,10 +55,12 @@ along with GCC; see the file COPYING3.
 
  2) offset_int.  This is a fixed size representation that is
  guaranteed to be large enough to compute any bit or byte sized
- address calculation on the target.  Currently the value is 64 + 4
- bits rounded up to the next number even multiple of
+ address calculation on the target.  Currently the value is 64 + 3
+ + 1 bits rounded up to the next number even multiple of
  HOST_BITS_PER_WIDE_INT (but this can be changed when the first
- port needs more than 64 bits for the size of a pointer).
+ port needs more than 64 bits for the size of a pointer).  The 3
+ bits allow the bits of a byte to be accessed, the 1 allows any
+ unsigned value to be converted to signed without overflowing.
 
  This flavor can be used for all address math on the target.  In
  this representation, the values are sign or zero extended based
@@ -112,11 +114,11 @@ along with GCC; see the file COPYING3.
two, the default is the prefered representation.
 
All three flavors of wide_int are represented as a vector of
-   HOST_WIDE_INTs.  The default and widest_int vectors contain enough elements
-   to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.  offset_int contains only
-   enough elements to hold ADDR_MAX_PRECISION bits.  The values are stored
-   in the vector with the least significant HOST_BITS_PER_WIDE_INT bits
-   in element 0.
+   HOST_WIDE_INTs.  The default and widest_int vectors contain enough
+   elements to hold a value of MAX_BITSIZE_MODE_ANY_INT bits.
+   offset_int contains only enough elements to hold ADDR_MAX_PRECISION
+   bits.  The values are stored in the vector with the least
+   significant HOST_BITS_PER_WIDE_INT bits in element 0.
 
The default wide_int contains three fields: the vector (VAL),
the precision and a length (LEN).  The length is the number of HWIs
@@ -223,10 +225,6 @@ along with GCC; see the file COPYING3.
 #include "signop.h"
 #include "insn-modes.h"
 
-#if 0
-#define DEBUG_WIDE_INT
-#endif
-
 /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
early examination of the target's mode file.  Thus it is safe that
some small multiple of this number is easily larger than any number
@@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
range of a multiply.  This code needs 2n + 2 bits.  */
 
 #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
+/ HOST_BITS_PER_WIDE_INT) + 1)
 
 /* This is the max size of any pointer on any machine.  It does not
seem to be as easy to sniff this out of the machine description as
Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc	(revision 205488)
+++ gcc/wide-int.cc	(working copy)
@@ -2090,59 +2090,5 @@ void gt_ggc_mx (widest_int *) { }
 void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { }
 void gt_pch_nx (widest_int *) { }
 
-/*
- * Private debug printing routines.
- */
-#ifdef DEBUG_WIDE_INT
-/* The debugging routines print results of wide operations into the
-   dump files of the respective passes in which they were called.  */
-static char *
-dumpa (const HOST_WIDE_INT *val, unsigned int len, unsigned int prec, char *buf)
-{
-  int i;
-  unsigned int l;
-  const char * sep = "";
 
-  l = sprintf (buf, "[%d (", prec);
-  for (i = len - 1; i >= 0; i--)
-{
-  l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]);
-  sep = " ";
-}
 
-  gcc_assert (len != 0);
-
-  l += sprintf (&buf[l], ")]");
-
-  gcc_assert (l < MAX_SIZE);
-  return buf;
-
-
-}
-#endif
-
-#if 0
-/* The debugging routines print results of wide operations into the
-   dump files of the respective passes in which they were called.  */
-char *
-wide_int_ro::dump (char* buf) const
-{
-  int i;
-  unsigned int l;
-  const char * sep = "";
-
-  l = sprintf (buf, "[%d (", precision);
-  for (i = len - 1; i >= 0; i--)
-{
-  l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]);
-  sep = " ";
-}
-
-  gcc_assert (len != 0);
-
-  l += sprintf (&buf[l], ")]");
-
-  gcc_assert (l < MAX_SIZE);
-  return buf;
-}
-#endif


Re: wide-int, rtl

2013-11-27 Thread Kenneth Zadeck

Eric,

Let me make one high-level comment here; the low-level comments will 
be responded to when i fix the patch.


CONST_DOUBLE has two hwis in it.  So in practice, you get 128 bits and 
that is it.  A CONST_WIDE_INT has an array of HWIs that has as many 
elements as it needs to represent the value.
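
A toy model of that difference (illustrative only; rtx constants are
not actually laid out like this):

#include <stdio.h>

/* A CONST_DOUBLE-style integer always carries exactly two HWIs
   (128 bits); a CONST_WIDE_INT-style one carries as many HWIs as the
   value needs.  */
struct toy_const_double { long long low, high; };
struct toy_const_wide { int nunits; long long elt[4]; };

int
main (void)
{
  /* 2^192 needs four 64-bit elements: representable in the wide form,
     not in the fixed two-HWI form.  */
  struct toy_const_wide w = { 4, { 0, 0, 0, 1 } };
  printf ("wide constant uses %d elements\n", w.nunits);
  return 0;
}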


If TARGET_SUPPORTS_WIDE_INT == 0 (the default and currently the value 
for every public port except the RS6000), then rtl integer constants 
that do not fit in a CONST_INT are put in a CONST_DOUBLE and there are 
never any CONST_WIDE_INTs created.  A platform like this cannot 
correctly represent variables larger than 128 bits; i.e., this is like 
the current trunk except that there are fewer bugs for the 128-bit 
variables.


if TARGET_SUPPORTS_WIDE_INT is not 0, then rtl integer constants are put 
in CONST_WIDE_INT and CONST_DOUBLEs only hold floating point values.   
As for the last comment that you made: if the target does not support wide 
int, then there is simply no way to represent such a constant properly, and 
if the target accepts types larger than TImode, then the maintainer 
needs to do the work to use CONST_WIDE_INTs.


This is not a lot of work, but it requires target specific knowledge to 
convert the patterns, predicates and constraints to look for 
CONST_WIDE_INTs rather than CONST_DOUBLE for larger integers.


As far as your comment about const_double_operand goes, i could add some 
conditional code to make this not accept integers if the TARGET... is 
not 0.   Richard Sandiford wanted me to use that test as infrequently as 
possible.


thanks for your comments.

Kenny

On 11/27/2013 11:24 AM, Eric Botcazou wrote:

Richi has asked that we break the wide-int patch so that the individual port
and front end maintainers can review their parts without having to go through
the entire patch.  This patch covers the first half of the rtl code.

--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -2336,15 +2336,23 @@ hash_rtx_cb (const_rtx x, enum machine_mode mode,
 + (unsigned int) INTVAL (x));
return hash;
  
+case CONST_WIDE_INT:

+  {
+   int i;
+   for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++)
+ hash += CONST_WIDE_INT_ELT (x, i);
+  }
+  return hash;

You can write "for (int i = 0; ..." now and remove the parentheses.


--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -1121,15 +1120,23 @@ cselib_hash_rtx (rtx x, int create, enum machine_mode
memmode)
hash += ((unsigned) CONST_INT << 7) + INTVAL (x);
return hash ? hash : (unsigned int) CONST_INT;
  
+case CONST_WIDE_INT:

+  {
+   int i;
+   for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++)
+ hash += CONST_WIDE_INT_ELT (x, i);
+  }
+  return hash;
+

Likewise.


--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3298,23 +3268,13 @@ choose_multiplier (unsigned HOST_WIDE_INT d, int n,
int precision,
pow = n + lgup;
pow2 = n + lgup - precision;
  
-  /* We could handle this with some effort, but this case is much

- better handled directly with a scc insn, so rely on caller using
- that.  */
-  gcc_assert (pow != HOST_BITS_PER_DOUBLE_INT);

Why removing it?



--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -722,64 +722,33 @@ convert_modes (enum machine_mode mode, enum machine_mode
oldmode, rtx x, int uns
if (mode == oldmode)
  return x;
  
-  /* There is one case that we must handle specially: If we are converting

- a CONST_INT into a mode whose size is twice HOST_BITS_PER_WIDE_INT and
- we are to interpret the constant as unsigned, gen_lowpart will do
- the wrong if the constant appears negative.  What we want to do is
- make the high-order word of the constant zero, not all ones.  */
-
-  if (unsignedp && GET_MODE_CLASS (mode) == MODE_INT
-  && GET_MODE_BITSIZE (mode) == HOST_BITS_PER_DOUBLE_INT
-  && CONST_INT_P (x) && INTVAL (x) < 0)
+  if (CONST_SCALAR_INT_P (x)
+  && GET_MODE_CLASS (mode) == MODE_INT)

On a single line.


  {
-  double_int val = double_int::from_uhwi (INTVAL (x));
-
-  /* We need to zero extend VAL.  */
-  if (oldmode != VOIDmode)
-   val = val.zext (GET_MODE_BITSIZE (oldmode));
-
-  return immed_double_int_const (val, mode);
+  /* If the caller did not tell us the old mode, then there is
+not much to do with respect to canonization.  We have to assume
+that all the bits are significant.  */
+  if (GET_MODE_CLASS (oldmode) != MODE_INT)
+   oldmode = MAX_MODE_INT;
+  wide_int w = wide_int::from (std::make_pair (x, oldmode),
+  GET_MODE_PRECISION (mode),
+  unsignedp ? UNSIGNED : SIGNED);
+  return immed_wide_int_const (w, mode);

canonicalization?

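(As a stand-alone illustration of the hazard described in the comment
removed above, in plain C++ outside the compiler: widening a negative
constant as signed sign-extends the high word to all ones, while the
unsigned interpretation must zero-extend it.)

#include <stdio.h>

int main (void)
{
  int x = -1;					/* 32-bit 0xffffffff */
  unsigned long long sext = (long long) x;	/* 0xffffffffffffffff */
  unsigned long long zext = (unsigned) x;	/* 0x00000000ffffffff */
  printf ("sign-extended %llx, zero-extended %llx\n", sext, zext);
  return 0;
}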

@@ -5301,10 +5271,10 @@ store_expr (tree exp, rtx target, int call_param_p,
bool nontemporal)
   &alt_rtl);
  }
  
-  /* If TEMP is a VOIDmode constant and the mode of the type of EXP is not

- the same as that of TA

Re: Some wide-int review comments

2013-11-27 Thread Kenneth Zadeck

committed as revision 205448 to trunk.
committed as revision 205455 to wide-int branch.
On 11/27/2013 05:50 AM, Richard Biener wrote:

On Tue, Nov 26, 2013 at 5:33 PM, Kenneth Zadeck
 wrote:

Richi,

patch ping

Ok.

Thanks,
Richard.


also two more pieces of information.  With further testing, this seems to
fix

Tests that now work, but didn't before:
===
ext/random/hypergeometric_distribution/operators/values.cc (test for excess
errors)

New tests that PASS:

ext/random/hypergeometric_distribution/operators/values.cc execution test


also, the corresponding frag for fold-const.c on the wide-int branch will
look like

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 205224)
+++ gcc/fold-const.c(working copy)
@@ -1030,51 +1030,51 @@ int_const_binop_1 (enum tree_code code,

  case TRUNC_DIV_EXPR:
  case EXACT_DIV_EXPR:
-  res = wi::div_trunc (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::div_trunc (arg1, arg2, sign, &overflow);
break;

  case FLOOR_DIV_EXPR:
-  res = wi::div_floor (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::div_floor (arg1, arg2, sign, &overflow);
break;

  case CEIL_DIV_EXPR:
-  res = wi::div_ceil (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::div_ceil (arg1, arg2, sign, &overflow);
break;

  case ROUND_DIV_EXPR:
-  res = wi::div_round (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::div_round (arg1, arg2, sign, &overflow);
break;

  case TRUNC_MOD_EXPR:
-  res = wi::mod_trunc (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::mod_trunc (arg1, arg2, sign, &overflow);
break;

  case FLOOR_MOD_EXPR:
-  res = wi::mod_floor (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::mod_floor (arg1, arg2, sign, &overflow);
break;

  case CEIL_MOD_EXPR:
-  res = wi::mod_ceil (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::mod_ceil (arg1, arg2, sign, &overflow);
break;

  case ROUND_MOD_EXPR:
-  res = wi::mod_round (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
  return NULL_TREE;
+  res = wi::mod_round (arg1, arg2, sign, &overflow);
break;

      case MIN_EXPR:



On 11/20/2013 06:31 PM, Kenneth Zadeck wrote:

On 11/13/2013 04:59 AM, Richard Biener wrote:

On Tue, Nov 12, 2013 at 5:43 PM, Kenneth Zadeck
 wrote:

On 11/12/2013 11:27 AM, Joseph S. Myers wrote:

On Tue, 12 Nov 2013, Kenneth Zadeck wrote:


Richi,

i am having a little trouble putting this back the way that you want.
The
issue is rem.
what is supposed to happen for INT_MIN % -1?

I would assume because i am failing the last case of
gcc.dg/c90-const-expr-8.c
that INT_MIN %-1 should not overflow even if INT_MIN / -1 does.
however,

Given the conclusion in C11 that a%b should be considered undefined if
a/b
is not representable, I think it's reasonable to say INT_MIN % -1
*should*
be considered to overflow (for all C standard versions) (and bug 30484
is
only a bug for -fwrapv).


however, my local question is what do we want the api to be for
int_const_binop_1?  The existing behavior seems to be that at least
for
common modes this function silently returns 0 and it is up to the front
ends
to put their own spin on it.

For wide-int you create 1:1 the behavior of current trunk (if a change of
behavior in TImode is not tested in the testsuite then you can ignore
that).
Whatever change you do to semantics of functions you do separately
from wide-int (preferably first on trunk, or at your choice after the
wide-int
merge).

For this case in question I'd say a % -1 should return 0, but for
INT_MIN % -1 that 0 should have TREE_OVERFLOW set (and
thus you need to adjust that c90-const-expr-8.c testcase).

Richard.


kenny

richi,
I have done this exactly as you suggested.   bootstrapped and regression
tested on x86-64.

2013-11-20  Kenneth Zadeck  

 * fold-const.c
 (int_const_binop_1): Make INT_MIN % -1 return 0 with the overflow
 bit set.


2013-11-20  Kenneth Zadeck  

 * gcc.dg/c90-const-expr-8.c: Look for overflow on INT_MIN % -1.
 * gcc.dg/c99-const-expr-8.c: Look for overflow on INT_MIN % -1.

ok to commit?

kenny



Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 205447)
+++ gcc/fold-const.c	(working copy)

Re: wide-int, rs6000

2013-11-26 Thread Kenneth Zadeck
We will of course measure it, but the only thing that is different because of 
the conversion is that TImode integers are packaged differently.



> On Nov 26, 2013, at 6:17 PM, David Edelsohn  wrote:
> 
>> On Tue, Nov 26, 2013 at 5:46 PM, Mike Stump  wrote:
>> 
>> Ok?
> 
> The revised version of the patch looks okay from a technical
> perspective, but I still am concerned about the compile-time impact. I
> thought that I saw a comment on IRC that wide-int seems to have a 3-5%
> compile time impact, which is a concern. I am concerned about the
> rs6000 port being further slowed with respect to other targets.
> 
> Thanks, David


Re: Some wide-int review comments

2013-11-26 Thread Kenneth Zadeck

Richi,

patch ping

also two more pieces of information.  With further testing, this seems 
to fix


Tests that now work, but didn't before:
===
ext/random/hypergeometric_distribution/operators/values.cc (test for 
excess errors)


New tests that PASS:

ext/random/hypergeometric_distribution/operators/values.cc execution test


also, the corresponding frag for fold-const.c on the wide-int branch 
will look like


Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 205224)
+++ gcc/fold-const.c(working copy)
@@ -1030,51 +1030,51 @@ int_const_binop_1 (enum tree_code code,

 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
-  res = wi::div_trunc (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::div_trunc (arg1, arg2, sign, &overflow);
   break;

 case FLOOR_DIV_EXPR:
-  res = wi::div_floor (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::div_floor (arg1, arg2, sign, &overflow);
   break;

 case CEIL_DIV_EXPR:
-  res = wi::div_ceil (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::div_ceil (arg1, arg2, sign, &overflow);
   break;

 case ROUND_DIV_EXPR:
-  res = wi::div_round (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::div_round (arg1, arg2, sign, &overflow);
   break;

 case TRUNC_MOD_EXPR:
-  res = wi::mod_trunc (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::mod_trunc (arg1, arg2, sign, &overflow);
   break;

 case FLOOR_MOD_EXPR:
-  res = wi::mod_floor (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::mod_floor (arg1, arg2, sign, &overflow);
   break;

 case CEIL_MOD_EXPR:
-  res = wi::mod_ceil (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::mod_ceil (arg1, arg2, sign, &overflow);
   break;

 case ROUND_MOD_EXPR:
-  res = wi::mod_round (arg1, arg2, sign, &overflow);
-  if (overflow)
+  if (arg2 == 0)
 return NULL_TREE;
+  res = wi::mod_round (arg1, arg2, sign, &overflow);
   break;

 case MIN_EXPR:


On 11/20/2013 06:31 PM, Kenneth Zadeck wrote:

On 11/13/2013 04:59 AM, Richard Biener wrote:

On Tue, Nov 12, 2013 at 5:43 PM, Kenneth Zadeck
 wrote:

On 11/12/2013 11:27 AM, Joseph S. Myers wrote:

On Tue, 12 Nov 2013, Kenneth Zadeck wrote:


Richi,

i am having a little trouble putting this back the way that you want.
The
issue is rem.
what is supposed to happen for INT_MIN % -1?

I would assume because i am failing the last case of
gcc.dg/c90-const-expr-8.c
that INT_MIN %-1 should not overflow even if INT_MIN / -1 does.
however,
Given the conclusion in C11 that a%b should be considered undefined 
if a/b
is not representable, I think it's reasonable to say INT_MIN % -1 
*should*
be considered to overflow (for all C standard versions) (and bug 
30484 is

only a bug for -fwrapv).


however, my local question is what do we want the api to be for
int_const_binop_1?  The existing behavior seems to be that at 
least for
common modes this function silently returns 0 and it is up to the 
front ends

to put their own spin on it.
For wide-int you create 1:1 the behavior of current trunk (if a 
change of
behavior in TImode is not tested in the testsuite then you can ignore 
that).

Whatever change you do to semantics of functions you do separately
from wide-int (preferably first on trunk, or at your choice after the 
wide-int

merge).

For this case in question I'd say a % -1 should return 0, but for
INT_MIN % -1 that 0 should have TREE_OVERFLOW set (and
thus you need to adjust that c90-const-expr-8.c testcase).

Richard.


kenny

richi,
I have done this exactly as you suggested.   bootstrapped and 
regression tested on x86-64.


2013-11-20  Kenneth Zadeck  

* fold-const.c
(int_const_binop_1): Make INT_MIN % -1 return 0 with the overflow
bit set.


2013-11-20  Kenneth Zadeck  

* gcc.dg/c90-const-expr-8.c: Look for overflow on INT_MIN % -1.
* gcc.dg/c99-const-expr-8.c: Look for overflow on INT_MIN % -1.

ok to commit?

kenny





Re: wide-int, ada

2013-11-26 Thread Kenneth Zadeck


On 11/26/2013 09:16 AM, Richard Biener wrote:

On Tue, Nov 26, 2013 at 3:15 PM, H.J. Lu  wrote:

On Tue, Nov 26, 2013 at 6:03 AM,   wrote:



On Nov 26, 2013, at 6:00 AM, "H.J. Lu"  wrote:

On Tue, Nov 26, 2013 at 5:55 AM, Richard Biener
 wrote:

On Tue, Nov 26, 2013 at 2:44 PM, Richard Earnshaw  wrote:

On 26/11/13 09:18, Eric Botcazou wrote:

you are correct - this was an incorrect change.   I believe that the
patch below would be correct, but it is impossible to test it because (i
believe) that gcc no longer works if the host_bits_per_wide_int is 32.
I could be wrong about this but if i am correct, what do you want me to do?

While you're right that most mainstream architectures now require a 64-bit
HWI, not all of them do according to config.gcc, so I don't think that this
path is entirely dead yet.  I'll carry out the testing once we agree on the
final change.

I'm hoping, once this patch series is in that we might be able to
migrate the ARM port back to supporting a 32-bit HWI.  The driving
factor behind the original switch was supporting 128-bit constants for
Neon and these patches should resolve that.

i?86 would be another candidate (if you don't build a compiler with -m64
support).

Not true for x86 since we have

Variable
HOST_WIDE_INT ix86_isa_flags = TARGET_64BIT_DEFAULT |
TARGET_SUBTARGET_ISA_DEFAULT

in i386.opt.   We need more than 32 bits for ix86_isa_flags.

Then that should be HOST_WIDEST_INT instead.


Also one function in i386.c generates less optimal
results when HOST_WIDE_INT is 32-bit such that
we were generating different outputs from the same
input on x86 and on x86-64 with -m32.

Well - we knew the code would bitrot once we decided to always
default to 64bit HWI ...

Richard.


--
H.J.
Just for the record, the straw that broke the camel's back for me was 
some bit map data structure in one of the gen functions that has a HWI 
worth of bits, and i needed 35 of them for the port i was trying.  It 
was all fixable, but i had already signed up for one task that seemed 
like it was going to take the rest of my life and i did not need another.


kenny
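
A stand-alone sketch of the flag-word problem in that exchange (plain
C++; the 36th flag bit is hypothetical, chosen to match the anecdote):

#include <stdio.h>

int main (void)
{
  /* Once a port needs more than 32 flag bits, a 32-bit flag word
     silently drops the high ones.  */
  unsigned long long isa_flags = 1ULL << 35;
  unsigned int word32 = (unsigned int) isa_flags;
  printf ("64-bit word %llx, 32-bit word %x\n", isa_flags, word32);
  return 0;
}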


Re: wide-int, ada

2013-11-26 Thread Kenneth Zadeck


On 11/26/2013 09:12 AM, Richard Biener wrote:

On Tue, Nov 26, 2013 at 3:00 PM, Kenneth Zadeck
 wrote:

On 11/26/2013 08:44 AM, Richard Earnshaw wrote:

On 26/11/13 09:18, Eric Botcazou wrote:

you are correct - this was an incorrect change.   I believe that the
patch below would be correct, but it is impossible to test it because (i
believe) that gcc no longer works if the host_bits_per_wide_int is 32.
I could be wrong about this but if i am correct, what do you want me to
do?

While you're right that most mainstream architectures now require a
64-bit
HWI, not all of them do according to config.gcc, so I don't think that
this
path is entirely dead yet.  I'll carry out the testing once we agree on
the
final change.

I'm hoping, once this patch series is in that we might be able to
migrate the ARM port back to supporting a 32-bit HWI.  The driving
factor behind the original switch was supporting 128-bit constants for
Neon and these patches should resolve that.

R.


Richard,

I had not realized that you were into self abuse like that. you are
going to have a bad time.  I tried this as a way to test the wide-int branch,
because if we made hwi be 32 bits, then it would trigger the long version of
the implementation of the wide-int routines. What a disaster.  richard sandiford
told me not to try and i ignored him.  that was a wasted weekend!!!  There
are roadblocks everywhere.   i certainly do not have a complete list,
because i gave up after hacking a few things back.   But very quickly you
find places like real.[ch] which no longer work because the math that the
dfp people hacked into it no longer works if the hwis are 32 bits.  It is
misleading because all of the tests are still there; it is just that the
equations return the wrong answers.  It is like this, one place after
another.

As for your hope that once this patch goes in you would be able to
support 128 bit constants: that is only partially true.  it will/should be the
case that you always get the correct answer and the code will not ice.
That is the good news.  But the bad news is that there is still way too
much code of the form "if var fits in hwi, do some optimization, else do not
do the optimization".  Richi did not want me to attack this code yet because
it would have made the patches even bigger.   For the rtl level a lot of
this was removed because i did it before he told me not to, but the tree
level still has tons of code like this.   So at least for a long time, like
the close of the next release, this is not going to get fixed.

Note that in the long run we want wide-int to use HOST_WIDEST_FAST_INT
and not HOST_WIDE_INT internally.  I know I asked you to do that
but it's of course too late now.

Richard.
i started to do it and it turned out to be harder than either of us 
thought it would.  As i remember it, the problem came up of what you were 
going to populate the tree_csts and const_wide_ints with.   I assume that 
you wanted those to have HWFI in them also.  Then i started looking at 
the *fits_*hwi and to_*hwi routines, which are used in a billion places, and 
that was now starting to look expensive.


perhaps this change could/should be done later after there are almost no 
calls to these functions.   But while there are still a billion of them 
in the code, this seemed like a mistake.

Kenny







Re: wide-int, ada

2013-11-26 Thread Kenneth Zadeck


On 11/26/2013 08:44 AM, Richard Earnshaw wrote:

On 26/11/13 09:18, Eric Botcazou wrote:

you are correct - this was an incorrect change.   I believe that the
patch below would be correct, but it is impossible to test it because (i
believe) that gcc no longer works if the host_bits_per_wide_int is 32.
I could be wrong about this but if i am correct, what do you want me to do?

While you're right that most mainstream architectures now require a 64-bit
HWI, not all of them do according to config.gcc, so I don't think that this
path is entirely dead yet.  I'll carry out the testing once we agree on the
final change.

I'm hoping, once this patch series is in that we might be able to
migrate the ARM port back to supporting a 32-bit HWI.  The driving
factor behind the original switch was supporting 128-bit constants for
Neon and these patches should resolve that.

R.


Richard,

I had not realized that you were into self abuse like that. you are 
going to have a bad time.  I tried this as a way to test the wide-int 
branch, because if we made hwi be 32 bits, then it would trigger the long 
version of the implementation of the wide-int routines. What a disaster.  
richard sandiford told me not to try and i ignored him.  that was a 
wasted weekend!!!  There are roadblocks everywhere.   i certainly do 
not have a complete list, because i gave up after hacking a few things 
back.   But very quickly you find places like real.[ch] which no longer 
work because the math that the dfp people hacked into it no longer works 
if the hwis are 32 bits.  It is misleading because all of the tests 
are still there; it is just that the equations return the wrong 
answers.  It is like this, one place after another.


As for your hope that once this patch goes in you would be able to 
support 128 bit constants: that is only partially true.  it will/should be 
the case that you always get the correct answer and the code will not 
ice.   That is the good news.  But the bad news is that there is still 
way too much code of the form "if var fits in hwi, do some optimization, 
else do not do the optimization".  Richi did not want me to attack this 
code yet because it would have made the patches even bigger.   For the 
rtl level a lot of this was removed because i did it before he told me 
not to, but the tree level still has tons of code like this.   So at 
least for a long time, like the close of the next release, this is not 
going to get fixed.


Kenny









Re: wide-int, ada

2013-11-25 Thread Kenneth Zadeck


On 11/25/2013 03:46 AM, Eric Botcazou wrote:

Richi has asked that we break the wide-int patch so that the individual port
and front end maintainers can review their parts without having to go through
the entire patch.  This patch covers the ada front-end.

I don't think that the mechanical change in UI_From_gnu is correct, see the
comment just above.  The annotate_value change is very likely correct, but
please double check and, upon positive outcome, remove the last sentence of
the comment just above.  The rest looks good, thanks.


Eric,

you are correct - this was an incorrect change.   I believe that the 
patch below would be correct, but it is impossible to test it because (i 
believe) gcc no longer works if the host_bits_per_wide_int is 32.   
I could be wrong about this, but if i am correct, what do you want me to do?


Kenny

Index: gcc/ada/gcc-interface/cuintp.c
===
--- gcc/ada/gcc-interface/cuintp.c(revision 205374)
+++ gcc/ada/gcc-interface/cuintp.c(working copy)
@@ -167,7 +167,7 @@ UI_From_gnu (tree Input)
  in a signed 64-bit integer.  */
   if (tree_fits_shwi_p (Input))
 return UI_From_Int (tree_to_shwi (Input));
-  else if (wi::lts_p (Input, 0) && TYPE_UNSIGNED (gnu_type))
+  else if (wi::gtu_p (wi::lrshift (Input, 63), 0) && TYPE_UNSIGNED (gnu_type))
 return No_Uint;
 #endif




Re: wide-int, dwarf

2013-11-25 Thread Kenneth Zadeck
I replied to the wrong email when i sent the first version of this 
email.   Sorry.  This was the comment that was addressed by this fix.


fixed on the wide-int branch 205363.



On 11/24/2013 08:43 AM, Jason Merrill wrote:

On 11/23/2013 09:55 PM, Kenneth Zadeck wrote:

On 11/23/2013 08:47 PM, Jason Merrill wrote:



So if the target supports wide ints you'll always use the scalar float
code?  Why does that do the right thing?



if TARGET_SUPPORTS_WIDE_INT != 0, then integers are NEVER placed in
const-doubles.  Large integers go into CONST_WIDE_INTs, so this case
always represents a floating point constant and nothing else. So yes.

I think that you could argue that the comment above this frag needs to
be rewritten because it is wrong if TARGET_SUPPORTS_WIDE_INT != 0.


Please.  OK with that change.

Jason




Index: dwarf2out.c
===
--- dwarf2out.c	(revision 205360)
+++ dwarf2out.c	(working copy)
@@ -12880,10 +12880,15 @@ mem_loc_descriptor (rtx rtl, enum machin
 	{
 	  dw_die_ref type_die;
 
-	  /* Note that a CONST_DOUBLE rtx could represent either an integer
-	 or a floating-point constant.  A CONST_DOUBLE is used whenever
-	 the constant requires more than one word in order to be
-	 adequately represented.  We output CONST_DOUBLEs as blocks.  */
+	  /* Note that if TARGET_SUPPORTS_WIDE_INT == 0, a
+	 CONST_DOUBLE rtx could represent either a large integer
+	 or a floating-point constant.  If
+	 TARGET_SUPPORTS_WIDE_INT != 0, the value is always a
+	 floating point constant.
+
+	 When it is an integer, a CONST_DOUBLE is used whenever
+	 the constant requires 2 HWIs to be adequately
+	 represented.  We output CONST_DOUBLEs as blocks.  */
 	  if (mode == VOIDmode
 	  || (GET_MODE (rtl) == VOIDmode
 		  && GET_MODE_BITSIZE (mode) != HOST_BITS_PER_DOUBLE_INT))


Re: wide-int, C++ front end

2013-11-25 Thread Kenneth Zadeck

fixed on the wide-int branch 205363.
On 11/23/2013 09:00 PM, Jason Merrill wrote:

On 11/23/2013 02:20 PM, Mike Stump wrote:

@@ -2605,8 +2606,7 @@ cp_tree_equal (tree t1, tree t2)
switch (code1)
  {
  case INTEGER_CST:
-  return TREE_INT_CST_LOW (t1) == TREE_INT_CST_LOW (t2)
-&& TREE_INT_CST_HIGH (t1) == TREE_INT_CST_HIGH (t2);
+  return wi::to_widest (t1) == wi::to_widest (t2);


Why not use wi::eq_p like you do in the C front end?

Jason



Index: dwarf2out.c
===
--- dwarf2out.c	(revision 205360)
+++ dwarf2out.c	(working copy)
@@ -12880,10 +12880,15 @@ mem_loc_descriptor (rtx rtl, enum machin
 	{
 	  dw_die_ref type_die;
 
-	  /* Note that a CONST_DOUBLE rtx could represent either an integer
-	 or a floating-point constant.  A CONST_DOUBLE is used whenever
-	 the constant requires more than one word in order to be
-	 adequately represented.  We output CONST_DOUBLEs as blocks.  */
+	  /* Note that if TARGET_SUPPORTS_WIDE_INT == 0, a
+	 CONST_DOUBLE rtx could represent either a large integer
+	 or a floating-point constant.  If
+	 TARGET_SUPPORTS_WIDE_INT != 0, the value is always a
+	 floating point constant.
+
+	 When it is an integer, a CONST_DOUBLE is used whenever
+	 the constant requires 2 HWIs to be adequately
+	 represented.  We output CONST_DOUBLEs as blocks.  */
 	  if (mode == VOIDmode
 	  || (GET_MODE (rtl) == VOIDmode
 		  && GET_MODE_BITSIZE (mode) != HOST_BITS_PER_DOUBLE_INT))


Re: wide-int, loop

2013-11-25 Thread Kenneth Zadeck


On 11/25/2013 06:04 AM, Richard Biener wrote:

On Sat, Nov 23, 2013 at 8:22 PM, Mike Stump  wrote:

Richi has asked that we break the wide-int patch so that the individual port and 
front end maintainers can review their parts without having to go through the 
entire patch.  This patch covers the loop code.

Ok?

@@ -2662,8 +2661,8 @@ iv_number_of_iterations (struct loop *loop, rtx
insn, rtx condition,
iv1.step = const0_rtx;
if (INTVAL (iv0.step) < 0)
 {
- iv0.step = simplify_gen_unary (NEG, comp_mode, iv0.step, mode);
- iv1.base = simplify_gen_unary (NEG, comp_mode, iv1.base, mode);
+ iv0.step = simplify_gen_unary (NEG, comp_mode, iv0.step, comp_mode);
+ iv1.base = simplify_gen_unary (NEG, comp_mode, iv1.base, comp_mode);
 }
iv0.step = lowpart_subreg (mode, iv0.step, comp_mode);

separate bugfix?

most likely.  i will submit separately.

@@ -1378,7 +1368,8 @@ decide_peel_simple (struct loop *loop, int flags)
/* If we have realistic estimate on number of iterations, use it.  */
if (get_estimated_loop_iterations (loop, &iterations))
  {
-  if (double_int::from_shwi (npeel).ule (iterations))
+  /* TODO: unsigned/signed confusion */
+  if (wi::leu_p (npeel, iterations))
 {
   if (dump_file)
 {

what does this refer to?  npeel is unsigned.


it was the fact that they were doing the from_shwi and then using an 
unsigned test.

Otherwise looks good to me.

Thanks,
Richard.




Re: wide-int, fortran

2013-11-24 Thread Kenneth Zadeck


On 11/24/2013 08:38 AM, N.M. Maclaren wrote:

On Nov 24 2013, Kenneth Zadeck wrote:

Thank you for your posting.  That certainly clears up my understanding.


If there is a clear description of the subset
of C++ that the front-end is allowed to use, a pointer to it for the
benefit of Fortran/C/Ada/whatever people would be useful.  But that's
an aside from this thread.


There is a saying in the US that "you can always tell the pioneers 
because they are the ones with the arrows in their back."   I feel 
this way with respect to C++.   When this branch first went public 
several months ago, the wide-int class made very modest use of C++.   
It has publicly evolved away from that and i would now describe the 
usage as somewhat aggressive. I do not know if i think that this 
is good or bad, but i expect that because we are the first big patch 
to use C++ aggressively, we are going to take some arrows.


I think that using C++ even slightly aggressively without first deciding
the ground rules is a seriously bad idea.  The trouble about following
random pioneers is that they are very likely to lead you into a swamp,
with the loss of the whole expedition.  And, with C++, that is more than
just likely if you follow an adventurous pioneer rather than a cautious
one - it's damn-near certain :-(

i do not actually disagree with you.  My point is that this was all 
done in the open.
A useful way to think about this patch is that it is nothing but 
plumbing.  Gcc has had a painful and long evolution from being a 32 
bit compiler to a 64 bit compiler and beyond.  ...


Yes, but that's not my point.

The main problem is that integer constant expressions in C are limited to
the built-in operators, of which the only tricky ones are division and
remainder (and, occasionally, multiplication) - see C11 6.6#3. Fortran
is not so limited, and there are much wider requirements for expression
evaluation at compile time.


But at both ends of the compiler there are still limits.


And my point is that this is NOT an area that separates cleanly into
front and back end!

However, from your description, this is one component of any solution
for gfortran, and it doesn't sound as if it will cause trouble until
and unless someone wants to extend gfortran to wider types.  They will
then have to implement the other components 


In any event, the problem is not with making the math work correctly, 
which is really the primary point of this patch.   It is, as you say, how 
the various front ends decide how they wish to exploit that power.




Regards,
Nick Maclaren.







Re: wide-int, fortran

2013-11-24 Thread Kenneth Zadeck


On 11/24/2013 05:50 AM, Tobias Burnus wrote:

Mike Stump wrote:
Richi has asked that we break the wide-int patch so that the 
individual port and front end maintainers can review their parts 
without having to go through the entire patch.  This patch covers the 
fortran front end.


Nice clean up. The new class looks much cleaner as it avoids the LO/HI 
handling.



-  hi = TREE_INT_CST_HIGH (bound);
-  low = TREE_INT_CST_LOW (bound);
-  if (hi || low < 0
-  || ((!as || as->type != AS_ASSUMED_RANK)
-  && low >= GFC_TYPE_ARRAY_RANK (TREE_TYPE (desc)))
-  || low > GFC_MAX_DIMENSIONS)
+  if (((!as || as->type != AS_ASSUMED_RANK)
+   && wi::geu_p (bound, GFC_TYPE_ARRAY_RANK (TREE_TYPE (desc))))
+  || wi::gtu_p (bound, GFC_MAX_DIMENSIONS))
 gfc_error ("'dim' argument of %s intrinsic at %L is not a valid "
"dimension index", upper ? "UBOUND" : "LBOUND",
&expr->where);

I don't see what happened to the "low < 0" check. (Ditto for the next 
chunk in conv_intrinsic_cobound).


Otherwise, it looks okay to me.

Tobias

This is the magic of using the correct representation.  All this code 
really wants to check is that bound is a small positive integer.

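A stand-alone sketch of why the unsigned comparison subsumes the old
test (plain C++, not the gfortran code): in an unsigned compare a
negative bound wraps to a huge value and fails the range check anyway.

#include <assert.h>

static int bound_ok (long bound, unsigned long rank)
{
  return (unsigned long) bound < rank;	/* false for any bound < 0 */
}

int main (void)
{
  assert (bound_ok (3, 7));
  assert (!bound_ok (-1, 7));	/* -1 wraps to ULONG_MAX */
  return 0;
}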




Re: wide-int, fortran

2013-11-24 Thread Kenneth Zadeck


On 11/24/2013 05:16 AM, N.M. Maclaren wrote:

On Nov 23 2013, Andrew Pinski wrote:

On Sat, Nov 23, 2013 at 12:16 PM, Steve Kargl
 wrote:

On Sat, Nov 23, 2013 at 11:21:21AM -0800, Mike Stump wrote:


Richi has asked that we break the wide-int patch so that the 
individual port and front end maintainers can review their parts 
without having to go through the 
fortran front end.


Ok?

+  *logical = wi::eq_p (t, 0) ? 0 : 1;


I can't find the meaning of :: in n1256.pdf.  What does this do?


wi:: eq_p means the function eq_p inside the wi struct.

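A stand-alone illustration of the syntax in question ("::" is the C++
scope-resolution operator; the wi and eq_p here are stand-ins with a
made-up body, not GCC's real implementation):

#include <iostream>

namespace wi {
  /* Stand-in for GCC's wi::eq_p: a function named eq_p declared
     inside the scope named wi.  */
  bool eq_p (long a, long b) { return a == b; }
}

int main ()
{
  std::cout << wi::eq_p (0, 0) << '\n';	/* prints 1 */
  return 0;
}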

But you can't tell that from the code - wi might be a namespace, and
eq_p might be a class.  If there is a clear description of the subset
of C++ that the front-end is allowed to use, a pointer to it for the
benefit of Fortran/C/Ada/whatever people would be useful.  But that's
an aside from this thread.
There is a saying in the US that "you can always tell the pioneers 
because they are the ones with the arrows in their back."   I feel this 
way with respect to C++.   When this branch first went public several 
months ago, the wide-int class made very modest use of C++.   It has 
publicly evolved away from that and i would now describe the usage as 
somewhat aggressive. I do not know if i think that this is good or 
bad, but i expect that because we are the first big patch to use C++ 
aggressively, we are going to take some arrows.





Also, given the complete lack of a description of what this
patch does and no pointer to a discussion of what this
patch does, and no description of its benefit to gfortran,
I vote "no".


The general description was in a different email:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02877.html

The main benefit is it allows for targets to support wider integer
than two times HOST_WIDE_INT.  So gfortran, is that it connects to the
rest of the middle-end of GCC.


Hmm.  Looking at that makes me none the wiser, and even a web search
doesn't do more than tell me the same aspects.  Given that Fortran has
somewhat different constraints on type widths than C, it would be very
useful to know exactly what you mean by that.  C++ is almost entirely
irrelevant here.

A useful way to think about this patch is that it is nothing but 
plumbing.  Gcc has had a painful and long evolution from being a 32 
bit compiler to a 64 bit compiler and beyond.  The primary goal of 
this patch is to change the underlying data structures used to represent 
integer constants from being either single host wide ints (that were 
limited to 64 bits) or double-ints (that worked reliably for 127 bits 
and mostly ok for 128 bits) to a data structure that just works 
reliably for any precision integer that a back end supports.


But at both ends of the compiler there are still limits.  It is 
expected that with this patch in place, a back end maintainer can 
now, for the first time, support architectures that support integers 
wider than 128 bits.  We have several public ports that appear to have 
tried to support some operations on 256 bits, but when we tried this on 
our private port, we found an intolerable number of ices and incorrect 
code issues.  They are now gone.


To the extent that this patch touches the front ends, this patch is 
strictly plumbing.   The front ends now use the new wide-int class, or 
the new int-cst, to pass integer constants around and do constant prop on 
them.  Beyond that, it is up to the front end maintainers to decide 
how or even if they will expose this to the users of the language.  C 
and C++ currently allow the back end to tell them that they can use wider 
types, but actually specifying a large integer constant is at best painful.

Now, obviously to an implementor, it doesn't mean unlimited-width types,
but does it (a) allow implementations to use hardware/firmware/emulated
built-in types however wide they are, (b) allow arbitrary-width types,
or (c) something else?  In all cases, I can see implementation issues
in gfortran that do not arise in gcc, but they are different between
those models.
the patch sniffs the port (in particular, it looks at the extra modes 
specified by the port), and allows integers that are some multiple of 
the size of the largest mode specified by the back end.  In this way ports 
for small machines pay less of a price than the big ports will.

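A worked example of that sizing rule, using the WIDE_INT_MAX_ELTS
formula from the patch with assumed values (a port whose widest integer
mode is 128 bits and a 64-bit HOST_WIDE_INT):

#include <stdio.h>

#define MAX_BITSIZE_MODE_ANY_INT 128	/* assumed: sniffed from the port */
#define HOST_BITS_PER_WIDE_INT 64
#define WIDE_INT_MAX_ELTS \
  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
    / HOST_BITS_PER_WIDE_INT) + 1)

int main (void)
{
  printf ("%d HWIs per wide_int\n", WIDE_INT_MAX_ELTS);	/* prints 5 */
  return 0;
}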

One could imagine that this is not good enough if a front end wanted to 
say that it supports everything up to 256 bits no matter what the 
hardware can do natively, much the way that java said that every host 
had to do 64 bit math, even if it was just a 32 bit host. However, that 
would require a lot more work than was done here. In particular, at some 
level someone would have to sniff the port and break the math into 
pieces that could be implemented on the port.




So I agree that some clarification would be a good idea, to avoid
introducing secondary problems by accident.  Quite likely there will be

Re: wide-int, i386

2013-11-24 Thread Kenneth Zadeck


On 11/24/2013 05:47 AM, Uros Bizjak wrote:

On Sun, Nov 24, 2013 at 11:23 AM, Kenneth Zadeck
 wrote:

We did not do this kind of transformation for any port beyond the minimum of 
having the port use wide-int rather than double-int.   we did do a lot of this 
in the common code, especially in the code that was just not correct for types 
beyond 64 bits.

Our motivation was that this is already a huge patch and going down that road 
for some of the crusty old ports would have made the patch just bigger, so we 
limited ourselves to the places in the common code that were obstructions to 
port writers to make large types work.

I should point out that there are even a lot of places in the common code where 
we left the old code alone as long as it was correct for larger types.   For 
testing purposes, we made no changes that were not bit for bit compatible for 
code of 64 bits or shorter.   There are some places where the right 
transformation would yield better code, but we left those for later so we could 
test this patch in a sane way.

I see. I was just wondering about those "obvious" places.

The patch is fairly mechanical (BTW: there are some unnecessary
whitespace changes that obscure real change), so it looks OK to me.

thanks.

Thanks,
Uros.




Re: wide-int, i386

2013-11-24 Thread Kenneth Zadeck
We did not do this kind of transformation for any port beyond the minimum of 
having the port use wide-int rather than double-int.   we did do a lot of this 
in the common code, especially in the code that was just not correct for types 
beyond 64 bits.

Our motivation was that this is already a huge patch and going down that road 
for some of the crusty old ports would have made the patch just bigger, so we 
limited ourselves to the places in the common code that were obstructions to 
port writers to make large types work.

I should point out that there are even a lot of places in the common code where 
we left the old code alone as long as it was correct for larger types.   For 
testing purposes, we made no changes that were not bit for bit compatible for 
code of 64 bits or shorter.   There are some places where the right 
transformation would yield better code, but we left those for later so we could 
test this patch in a sane way.

Kenny
On Nov 24, 2013, at 3:34 AM, Uros Bizjak  wrote:

> On Sat, Nov 23, 2013 at 8:22 PM, Mike Stump  wrote:
>> Richi has asked that we break the wide-int patch so that the individual port 
>> and front end maintainers can review their parts without having to go through 
>> the entire patch.  This patch covers the i386 port.
> 
> Should this patch also address construction of 128bit immediates? One
> example is in ix86_build_signbit_mask, where we have to use special
> tricks to create them.
> 
> You can find other interesting places by grepping config/i386/*
> sources for HOST_BITS_PER_WIDE_INT checks. These remains from times
> when wide_int was 32 for i686 host, and IMO should be addressed by
> wide-int patch.
> 
> Thanks,
> Uros.


Re: wide-int, aarch64

2013-11-23 Thread Kenneth Zadeck


On 11/23/2013 04:36 PM, Andrew Pinski wrote:

On Sat, Nov 23, 2013 at 11:19 AM, Mike Stump  wrote:

Richi has asked that we break the wide-int patch so that the individual port and 
front end maintainers can review their parts without having to go through the 
entire patch.  This patch covers the aarch64 port.


+  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);

Should we have a gcc_assert (!fail); after the real_to_integer?  Other
than that this seems obvious.
It actually is not obvious.   In general we did not add checking where 
there was none before.   The fact that this interface allows checking 
that the trunk interface did not does not mean that we are obligated 
to do it.

Thanks,
Andrew



Ok?





Re: wide-int, arc

2013-11-23 Thread Kenneth Zadeck


On 11/23/2013 08:19 PM, Joern Rennecke wrote:

On 23 November 2013 19:19, Mike Stump  wrote:

Richi has asked that we break the wide-int patch so that the individual port and 
front end maintainers can review their parts without having to go through the 
entire patch.  This patch covers the arc port.

Ok?


wide-int.h says widest_int is always signed, yet the iterations count
in the doloop interface is now unsigned.  So, which is right, the code
or the documentation?

I see that dfp.c and vax.c directly include wide-int.h,
but aarch64.c and arc.c don't.  Is that safe?
if the compiler produced an error message about not finding something, 
we added the include.   It is possible that some of these may be 
redundant as things have moved around, but safety is not an issue.

Is there a rationale which files directly include wide-int.h and which
ones don't?




Re: wide-int, dwarf

2013-11-23 Thread Kenneth Zadeck


On 11/23/2013 08:47 PM, Jason Merrill wrote:

On 11/23/2013 02:21 PM, Mike Stump wrote:

-  if (SCALAR_FLOAT_MODE_P (mode))
+#if TARGET_SUPPORTS_WIDE_INT == 0
+  if (!SCALAR_FLOAT_MODE_P (mode))
+{
+  mem_loc_result->dw_loc_oprnd2.val_class
+= dw_val_class_const_double;
+  mem_loc_result->dw_loc_oprnd2.v.val_double
+= rtx_to_double_int (rtl);
+}
+  else
+#endif
 {
   unsigned int length = GET_MODE_SIZE (mode);
   unsigned char *array


So if the target supports wide ints you'll always use the scalar float 
code?  Why does that do the right thing?


Jason

if TARGET_SUPPORTS_WIDE_INT != 0, then integers are NEVER placed in 
const-doubles.  Large integers go into CONST_WIDE_INTs, so this case 
always represents a floating point constant and nothing else.   So yes.


I think that you could argue that the comment above this frag needs to 
be rewritten because it is wrong if TARGET_SUPPORTS_WIDE_INT != 0.


kenny


Re: wide-int

2013-11-22 Thread Kenneth Zadeck


On 11/22/2013 03:03 PM, Richard Biener wrote:

On Thu, Nov 21, 2013 at 11:08 PM, Mike Stump  wrote:

This patch adds support for ints wider than double_int.

Ok?


Please split the patch into pieces.  I suggest to separate changes to
the various frontends
(CC maintainers), the new wide-int files, the example rs6000
conversion and the rest.

Otherwise I'm sure you'll get no attention.

we are happy to break the patch any way that makes sense.


Otherwise I'm sure you need to do some teaching about the different
wide-int kinds
and their behavior.

i will start writing some documentation.


Is the patch an old revision?  Because I see the int_const_binop_1
code that I asked
Kenny to revert to trunk behavior (return TREE_OVERFLOW flagged
constants, not NULL_TREE).
this is a special case.  What we did with the stor-layout patch was 
that we committed the fix to the branch simultaneously with the fix 
going on the trunk.


i have the fix for the wide-int branch, i was waiting for you to approve 
the fix for trunk.
(as a side point i think that there is a third test case that i am going 
to have to fix.   it is not c or c++; i have a build with every 
language going on now to check this.)



+#ifndef WIDE_INT_PRINT_H
+#define WIDE_INT_PRINT_H
+
+#include 
+#include "wide-int.h"
+

system headers must be only included from [t]system.h

For the merge you want to have trunk frozen and you want to be
available during the following
days (I do expect fallout).
unless you want to go the route of approving the wide-int* files first and 
the rtl second.   But yes, i think that we would request a sunday to do 
the commit and be around for the next few days.


kenny


Thanks,
Richard.




Re: wide-int

2013-11-22 Thread Kenneth Zadeck
I am sorry that in the haste of battle mike did not have an 
opportunity to write a proper introduction to this patch.  The patch 
was submitted last night so that it could be formally submitted by the 
end of stage 1.


This patch is the same as the top of the wide-int branch, which has been 
available for public view and comment for the last several 
months.   During that time it has significantly evolved and improved.   
The wide-int interface has changed in significant ways, and the branch has 
now been tested on at least one instance of every active architecture 
and on one or two architectures for every single language.


The wide-int branch deals with a fundamental data structure problem in 
the current gcc trunk.  Integer math larger than 64 bits is generally 
buggy, and beyond 128 bits it is completely wrong and generally causes the 
compiler to ICE.   This patch fixes all of the problems at 128 bits and 
below for every port, and if certain changes are made to the port 
(see the part of the diff relating to the RS6000 for an example) it can 
support integers of any size.


Currently there are a few public ports that have some code that uses 
OImode.   The usage of OImode in those ports is currently limited. I 
personally do not know if it is because the architectures make very 
limited use of OImode, or if the port maintainers could not justify 
doing the work that we have done here.  However, when this patch is 
accepted, GCC will be able to support any length integer for any kind of 
operation.


Kenny

On 11/21/2013 05:08 PM, Mike Stump wrote:

This patch adds support for ints wider than double_int.

Ok?





Re: three problems with stor-layout.c.

2013-11-22 Thread Kenneth Zadeck

committed as revision 205260.

thanks

kenny

On 11/22/2013 03:58 AM, Richard Biener wrote:

On Thu, 21 Nov 2013, Kenneth Zadeck wrote:


Richi,

Here is the patch.   As you can see, i chose the unsigned option.
It was bootstrapped and tested on x86 with all languages including ada.
Ok to commit?

Ok.

Thanks,
Richard.


kenny

2013-11-21  zad...@naturalbridge.com
 * stor-layout.c
 (place_field): Fix hwi test and accessor mismatch.




On 11/21/2013 11:07 AM, Eric Botcazou wrote:

I think most of these are because in the past (yes I have fixed that!!)
all 'sizetype' constants were sign-extended (and the signedness,
that is, TYPE_UNSIGNED (sizetype), was frontend dependent (ugh) and
then later true, thus unsigned).

So I think all _SIZE stuff should check fits_uhwi_p and be used as
uhwi.  But that may have ripple-down effects, so consistently
using fits_shwi_p and using as shwi is also fine (it just restricts
the maximum values we accept(?)).

And please make sure to test Ada if you're tweaking this, it's fragile
stuff.


Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c(revision 205259)
+++ gcc/stor-layout.c(working copy)
@@ -1204,7 +1204,7 @@ place_field (record_layout_info rli, tre
   unsigned int type_align = TYPE_ALIGN (type);
   tree dsize = DECL_SIZE (field);
   HOST_WIDE_INT field_size = tree_to_uhwi (dsize);
-  HOST_WIDE_INT offset = tree_to_shwi (rli->offset);
+  HOST_WIDE_INT offset = tree_to_uhwi (rli->offset);
   HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos);

 #ifdef ADJUST_FIELD_ALIGN
@@ -1248,7 +1248,7 @@ place_field (record_layout_info rli, tre
   unsigned int type_align = TYPE_ALIGN (type);
   tree dsize = DECL_SIZE (field);
   HOST_WIDE_INT field_size = tree_to_uhwi (dsize);
-  HOST_WIDE_INT offset = tree_to_shwi (rli->offset);
+  HOST_WIDE_INT offset = tree_to_uhwi (rli->offset);
   HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos);

 #ifdef ADJUST_FIELD_ALIGN
@@ -1304,7 +1304,7 @@ place_field (record_layout_info rli, tre
   && !integer_zerop (DECL_SIZE (field))
   && !integer_zerop (DECL_SIZE (rli->prev_field))
   && tree_fits_shwi_p (DECL_SIZE (rli->prev_field))
-  && tree_fits_shwi_p (TYPE_SIZE (type))
+  && tree_fits_uhwi_p (TYPE_SIZE (type))
   && simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (prev_type)))
 {
   /* We're in the middle of a run of equal type size fields; make


Re: three problems with stor-layout.c.

2013-11-21 Thread Kenneth Zadeck

Richi,

Here is the patch.   As you can see, i chose the unsigned option.
It was bootstrapped and tested on x86 with all languages including ada.
Ok to commit?

kenny

2013-11-21  zad...@naturalbridge.com
* stor-layout.c
(place_field): Fix hwi test and accessor mismatch.




On 11/21/2013 11:07 AM, Eric Botcazou wrote:

I think most of these are because in the past (yes I have fixed that!!)
all 'sizetype' constants were sign-extended (and the signedness,
that is, TYPE_UNSIGNED (sizetype), was frontend dependent (ugh) and
then later true, thus unsigned).

So I think all _SIZE stuff should check fits_uhwi_p and be used as
uhwi.  But that may have ripple-down effects, so consistently
using fits_shwi_p and using as shwi is also fine (it just restricts
the maximum values we accept(?)).

And please make sure to test Ada if you're tweaking this, it's fragile stuff.


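A stand-alone sketch of the accessor mismatch being fixed below (plain
C++; the size value is made up, and the signed view shown is what the
usual two's complement targets produce): reading an unsigned quantity
through a signed view goes wrong once the top bit is set.

#include <stdio.h>

int main (void)
{
  unsigned long long size = 0x8000000000000000ULL;	/* valid large size */
  long long as_signed = (long long) size;		/* negative: wrong view */
  printf ("unsigned view %llu, signed view %lld\n", size, as_signed);
  return 0;
}
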

Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c	(revision 205220)
+++ gcc/stor-layout.c	(working copy)
@@ -1204,7 +1204,7 @@ place_field (record_layout_info rli, tre
   unsigned int type_align = TYPE_ALIGN (type);
   tree dsize = DECL_SIZE (field);
   HOST_WIDE_INT field_size = tree_to_uhwi (dsize);
-  HOST_WIDE_INT offset = tree_to_shwi (rli->offset);
+  HOST_WIDE_INT offset = tree_to_uhwi (rli->offset);
   HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos);
 
 #ifdef ADJUST_FIELD_ALIGN
@@ -1248,7 +1248,7 @@ place_field (record_layout_info rli, tre
   unsigned int type_align = TYPE_ALIGN (type);
   tree dsize = DECL_SIZE (field);
   HOST_WIDE_INT field_size = tree_to_uhwi (dsize);
-  HOST_WIDE_INT offset = tree_to_shwi (rli->offset);
+  HOST_WIDE_INT offset = tree_to_uhwi (rli->offset);
   HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos);
 
 #ifdef ADJUST_FIELD_ALIGN
@@ -1304,7 +1304,7 @@ place_field (record_layout_info rli, tre
 	  && !integer_zerop (DECL_SIZE (field))
 	  && !integer_zerop (DECL_SIZE (rli->prev_field))
 	  && tree_fits_shwi_p (DECL_SIZE (rli->prev_field))
-	  && tree_fits_shwi_p (TYPE_SIZE (type))
+	  && tree_fits_uhwi_p (TYPE_SIZE (type))
 	  && simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (prev_type)))
 	{
 	  /* We're in the middle of a run of equal type size fields; make


Re: Some wide-int review comments

2013-11-20 Thread Kenneth Zadeck

On 11/13/2013 04:59 AM, Richard Biener wrote:

On Tue, Nov 12, 2013 at 5:43 PM, Kenneth Zadeck
 wrote:

On 11/12/2013 11:27 AM, Joseph S. Myers wrote:

On Tue, 12 Nov 2013, Kenneth Zadeck wrote:


Richi,

i am having a little trouble putting this back the way that you want.
The
issue is rem.
what is supposed to happen for INT_MIN % -1?

I would assume because i am failing the last case of
gcc.dg/c90-const-expr-8.c
that INT_MIN %-1 should not overflow even if INT_MIN / -1 does.
however,

Given the conclusion in C11 that a%b should be considered undefined if a/b
is not representable, I think it's reasonable to say INT_MIN % -1 *should*
be considered to overflow (for all C standard versions) (and bug 30484 is
only a bug for -fwrapv).


however, my local question is what do we want the api to be for
int_const_binop_1?  The existing behavior seems to be that at least for
common modes this function silently returns 0 and it is up to the front ends
to put their own spin on it.

For wide-int you create 1:1 the behavior of current trunk (if a change of
behavior in TImode is not tested in the testsuite then you can ignore that).
Whatever change you do to semantics of functions you do separately
from wide-int (preferably first on trunk, or at your choice after the wide-int
merge).

For this case in question I'd say a % -1 should return 0, but for
INT_MIN % -1 that 0 should have TREE_OVERFLOW set (and
thus you need to adjust that c90-const-expr-8.c testcase).

Richard.


kenny

richi,
I have done this exactly as you suggested.   bootstrapped and regression 
tested on x86-64.


2013-11-20  Kenneth Zadeck  

* fold-const.c
(int_const_binop_1): Make INT_MIN % -1 return 0 with the overflow
bit set.


2013-11-20  Kenneth Zadeck  

* gcc.dg/c90-const-expr-8.c: Look for overflow on INT_MIN % -1.
* gcc.dg/c99-const-expr-8.c: Look for overflow on INT_MIN % -1.

ok to commit?

kenny

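A stand-alone illustration of the case being handled (plain C++,
computed in a wider type to keep the example well defined): the
quotient for INT_MIN / -1 does not fit in int, which is why C11 makes
the matching % undefined and why the patch flags it as an overflow.

#include <limits.h>
#include <stdio.h>

int main (void)
{
  long long q = (long long) INT_MIN / -1;	/* 2147483648: too big for int */
  long long r = (long long) INT_MIN % -1;	/* 0 */
  printf ("q=%lld (fits in int: %d), r=%lld\n", q, q <= INT_MAX, r);
  return 0;
}
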
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 205161)
+++ gcc/fold-const.c	(working copy)
@@ -1105,7 +1105,22 @@ int_const_binop_1 (enum tree_code code,
 case ROUND_MOD_EXPR:
   if (op2.is_zero ())
 	return NULL_TREE;
-  tmp = op1.divmod_with_overflow (op2, uns, code, &res, &overflow);
+
+  /* Check for the case of INT_MIN % -1 and return
+   overflow and result = 0.  The TImode case is handled properly
+   in double-int.  */
+  if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT 
+	  && !uns
+  && op2.is_minus_one () 
+	  && op1.high == (HOST_WIDE_INT) -1
+	  && (HOST_WIDE_INT) op1.low 
+	  == (((HOST_WIDE_INT)-1) << (TYPE_PRECISION (type) - 1)))
+	{
+	  overflow = 1;
+	  res = double_int_zero;
+	}
+  else
+	tmp = op1.divmod_with_overflow (op2, uns, code, &res, &overflow);
   break;
 
 case MIN_EXPR:
Index: gcc/testsuite/gcc.dg/c90-const-expr-8.c
===
--- gcc/testsuite/gcc.dg/c90-const-expr-8.c	(revision 205161)
+++ gcc/testsuite/gcc.dg/c90-const-expr-8.c	(working copy)
@@ -23,5 +23,6 @@ enum e {
   /* { dg-error "3:overflow in constant expression" "constant" { target *-*-* } 22 } */
   E6 = 0 * !-INT_MIN, /* { dg-warning "13:integer overflow in expression" } */
   /* { dg-error "8:not an integer constant" "constant" { target *-*-* } 24 } */
-  E7 = INT_MIN % -1 /* Not an overflow.  */
+  E7 = INT_MIN % -1 /* { dg-warning "16:integer overflow in expression" } */
+  /* { dg-error "1:overflow in constant expression" "constant" { target *-*-* } 28 } */
 };
Index: gcc/testsuite/gcc.dg/c99-const-expr-8.c
===
--- gcc/testsuite/gcc.dg/c99-const-expr-8.c	(revision 205161)
+++ gcc/testsuite/gcc.dg/c99-const-expr-8.c	(working copy)
@@ -23,5 +23,6 @@ enum e {
   /* { dg-error "overflow in constant expression" "constant" { target *-*-* } 22 } */
   E6 = 0 * !-INT_MIN, /* { dg-warning "integer overflow in expression" } */
   /* { dg-error "not an integer constant" "constant" { target *-*-* } 24 } */
-  E7 = INT_MIN % -1 /* Not an overflow.  */
+  E7 = INT_MIN % -1 /* { dg-warning "16:integer overflow in expression" } */
+  /* { dg-error "1:overflow in constant expression" "constant" { target *-*-* } 28 } */
 };

