fixed misplaced testcase
2015-09-01 Kenneth Zadeck * gcc.c-torture/execute/ieee/2320-1.c Fixed misplaced test case. This was approved offline by Mike Stump. committed as revision 227389. Kenny --- gcc/testsuite/gcc.c-torture/execute/ieee/2320-1.c (revision 227385) +++ gcc/testsuite/gcc.c-torture/execute/ieee/2320-1.c (working copy) @@ -35,10 +35,10 @@ int main() || sizeof (double) != sizeof (ull)) exit (0); - c(0x3690ULL, 0xU); #if (defined __arm__ || defined __thumb__) && ! (defined __ARMEB__ || defined __VFP_FP__) /* The ARM always stores FP numbers in big-wordian format, even when running in little-byteian mode. */ + c(0x3690ULL, 0xU); c(0x00013690ULL, 0x0001U); c(0x369fULL, 0x0001U); c(0x36A0ULL, 0x0001U); @@ -61,6 +61,7 @@ int main() c(0x50003810ULL, 0x0082U); c(0x50013810ULL, 0x0083U); #else + c(0x3690ULL, 0xU); c(0x3691ULL, 0x0001U); c(0x369fULL, 0x0001U); c(0x36A0ULL, 0x0001U);
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01604.html
I had the following conversation with richi about this patch. Sorry to reply off thread, but i do net read this group in my mailer. [09:00]zadeckrichi: i am reviewing a patch and i have a couple of questions, do you have a second to look at something? [09:00]richizadeck: sure [09:01]zadeckthe patch is https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01604.html [09:01]zadeckhe has set up the data flow problem correctly, but i worry that this really is not the right way to solve this problem. [09:02]richilet me look at the problem [09:03]zadeckin particular, the only way he can demonstrate this problem in c is with an uninitialized variable.It seems that a normal correct program would not normally have this kind of issue unless there was some upstream bug coming out of the middle end. [09:04]richieven for his Ada case it's not initialized it seems [09:06]richizadeck: I've added a comment to the PR and requested info [09:06]zadeckmy ada is as good as my German. [09:07]zadeckthe thing is that if you turn this into a truly must problem, it will disable a lot of legit transformations. [09:08]richiyep [09:09]zadeckthanks [09:09]richilets see if he can point to a different issue [09:09]richior produce a C testcase for us [09:13]jakubrichi: yeah; if it is Ada only thing and Ada uninitialized variable must have some special properties, then they'd better use some on the side flag for whether it is initialized, or zero initialize or whatever Ada expects [09:15]richijakub: from what I understand of Ada the testcase doesn't look like such a case [09:16]richijakub: so the actual bug is likely somewhere else Kenny
Re: [PATCH, ping1] Fix removing of df problem in df_finish_pass
As a dataflow maintainer, I approve this patch for the next release. However, you will have to get approval of a release manager to get it into 5.0. On 04/20/2015 04:22 AM, Thomas Preud'homme wrote: Ping? -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Tuesday, March 03, 2015 12:02 PM To: 'Bernhard Reutner-Fischer'; gcc-patches@gcc.gnu.org; 'Paolo Bonzini'; 'Seongbae Park'; 'Kenneth Zadeck' Subject: RE: [PATCH] Fix removing of df problem in df_finish_pass From: Bernhard Reutner-Fischer [mailto:rep.dot@gmail.com] Sent: Saturday, February 28, 2015 4:00 AM use df_remove_problem rather than manually removing problems, living leaving Indeed. Please find updated changelog below: 2015-03-03 Thomas Preud'homme * df-core.c (df_finish_pass): Iterate over df- problems_by_index[] and use df_remove_problem rather than manually removing problems, leaving holes in df->problems_in_order[]. Best regards, Thomas
Re: patch to fix rtl documentation for new floating point comparisons
> On Feb 18, 2015, at 3:23 AM, Joseph Myers wrote: > >> On Tue, 17 Feb 2015, Kenneth Zadeck wrote: >> >> The fp exceptions raise some very tricky issues with respect to gcc and >> optimization. On many machines, noisy does not mean to throw an >> exception, it means that you set a bit and then check later. If you try >> to model this kind of behavior in gcc, you end up pinning the code so >> that nothing can be moved or reordered. > > When I say exception here, I'm always referring to that flag bit setting, > not to processor-level exceptions. In IEEE 754 terms, an exception is > *signaled*, and the default exception handling is to *raise* a flag and > deliver a default result (except for exact underflow which doesn't raise > the flag). > > To quote Annex F, "This specification does not require support for trap > handlers that maintain information about the order or count of > floating-point exceptions. Therefore, between function calls, > floating-point exceptions need not be precise: the actual order and number > of occurrences of floating-point exceptions (> 1) may vary from what the > source code expresses.". So it is not necessary to be concerned about > configurations where trap handlers may be called. > > There is as yet no public draft of TS 18661-5 (Supplementary attributes). > That will provide C bindings for alternate exception handling as described > in IEEE 754-2008 clause 8. I suspect such bindings will not readily be > efficiently implementable using processor-level exception handlers; SIGFPE > is an awkward interface for implementing such things at the C language > level, some processors do not support such trap handlers at all (e.g. many > ARM processors), and where traps are supported they may be asynchronous > rather than occurring immediately on execution of the relevant > instruction. In addition, at least x86 does not support raising exception > flags without running trap handlers on the next floating-point instruction > (raiseFlags operation, fesetexcept in TS 18661-1); that is, if trap > handlers were used to implement standard functionality, it would need to > be in a way such that this x86 peculiarity is not visible. my point here is that what you want to be able to do is freely reorder the fp operations ( within the rules of reordering fp operations) between places were those bits are explicitly read or cleared. were have no way to model that chain of modify operations in gcc. > >> to get this right gcc needs something like a monotonic dependency which >> would allow reordering and gcc has nothing like this. essentially, you >> need way to say that all of these insns modify the same variable, but >> they all just move the value in the same direction so you do not care >> what order the operations are performed in. that does not mean that >> this could not be added but gcc has nothing like this. > > Indeed, this is one of the things about defining the default mode that I > referred to; the present default is -ftrapping-math, but we may wish to > distinguish between strict trapping-math (whenever exception flags might > be tested / raised / lowered, exactly the computations specified by the > abstract machine have occurred, which might mean rather more limits on > code movement in the absence of monotonic dependencies) and loose trapping > math (like the present default; maybe don't transform expressions locally > in ways that add or remove exceptions, but don't treat an expression as > having side effects or reading global state purely because of possible > raising of floating-point exceptions). > >> going back to the rounding modes issue, there is a huge range in the >> architectural implementation space. you have a few that are pure >> dynamic, a few that are pure static and some in the middle that are just >> a mess. a lot of machines would have liked to support fully static, but >> could not fit the bits to specify the rounding modes into the >> instruction. my point here is you do need to at least have a plan that >> will support the full space even if you do this with a 1000 small >> patches. > > I think the norm is dynamic, because that's what was in IEEE 754-1985, > with static rounding added more recently on some processors, because of > IEEE 754-2008. (There are other variants - IA64 having multiple dynamic > rounding mode registers and allowing instructions to specify which one the > rounding mode is taken from.) the first ieee standard only allowed the dynamic model. the second allows the static model. while dynamic is more common, there are/were architectures that ar
Re: patch to fix rtl documentation for new floating point comparisons
> On Feb 17, 2015, at 6:01 PM, Joseph Myers wrote: > >> On Tue, 17 Feb 2015, Kenneth Zadeck wrote: >> >>> On 02/17/2015 07:05 AM, Joseph Myers wrote: >>>> On Tue, 17 Feb 2015, Richard Earnshaw wrote: >>>> >>>> So the problem we have today is the compiler has no way to distinguish >>>> between, say, < and __builtin_isless. According to Annex F (c99) the >>>> former should be signalling while the latter quiet. >>> We do have a way: < is LT and __builtin_isless is !UNGE. >>> >>> __builtin_islessgreater is !UNEQ. The question is whether it's also LTGT >>> or whether LTGT means LT || GT. And the existing documentation of >>> LTGT_EXPR leaves this unspecified, which seems clearly unhelpful. Either >>> way, you have existing code in GCC that's incorrect (i.e. that does not >>> correspond to the set of transformations that are actually valid for the >>> chosen semantics). >> Having spent the majority of my gcc life at the rtl level, i find it >> astonishing that the designers of the tree level allowed this machine >> dependency to get into what i was always told was a pristine machine >> independent pass.i would have thought that the at the tree level this >> could/should be nailed down to being quiet and if someone wanted to support a >> noisy version in their port, then they would look for lt || gt when >> converting >> to the rtl level which has never been so pristine. > > As I said in <https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00583.html>, I > can't find any posting to gcc-patches of the r82467 commit that introduced > the "possible exception" wording. > >> it would be nice to know exactly how many ports (if any) actually have a >> noisy >> version of this in hardware. ltgt is not a common primitive (it is not even >> mentioned in the ieee fp standards). > > For example, on MIPS the C.cond.fmt instruction has a four-bit condition > field: "In the cond field of the instruction: cond 2..1 specify the nature > of the comparison (equals, less than, and so on); cond 0 specifies whether > the comparison is ordered or unordered, i.e. false or true if any operand > is a NaN; cond 3 indicates whether the instruction should signal an > exception on QNaN inputs, or not". Together with possibly negating the > result you get all 32 possible comparisons (choice of whether the > comparison is true or false for each of = < > unordered, choice of whether > to raise invalid for quiet NaNs). > This is a pretty weak motivation.Computer architects love this kind of thing, assuming they have the opcode space. Just give them every possible combination and let the programmers decide what is useful - and by doing that, the architect saves a couple of muxes and gate delays.But that doesn't mean that the layers of software need to support all of this, Especially , in this case when there is not motivation from either the fp standards or the language standards. >>> I think the main difficulty in proper Annex F support would be making >>> optimizers (on each IR) understand the side-effects operations have in >>> terms of raising exceptions, and how operations may take the rounding mode >>> or existing exceptions raised as inputs - with an associated issue of >>> defining the existing default floating-point rules well enough to keep a >>> default mode that doesn't unduly inhibit optimization. >> i have been thinking a lot about the rounding modes.the only way that >> could think about properly supporting them was to actually add new rtl and >> tree operations that take the rounding mode as an extra parameter.i think >> that this is going to be the only way to support both the static and dynamic >> models. This will be ugly, but having just finished the wide int patch it >> is >> natural to say "how bad could it be?" some ports will want to support this >> and some will not. > > My starting point is to presume that any port with hardware floating-point > exceptions and rounding modes should support this. But since ports would > probably need to change anyway to ensure operations do raise the right > exceptions, and to ensure that the machine-independent compiler can tell > which registers referred to in inline asm are part of the floating-point > exceptions / rounding modes state, maybe it wouldn't be so bad if they > also need to change their instruction descriptions to describe the > involvement of exceptions and rounding modes explicitly. (You do need to > handle the cas
Re: patch to fix rtl documentation for new floating point comparisons
On 02/17/2015 07:05 AM, Joseph Myers wrote: On Tue, 17 Feb 2015, Richard Earnshaw wrote: So the problem we have today is the compiler has no way to distinguish between, say, < and __builtin_isless. According to Annex F (c99) the former should be signalling while the latter quiet. We do have a way: < is LT and __builtin_isless is !UNGE. __builtin_islessgreater is !UNEQ. The question is whether it's also LTGT or whether LTGT means LT || GT. And the existing documentation of LTGT_EXPR leaves this unspecified, which seems clearly unhelpful. Either way, you have existing code in GCC that's incorrect (i.e. that does not correspond to the set of transformations that are actually valid for the chosen semantics). Having spent the majority of my gcc life at the rtl level, i find it astonishing that the designers of the tree level allowed this machine dependency to get into what i was always told was a pristine machine independent pass.i would have thought that the at the tree level this could/should be nailed down to being quiet and if someone wanted to support a noisy version in their port, then they would look for lt || gt when converting to the rtl level which has never been so pristine. it would be nice to know exactly how many ports (if any) actually have a noisy version of this in hardware. ltgt is not a common primitive (it is not even mentioned in the ieee fp standards). I suspect there are two ways we could deal with that: add new comparison RTL codes to distinguish the cases; or use something like the RTL /v bit on a comparison to indicate that it should be signalling. Of the two, the latter would probably be easiest to implement in a backwards compatible manner (backends not understanding /v would continue to use their existing code paths), but it would still take a fair amount of rejigging in the mid end to fully preserve the signalling nature of comparisons: there are many places where just RTX_CODE is available and a new pattern is generated from that. The first method would require all back-ends to be updated pretty much simultaneously to handle the new RTL codes. I don't know the optimal way of representing these variants in GENERIC, GIMPLE and RTL (the existing representation can cover everything, and is unambiguous apart from LTGT, but may not be optimal). I think the main difficulty in proper Annex F support would be making optimizers (on each IR) understand the side-effects operations have in terms of raising exceptions, and how operations may take the rounding mode or existing exceptions raised as inputs - with an associated issue of defining the existing default floating-point rules well enough to keep a default mode that doesn't unduly inhibit optimization. i have been thinking a lot about the rounding modes.the only way that could think about properly supporting them was to actually add new rtl and tree operations that take the rounding mode as an extra parameter.i think that this is going to be the only way to support both the static and dynamic models. This will be ugly, but having just finished the wide int patch it is natural to say "how bad could it be?" some ports will want to support this and some will not. kenny
Re: patch to fix rtl documentation for new floating point comparisons
On 02/14/2015 03:26 PM, Paolo Bonzini wrote: On 10/02/2015 22:46, Joseph Myers wrote: It may make sense to define LTGT as exactly !UNEQ, and so quiet, but the choice of definition is a matter of what's convenient for the implementation (and which choice you make determines which existing code in GCC should be considered incorrect). It would be different from e.g. !UNLT and GE differing only in that UNLT is quiet and GE is signaling. So it makes sense to me to keep LTGT as signaling. while in theory, your argument is correct, in practice, this is not how people use this stuff and so i disagree. The interrupts are there to allow legacy code that does not know about nans to be "run" in a mode where the nans can be used to signal that things did not run well. Properly written floating point aware code never uses the interrupts.In that code, you want a rich set of comparisons which allow the programmer to efficiently deal with the fact that any comparison can go one of 4 possible ways. to support legacy code we need ne and eq to be quiet and lt gt le ge to be noisy. This is what the standards call for and this is what gcc delivers.Going beyond that for legacy code is a waste of time. kenny Paolo
Re: patch to fix rtl documentation for new floating point comparisons
I vote for quiet.For three reasons: 1) it matches the expectation of what programmers expect. I view people who do floating point as falling into two categories: a) people who wish that there were no such thing as nans. These people are happy to write programs with just < > <= >= == != and then be unhappily surprised if their code blows up. b) careful floating pt programmers. These people never want a signal, they properly choose their operators to with knowledge of how each of those operations work with respect to nans. These people will never use anything like a signaling <> 2) it matches iso n1778 which is primarily written to satisfy the needs to (b). 3) Whenever you leave something like this undefined, you are basically saying "do not optimize" On 02/10/2015 04:46 PM, Joseph Myers wrote: On Mon, 9 Feb 2015, Kenneth Zadeck wrote: I don't think it's useful to have the trapping semantics unspecified for a comparison operation like that. So the question is what's most useful for LTGT and LTGT_EXPR to do (presumably they should do the same thing). Lots of existing code in this area seems confused (for example, HONOR_SNANS should be irrelevant to reversing an equality comparison, as reversing it will change neither results nor exceptions raised, whether or not signaling NaNs are involved). But maybe more code treats LTGT as a signaling operation than otherwise, in accordance with the original intent, and so that's the most appropriate way to document it (with !UNEQ being the representation of the corresponding quiet operation). section 7.12.14.4 in C99 seems pretty much to say that this is a quiet operation. It says islessgreater is quiet. It says nothing about the LTGT RTL operation or the LTGT_EXPR GIMPLE/GENERIC operation. __builtin_islessgreater is implemented using UNEQ_EXPR not LTGT_EXPR. It may make sense to define LTGT as exactly !UNEQ, and so quiet, but the choice of definition is a matter of what's convenient for the implementation (and which choice you make determines which existing code in GCC should be considered incorrect). Where back ends implement ltgt patterns, I don't know if they are consistently quiet or signaling.
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
On 02/11/2015 06:20 AM, Jiong Wang wrote: 2014-12-19 15:21 GMT+00:00 Kenneth Zadeck : however, since i am a nice person loop-invariant solves the DF_UD_CHAIN which builds a data structure that connects each use with all of the defs that reach it. I believe that this is the opposite of what you want here. if you really need this, you need to also turn on the DF_DU_CHAIN which builds the opposite structure.Both structures can be space hogs, but they are both turned on in other places in the compiler so it is unlikely to be an issue. Exactly, Thanks, Kenneth. This approach works from my experiment and look much better than previous REG_NOTE approach. while it do have one problem. We need the UD/DU chain built before we do insn re-shuffling. While after re-shuffling, UD chain needs update, otherwise, the later "check_dependecies" in find_invariant_insn may fail. although we have re-shuffle instruction 1 into 2, the later check still using old UD info, thus decide instruction 2 is not iv. 1: regA <- vfp + regB 2: regA <- vfp + const my current fix is to insert those re-shuffled insn into a table named "vfp_const_iv", then skip those dependencies check for them as they don't have any dependencies. You now are beginning to understand why Mark Wegman and I invented SSA form almost 30 years ago There is no good way to keep the information up to data as you are changing the program.To a large extent you are on your own. If what you are suggesting works, then this is likely much faster than rebuilding the chains. LOG_LINKs have nothing to do with single use; they point from the _first_ use to its corresponding def. You might want to look at what fwprop does instead. Pass rtl fwprop uses df information in single-definition way, it doesn't really take into consideration if register is a single use. This often corrupts other optimizations like post-increment and load/store pair. For example: add r2, r1, r0 ldr rx, [r2] add r2, r2, #4 is transformed into below form: add r2, r1, r0 ldr rx, [r1, r0] add r2, r2, #4 As a result, post-increment opportunity is corrupted, also definition of r2 can't be deleted because it's not single use. Thanks, bin thanks for all these suggestion. Have look at the LOG_LINK build function, a simple reverse scan, while needs to allocate big map array for all pseudo regs. Haven't looked at similar code in fwprop, actually, when found the first matched insn pattern, I just want to scan several insns next, then abort quickly if nothing meet requirement. there is no need to build full single-use information. still can anyone confirm that it is safe to re-use REG_DEAD info there without calling df_note_add_problem and df_analysis first? or I am using those info passed down from the previous pass which calculated these info and maybe broken? It's not safe to use REG_DEAD info without re-computing it. not sure that reg_dead is the right answer even if you did re-compute it. I believe you can have two parallel uses (on both sides of an if-then-else) for a single def (above the if then else) and have two REG_DEAD notes. Richard.
Re: patch to fix rtl documentation for new floating point comparisons
On 02/09/2015 06:24 PM, Joseph Myers wrote: On Mon, 9 Feb 2015, Kenneth Zadeck wrote: @findex ge @cindex greater than @@ -2603,6 +2618,10 @@ Like @code{gt} and @code{gtu} but test f @item (ge:@var{m} @var{x} @var{y}) @itemx (geu:@var{m} @var{x} @var{y}) Like @code{gt} and @code{gtu} but test for ``greater than or equal''. +If the operands are floating point, @code{ge} is a signaling +comparison and corresponds to the IEC 60559 +@code{compareSignalingGreater} operation. @code{geu} is undefined for +floating point numbers. No, compareSignalingGreaterEqual. oops @findex le @cindex less than or equal @@ -2610,7 +2629,64 @@ Like @code{gt} and @code{gtu} but test f @cindex unsigned less than @item (le:@var{m} @var{x} @var{y}) @itemx (leu:@var{m} @var{x} @var{y}) -Like @code{gt} and @code{gtu} but test for ``less than or equal''. +Like @code{gt} and @code{gtu} but test for ``less than or equal''. If +the operands are floating point, @code{le} is a signaling comparison +and corresponds to the IEC 60559 @code{compareSignalingLess} +operation. @code{leu} is undefined for floating point numbers. compareSignalingLessEqual. oops again +@table @code +@findex ltgt +@cindex less than or greater than +@item (ltgt:@var{m} @var{x} @var{y}) +@code{STORE_FLAG_VALUE} if the values represented by @var{x} and +@var{y} are less, or greater, otherwise 0. This is a quiet operation +that applies only to floating point operands and does not have a +corresponding IEC 60559 operation. On GENERIC, the documentation describes some ambiguity: "With the possible exception of @code{LTGT_EXPR}, all of these operations are guaranteed not to generate a floating point exception.". LTGT (RTL code) was added by RTH in <https://gcc.gnu.org/ml/gcc-patches/2000-01/msg00974.html>. LTGT_EXPR was added in the thread starting at <https://gcc.gnu.org/ml/gcc-patches/2004-05/msg01674.html>, wherein RTH stated "the LTGT rtl code is assumed to trap". The documentation was soon thereafter changed to have the "possible exception" wording (r82467, which I can't find any sign of having been posted to gcc-patches so don't know the rationale). I don't think it's useful to have the trapping semantics unspecified for a comparison operation like that. So the question is what's most useful for LTGT and LTGT_EXPR to do (presumably they should do the same thing). Lots of existing code in this area seems confused (for example, HONOR_SNANS should be irrelevant to reversing an equality comparison, as reversing it will change neither results nor exceptions raised, whether or not signaling NaNs are involved). But maybe more code treats LTGT as a signaling operation than otherwise, in accordance with the original intent, and so that's the most appropriate way to document it (with !UNEQ being the representation of the corresponding quiet operation). section 7.12.14.4 in C99 seems pretty much to say that this is a quiet operation. The islessgreater macro determines whether its first argument is less than or greater than its second argument. The islessgreater(x, y) macro is similar to (x) < (y) || (x) > (y); howev er, islessgreater(x, y) does not raise the ‘‘invalid’’ floating-point exception when x and y are unordered (nor does it evaluate x and y twice). So i think that it seems that we should be pretty safe saying it is quiet. It's not clear to me that ad hoc logic around particular operations is the best way of handling optimizations in this area, instead of generic logic using four bits to track which conditions (< = > unordered) a comparison is true for and one bit to track whether it raises an exception for unordered operands. Even if you keep using particular sets of tree / RTL codes for a subset of operations, maybe transformations that map into and out of such five-bit form would be more likely to be correct. I will work on the tree version tomorrow and resubmit. Kenny
Re: patch to fix rtl documentation for new floating point comparisons
On 02/09/2015 06:26 PM, Richard Henderson wrote: On 02/09/2015 11:10 AM, Kenneth Zadeck wrote: +@table @code +@findex ltgt +@cindex less than or greater than +@item (ltgt:@var{m} @var{x} @var{y}) +@code{STORE_FLAG_VALUE} if the values represented by @var{x} and +@var{y} are less, or greater, otherwise 0. This is a quiet operation +that applies only to floating point operands and does not have a +corresponding IEC 60559 operation. This is intended to match c99 7.12.14.5 islessgreater, which I believe is the compareQuietNotEqual operation. I think that the description that I have is correct. According to IEEE 754-2008, table 5.1, page 29, compareQuietNotEqual is defined to have the true relations of LT GT UN.What appears to be described in c99 7.12.14.5 is a quiet version with relations LT GT and so returns false if either operand is a nan. This does not have a defined name in IEEE 754-2008, but if it did would have the name compareQuietLessGreater according to their naming conventions. +@table @code +@findex uneq +@cindex unordered or equal +@item (uneq:@var{m} @var{x} @var{y}) +@code{STORE_FLAG_VALUE} if the values represented by @var{x} and +@var{y} are unordered or equal, otherwise 0. This is a quiet +operation that applies only to floating point operands and does not +have a corresponding IEC 60559 operation. This is the strict inverse to LTGT, i.e. !compareQuietNotEqual, and you of course note this is not the same as compareQuietEqual. I believe that when the c99 math.h comparison builtins were added, treating EQ+NE as quiet comparisons was common but not universal. It's possible this could be simplified now. I agree that this is the strict inverse of LTGT, but as the name of this implies, if either operand is a nan, this returns true. As such, the relations are UN and EQ which matches the name. r~
patch to fix rtl documentation for new floating point comparisons
Before I commit this, or do the corresponding tree level patch, I would like this to be looked at by the FP people. I am guessing at some of this, other parts i gleaned from various standards docs and an irc conversation with Joseph. This should have been documented when it was put into the compiler. This is certainly not obvious. Kenny Index: gcc/doc/rtl.texi === --- gcc/doc/rtl.texi (revision 220541) +++ gcc/doc/rtl.texi (working copy) @@ -2553,40 +2553,52 @@ of comparisons is supported on a particu pass will try to merge the operations to produce the @code{eq} shown in case it exists in the context of the particular insn involved. -Inequality comparisons come in two flavors, signed and unsigned. Thus, -there are distinct expression codes @code{gt} and @code{gtu} for signed and -unsigned greater-than. These can produce different results for the same -pair of integer values: for example, 1 is signed greater-than @minus{}1 but not -unsigned greater-than, because @minus{}1 when regarded as unsigned is actually +Fixed point inequality comparisons come in two flavors, signed and +unsigned. Thus, there are distinct expression codes @code{gt} and +@code{gtu} for signed and unsigned greater-than. These can produce +different results for the same pair of integer values: for example, 1 +is signed greater-than @minus{}1 but not unsigned greater-than, +because @minus{}1 when regarded as unsigned is actually @code{0x} which is greater than 1. -The signed comparisons are also used for floating point values. Floating -point comparisons are distinguished by the machine modes of the operands. +Floating point comparisons come in two flavors, signaling and quiet, +though these are not always paired. The fixed point signed +comparisons are also used for floating point comparisons. Floating +point comparisons are distinguished by the machine modes of the +operands. @table @code @findex eq @cindex equal @item (eq:@var{m} @var{x} @var{y}) -@code{STORE_FLAG_VALUE} if the values represented by @var{x} and @var{y} -are equal, otherwise 0. +@code{STORE_FLAG_VALUE} if the values represented by @var{x} and +@var{y} are equal, otherwise 0. If the operands are floating point, +this is a quiet comparison and corresponds to the IEC 60559 +@code{compareQuietEqual} operation. @findex ne @cindex not equal @item (ne:@var{m} @var{x} @var{y}) -@code{STORE_FLAG_VALUE} if the values represented by @var{x} and @var{y} -are not equal, otherwise 0. +@code{STORE_FLAG_VALUE} if the values represented by @var{x} and +@var{y} are not equal, otherwise 0. If the operands are floating +point, this is a quiet comparison and corresponds to the IEC 60559 +@code{compareQuietNotEqual} operation. @findex gt @cindex greater than @item (gt:@var{m} @var{x} @var{y}) -@code{STORE_FLAG_VALUE} if the @var{x} is greater than @var{y}. If they -are fixed-point, the comparison is done in a signed sense. +@code{STORE_FLAG_VALUE} if the @var{x} is greater than @var{y}. If +the operands are fixed-point, the comparison is done in a signed +sense. If the operands are floating point, this is a signaling +comparison and corresponds to the IEC 60559 +@code{compareSignalingGreater} operation. @findex gtu @cindex greater than @cindex unsigned greater than @item (gtu:@var{m} @var{x} @var{y}) -Like @code{gt} but does unsigned comparison, on fixed-point numbers only. +Like @code{gt} but does unsigned comparison, on fixed-point numbers +only. This is undefined for floating point numbers. @findex lt @cindex less than @@ -2594,7 +2606,10 @@ Like @code{gt} but does unsigned compari @cindex unsigned less than @item (lt:@var{m} @var{x} @var{y}) @itemx (ltu:@var{m} @var{x} @var{y}) -Like @code{gt} and @code{gtu} but test for ``less than''. +Like @code{gt} and @code{gtu} but test for ``less than''. If the +operands are floating point, @code{gt} is a signaling comparison and +corresponds to the IEC 60559 @code{compareSignalingLess} operation. +@code{gtu} is undefined for floating point numbers. @findex ge @cindex greater than @@ -2603,6 +2618,10 @@ Like @code{gt} and @code{gtu} but test f @item (ge:@var{m} @var{x} @var{y}) @itemx (geu:@var{m} @var{x} @var{y}) Like @code{gt} and @code{gtu} but test for ``greater than or equal''. +If the operands are floating point, @code{ge} is a signaling +comparison and corresponds to the IEC 60559 +@code{compareSignalingGreater} operation. @code{geu} is undefined for +floating point numbers. @findex le @cindex less than or equal @@ -2610,7 +2629,64 @@ Like @code{gt} and @code{gtu} but test f @cindex unsigned less than @item (le:@var{m} @var{x} @var{y}) @itemx (leu:@var{m} @var{x} @var{y}) -Like @code{gt} and @code{gtu} but test for ``less than or equal''. +Like @code{gt} and @code{gtu} but test for ``less than or equal''. If +the operands are floating point, @code{le} is a signaling comparison +and corresponds t
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
On 12/19/2014 06:26 AM, Richard Biener wrote: On Fri, Dec 19, 2014 at 11:28 AM, Jiong Wang wrote: 2014-12-19 3:51 GMT+00:00 Bin.Cheng : On Fri, Dec 19, 2014 at 6:09 AM, Segher Boessenkool wrote: On Thu, Dec 18, 2014 at 05:00:01PM +, Jiong Wang wrote: On 17/12/14 15:54, Richard Biener wrote: ick. I realize we don't have SSA form on RTL but doesn't DF provide at least some help in looking up definition statements for pseudos? In fact we want to restrict the transform to single-use pseudos, thus hopefully it can at least tell us that... (maybe not and this is what LOG_LINKS are for in combine...?) At least loop-invariant alreadly computes df_chain with DF_UD_CHAIN which seems exactly what is needed (apart from maybe getting at single-use info). thanks very much for these inspiring questions. yes, we want to restrict the transformation on single-use pseudo only, and it's better the transformation could re-use existed info and helper function to avoid increase compile time. but I haven't found anything I can reuse at the stage the transformation happen. the info similar as LOG_LINKS is what I want, but maybe simpler. I'd study the code about build LOG_LINKS, and try to see if we can do some factor out. LOG_LINKs in combine are just historical. combine should be converted to use DF fully. As noted loop-invariant already computes DF use-def chains. The question is whether it does compute single-use info when doing that (should be trivially possible). Kenny? In the US, (where i live), there is something called the "statute of limitations" which says that after 7 years you cannot be found accountable for anything except the most heinous of crimes. It has been more than 7 years since i did this. however, since i am a nice person loop-invariant solves the DF_UD_CHAIN which builds a data structure that connects each use with all of the defs that reach it. I believe that this is the opposite of what you want here. if you really need this, you need to also turn on the DF_DU_CHAIN which builds the opposite structure.Both structures can be space hogs, but they are both turned on in other places in the compiler so it is unlikely to be an issue. LOG_LINKs have nothing to do with single use; they point from the _first_ use to its corresponding def. You might want to look at what fwprop does instead. Pass rtl fwprop uses df information in single-definition way, it doesn't really take into consideration if register is a single use. This often corrupts other optimizations like post-increment and load/store pair. For example: add r2, r1, r0 ldr rx, [r2] add r2, r2, #4 is transformed into below form: add r2, r1, r0 ldr rx, [r1, r0] add r2, r2, #4 As a result, post-increment opportunity is corrupted, also definition of r2 can't be deleted because it's not single use. Thanks, bin thanks for all these suggestion. Have look at the LOG_LINK build function, a simple reverse scan, while needs to allocate big map array for all pseudo regs. Haven't looked at similar code in fwprop, actually, when found the first matched insn pattern, I just want to scan several insns next, then abort quickly if nothing meet requirement. there is no need to build full single-use information. still can anyone confirm that it is safe to re-use REG_DEAD info there without calling df_note_add_problem and df_analysis first? or I am using those info passed down from the previous pass which calculated these info and maybe broken? It's not safe to use REG_DEAD info without re-computing it. not sure that reg_dead is the right answer even if you did re-compute it. I believe you can have two parallel uses (on both sides of an if-then-else) for a single def (above the if then else) and have two REG_DEAD notes. Richard.
[PATCH]: fix pr61111 Fixed width of mask.
2014-05-09 Kenneth Zadeck PR middle-end/6 * fold-const.c (fold_binary_loc): Changed width of mask. committed as revision 210274. kenny Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 210253) +++ gcc/fold-const.c (working copy) @@ -11358,7 +11358,7 @@ fold_binary_loc (location_t loc, wide_int c3 = c1.and_not (c2); for (w = BITS_PER_UNIT; w <= width; w <<= 1) { - wide_int mask = wi::mask (width - w, false, + wide_int mask = wi::mask (w, false, TYPE_PRECISION (type)); if (((c1 | c2) & mask) == mask && c1.and_not (mask) == 0) {
Re: [wide-int] Add fast path for hosts with HWI widening multiplication
everyone who has a private port will hate you forever. note that i have 2 of them. On 05/08/2014 02:31 PM, Richard Sandiford wrote: "Joseph S. Myers" writes: On Thu, 8 May 2014, Ramana Radhakrishnan wrote: Ramana Radhakrishnan * wide-int.cc (UTItype): Define. (UDWtype): Define for appropriate W_TYPE_SIZE. This breaks builds for 32-bit hosts, where TImode isn't supported. You can only use TImode on the host if it's 64-bit. wide-int.cc:37:56: error: unable to emulate 'TI' The longlong.h interface seems to be designed to be as difficult to use as possible :-( So maybe we really do need to limit it to hosts that are known to work and benefit from it. How about the following? I tested that it produces identical wide-int.o .text for x86_64. I think additions to or removals from the list should be treated as pre-approved. Thanks, Richard gcc/ * wide-int.cc: Only include longlong.h for certain targets. Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-05-08 19:13:15.782158808 +0100 +++ gcc/wide-int.cc 2014-05-08 19:28:52.880742385 +0100 @@ -27,19 +27,20 @@ along with GCC; see the file COPYING3. #include "tree.h" #include "dumpfile.h" -#if GCC_VERSION >= 3000 +#if (GCC_VERSION >= 3000 \ + && (defined __aarch64 \ + || defined __alpha \ + || defined __ia64 \ + || defined __powerpc64__ \ + || defined __sparcv9 \ + || defined __x86_64__)) #define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT -typedef unsigned HOST_HALF_WIDE_INT UHWtype; -typedef unsigned HOST_WIDE_INT UWtype; typedef unsigned int UQItype __attribute__ ((mode (QI))); typedef unsigned int USItype __attribute__ ((mode (SI))); typedef unsigned int UDItype __attribute__ ((mode (DI))); -typedef unsigned int UTItype __attribute__ ((mode (TI))); -#if W_TYPE_SIZE == 32 -# define UDWtype UDItype -#elif W_TYPE_SIZE == 64 -# define UDWtype UTItype -#endif +typedef unsigned HOST_HALF_WIDE_INT UHWtype; +typedef unsigned HOST_WIDE_INT UWtype; +typedef unsigned int UDWtype __attribute__ ((mode (TI))); #include "longlong.h" #endif
fallout on x86-64 from the wide-int merge.
here is a comparison. The two areas were built using configure with no options at all on x86-64. The comparison is between revision 210112 and 210113.Tsan is very unhappy but everything else looks ok.I know that this worked a couple of days before the merge. I know that there was some tsan fiddling before the merge. Kenny # Comparing directories ## Dir1=gbBaseline/: 11 sum files ## Dir2=gbNew: 11 sum files # Comparing 11 common sum files ## /bin/sh gccBaseline/contrib/compare_tests /tmp/gxx-sum1.26585 /tmp/gxx-sum2.26585 Tests that now fail, but worked before: c-c++-common/tsan/thread_leak.c -O0 execution test c-c++-common/tsan/thread_leak.c -O0 execution test c-c++-common/tsan/thread_leak.c -O2 execution test c-c++-common/tsan/thread_leak.c -O2 execution test g++.dg/tsan/aligned_vs_unaligned_race.C -O0 execution test g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test g++.dg/tsan/benign_race.C -O0 execution test g++.dg/tsan/benign_race.C -O2 execution test g++.dg/tsan/default_options.C -O0 execution test g++.dg/tsan/default_options.C -O2 execution test g++.dg/tsan/fd_close_norace.C -O0 execution test g++.dg/tsan/fd_close_norace.C -O2 execution test g++.dg/tsan/fd_close_norace2.C -O0 execution test g++.dg/tsan/fd_close_norace2.C -O2 execution test New tests that FAIL: c-c++-common/tsan/atomic_stack.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/atomic_stack.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/atomic_stack.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/atomic_stack.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/fd_pipe_race.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/fd_pipe_race.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/fd_pipe_race.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/fd_pipe_race.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race2.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race2.c -O0 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race2.c -O2 output pattern test, is FATAL: ThreadSanitizer CHECK failed: ../../../../gccNew/libsanitizer/tsan/tsan_rtl.cc:587 "((IsShadowMem((uptr)(p + size * kShadowCnt / kShadowCell - 1 != (0)" (0x0, 0x0) c-c++-common/tsan/free_race2.c -O2 output pattern
we are starting the wide int merge
please hold off on committing patches for the next couple of hours as we have a very large merge to do. thanks. kenny
Re: [wide-int] Handle zero-precision INTEGER_CSTs again
Then with a fixed comment, this patch is fine. kenny On 05/03/2014 02:59 PM, Richard Sandiford wrote: Kenneth Zadeck writes: The doc at wide-int.h:136 needs work.The doc currently says that the precision and length are always greater than 0. This is now not correct. It also says that the bits above the precision are defined to be the sign extension if the precision does not cover that block. I agree that the comment needs work. Reordering slightly: I do not know if i believe this completely. This is now covered by the compile-time is_sign_extended property. Trees aren't extended in that way because we want to be able to access them quickly in wider precisions, with the extension following the signedness of the underlying type. So an 8-bit all-ones INTEGER_CST will be stored as -1 if the type is signed and as 0xff otherwise. RTL constants aren't sign-extended because of the case you note: I worry about this because we have moved back and forth on this issue many times and i still see code at rtl.h:1440 which assumes that BImode constants are stored differently on some platforms. It is possible that i am reading that code incorrectly but at first reading it looks questionable. so code comparing a bimode 1 with a 1 of a different precision would not work. I hope to get rid of the special BImode case after the merge and set rtl's is_sign_extended back to true. wide_int itself is always sign-extended. The situation never occurs for offset_int and widest_int. I noticed that in the eq_p_large code, you removed a block of code left over from the days when this was not true, but there is still so of this code remaining that does not assume this. The *_large functions have to be conservative and assume that the inputs are not sign-extended. My eq_p_large change doesn't change that, it just represents it differently. The idea is that rather than extending each HWI individually from small_prec and then comparing them, we can shift the XOR of the two values left by the number of excess bits, so that all undefined bits disappear from the value. The shifted-in bits are all zeros so the shifted XOR value will be zero iff the meaningful bits are the same and nonzero iff the meaningful bits are different, regardless of whether the input is sign-extended or not. The same technique is already used in the inline version. Thanks, Richard
Re: [wide-int] Handle zero-precision INTEGER_CSTs again
The doc at wide-int.h:136 needs work.The doc currently says that the precision and length are always greater than 0. This is now not correct. It also says that the bits above the precision are defined to be the sign extension if the precision does not cover that block. I do not know if i believe this completely. I noticed that in the eq_p_large code, you removed a block of code left over from the days when this was not true, but there is still so of this code remaining that does not assume this. I worry about this because we have moved back and forth on this issue many times and i still see code at rtl.h:1440 which assumes that BImode constants are stored differently on some platforms. It is possible that i am reading that code incorrectly but at first reading it looks questionable. so code comparing a bimode 1 with a 1 of a different precision would not work. aside from that the patch is fine. kenny On 05/02/2014 03:19 PM, Richard Sandiford wrote: I'd hoped the days of zero-precision INTEGER_CSTs were behind us after Richard's patch to remove min amd max values from zero-width bitfields, but a boostrap-ubsan showed otherwise. One source is in: null_pointer_node = build_int_cst (build_pointer_type (void_type_node), 0); if no_target, since the precision of the type comes from ptr_mode, which remains the default VOIDmode. Maybe that's a bug, but setting it to an arbitrary nonzero value would also be dangerous. The other use is in the C/C++ void_zero_node, which is specifically defined as a VOID_TYPE, zero-precision INTEGER_CST. This is used by the ubsan code in some ?: tests, as well as by other places in the frontend proper. At least the ubsan use persists until gimple. This patch therefore restores the wide-int handling for zero precision, for which the length must be 1 and the single HWI must be zero. I've tried to wrap up most of the dependencies in two new functions, wi::blocks_needed (a function version of the .cc BLOCKS_NEEDED, now also used in the header) and wi::excess_bits, so that it'll be easier to remove the handling again if zero precision ever goes away. There are some remaining, easily-greppable cases that check directly for a precision of 0 though. The combination of this and the other patches allows boostrap-ubsan to complete. There are a lot of extra testsuite failures compared to a normal bootstrap, but they don't look related to wide-int. Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-05-02 16:28:07.657847935 +0100 +++ gcc/wide-int.cc 2014-05-02 16:28:09.560842845 +0100 @@ -48,8 +48,6 @@ static const HOST_WIDE_INT zeros[WIDE_IN #define HALF_INT_MASK (((HOST_WIDE_INT) 1 << HOST_BITS_PER_HALF_WIDE_INT) - 1) #define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT) -#define BLOCKS_NEEDED(PREC) \ - (PREC ? (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) : 1) #define SIGN_MASK(X) ((HOST_WIDE_INT) (X) < 0 ? -1 : 0) /* Return the value a VAL[I] if I < LEN, otherwise, return 0 or -1 @@ -69,20 +67,20 @@ safe_uhwi (const HOST_WIDE_INT *val, uns static unsigned int canonize (HOST_WIDE_INT *val, unsigned int len, unsigned int precision) { - unsigned int blocks_needed = BLOCKS_NEEDED (precision); + unsigned int blocks_needed = wi::blocks_needed (precision); HOST_WIDE_INT top; int i; if (len > blocks_needed) len = blocks_needed; + if (wi::excess_bits (len, precision) > 0) +val[len - 1] = sext_hwi (val[len - 1], precision % HOST_BITS_PER_WIDE_INT); if (len == 1) return len; top = val[len - 1]; - if (len * HOST_BITS_PER_WIDE_INT > precision) -val[len - 1] = top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT); - if (top != 0 && top != (HOST_WIDE_INT)-1) + if (top != 0 && top != (HOST_WIDE_INT) -1) return len; /* At this point we know that the top is either 0 or -1. Find the @@ -134,7 +132,7 @@ wi::from_buffer (const unsigned char *bu /* We have to clear all the bits ourself, as we merely or in values below. */ - unsigned int len = BLOCKS_NEEDED (precision); + unsigned int len = wi::blocks_needed (precision); HOST_WIDE_INT *val = result.write_val (); for (unsigned int i = 0; i < len; ++i) val[i] = 0; @@ -180,7 +178,7 @@ wi::to_mpz (const wide_int_ref &x, mpz_t { int len = x.get_len (); const HOST_WIDE_INT *v = x.get_val (); - int excess = len * HOST_BITS_PER_WIDE_INT - x.get_precision (); + unsigned int excess = wi::excess_bits (len, x.get_precision ()); if (wi::neg_p (x, sgn)) { @@ -306,7 +304,8 @@ wi::force_to_size (HOST_WIDE_INT *val, c unsigned int xlen, unsigned int xprecision, unsigned int precision, signop sgn) { - unsigned int blocks_needed = BLOCKS_NEEDED (precision); + uns
Re: [wide-int] Fix some division cases
this is fine. On 05/02/2014 03:22 PM, Richard Sandiford wrote: divmod_internal didn't handle unsigned division in which the inputs have implicit all-one upper bits. There were two problems: - wi_unpack should extend implicit 1s to index blocks_needed - 1 (possibly with a zext_hwi on the last block for small_prec) regardless of signedness - when calculating m and n, divmod_internal only used the *_blocks_needed value if the division was signed. Neither is value is really signed in this context, since we've already calculated absolute values if sgnop == SIGNED, so the neg_p (and its pre-wi:: incarnation) would only have held for the minimum integer. Using a simple while loop to find the top half-HWI seems more direct. Also, divmod_internal_2 took n and m as unsigned values but used HOST_WIDE_INT iterators on m - n. m can be less than n for things like 0 / big, and the negative m - n was being treated as an unsigned int and zero-extended to HOST_WIDE_INT. The patch fixes both the types of n and m and the types of the iterators. Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-05-02 16:28:09.560842845 +0100 +++ gcc/wide-int.cc 2014-05-02 16:40:40.002916374 +0100 @@ -1126,8 +1126,7 @@ wi::add_large (HOST_WIDE_INT *val, const HOST_HALF_WIDE_INTs of RESULT. The rest of RESULT is filled by uncompressing the top bit of INPUT[IN_LEN - 1]. */ static void -wi_unpack (unsigned HOST_HALF_WIDE_INT *result, - const unsigned HOST_WIDE_INT *input, +wi_unpack (unsigned HOST_HALF_WIDE_INT *result, const HOST_WIDE_INT *input, unsigned int in_len, unsigned int out_len, unsigned int prec, signop sgn) { @@ -1145,20 +1144,24 @@ wi_unpack (unsigned HOST_HALF_WIDE_INT * else mask = 0; - for (i = 0; i < in_len; i++) + for (i = 0; i < blocks_needed - 1; i++) { - HOST_WIDE_INT x = input[i]; - if (i == blocks_needed - 1 && small_prec) - { - if (sgn == SIGNED) - x = sext_hwi (x, small_prec); - else - x = zext_hwi (x, small_prec); - } + HOST_WIDE_INT x = safe_uhwi (input, in_len, i); result[j++] = x; result[j++] = x >> HOST_BITS_PER_HALF_WIDE_INT; } + HOST_WIDE_INT x = safe_uhwi (input, in_len, i); + if (small_prec) +{ + if (sgn == SIGNED) + x = sext_hwi (x, small_prec); + else + x = zext_hwi (x, small_prec); +} + result[j++] = x; + result[j++] = x >> HOST_BITS_PER_HALF_WIDE_INT; + /* Smear the sign bit. */ while (j < out_len) result[j++] = mask; @@ -1317,10 +1320,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, -half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, -half_blocks_needed, prec, SIGNED); + wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED); + wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED); /* The 2 is for a full mult. */ memset (r, 0, half_blocks_needed * 2 @@ -1519,7 +1520,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI unsigned HOST_HALF_WIDE_INT *b_remainder, unsigned HOST_HALF_WIDE_INT *b_dividend, unsigned HOST_HALF_WIDE_INT *b_divisor, - unsigned int m, unsigned int n) + int m, int n) { /* The "digits" are a HOST_HALF_WIDE_INT which the size of half of a HOST_WIDE_INT and stored in the lower bits of each word. This @@ -1530,7 +1531,8 @@ divmod_internal_2 (unsigned HOST_HALF_WI unsigned HOST_WIDE_INT qhat; /* Estimate of quotient digit. */ unsigned HOST_WIDE_INT rhat; /* A remainder. */ unsigned HOST_WIDE_INT p; /* Product of two digits. */ - HOST_WIDE_INT s, i, j, t, k; + HOST_WIDE_INT t, k; + int i, j, s; /* Single digit divisor. */ if (n == 1) @@ -1757,26 +1759,17 @@ wi::divmod_internal (HOST_WIDE_INT *quot } } - wi_unpack (b_dividend, (const unsigned HOST_WIDE_INT *) dividend.get_val (), -dividend.get_len (), dividend_blocks_needed, dividend_prec, sgn); - wi_unpack (b_divisor, (const unsigned HOST_WIDE_INT *) divisor.get_val (), -divisor.get_len (), divisor_blocks_needed, divisor_prec, sgn); - - if (wi::neg_p (dividend, sgn)) -m = dividend_blocks_needed; - else -m = 2 * dividend.get_len (); + wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (), +dividend_blocks_needed, dividend_prec, sgn); + wi_unpack (b_divisor, divisor.get_val (), divisor.get_len (), +divisor_blocks_needed, divisor_prec, sgn); + + m = dividend_blocks_needed; + while (m > 1 && b_divide
Re: [wide-int] Add more assertions
These are fine. On 05/02/2014 03:20 PM, Richard Sandiford wrote: This patch adds some assertions against sext (.., 0) and zext (..., 0). The former is undefined at the sext_hwi level and the latter is disallowed for consistency with the former. Also, set_bit (rightly IMO) can't handle bit >= precision. For precision <= HOST_BITS_PER_WIDE_INT it would invoke undefined behaviour while for other precisions I think it would crash. A case with precision <= HOST_BITS_PER_WIDE_INT showed up in java (fix posted separately). Tested on x86_64-linux-gnu and powerpc64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2014-05-02 16:28:09.561842842 +0100 +++ gcc/wide-int.h 2014-05-02 16:44:04.015463718 +0100 @@ -2046,6 +2046,8 @@ wi::sext (const T &x, unsigned int offse unsigned int precision = get_precision (result); WIDE_INT_REF_FOR (T) xi (x, precision); + gcc_checking_assert (offset != 0); + if (offset <= HOST_BITS_PER_WIDE_INT) { val[0] = sext_hwi (xi.ulow (), offset); @@ -2065,6 +2067,8 @@ wi::zext (const T &x, unsigned int offse unsigned int precision = get_precision (result); WIDE_INT_REF_FOR (T) xi (x, precision); + gcc_checking_assert (offset != 0); + /* This is not just an optimization, it is actually required to maintain canonization. */ if (offset >= precision) @@ -2102,6 +2106,9 @@ wi::set_bit (const T &x, unsigned int bi WI_UNARY_RESULT_VAR (result, val, T, x); unsigned int precision = get_precision (result); WIDE_INT_REF_FOR (T) xi (x, precision); + + gcc_checking_assert (bit < precision); + if (precision <= HOST_BITS_PER_WIDE_INT) { val[0] = xi.ulow () | ((unsigned HOST_WIDE_INT) 1 << bit);
Re: [wide-int] Stricter type checking in wide_int constructor
On 04/28/2014 12:25 PM, Mike Stump wrote: On Apr 28, 2014, at 2:36 AM, Richard Sandiford wrote: Ping. FWIW this is the last patch I have lined up before the merge. I repeated the asm comparison test I did a few months ago on one target per config/ architecture and there were no unexpected changes. Nice, thanks. by "nice thanks", he meant "accepted".
Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE
ok to commit. kenny On 04/28/2014 11:42 AM, Richard Sandiford wrote: Kyrill Tkachov writes: With that patch bootstrap now still fails at dwarf2out.c with the same message. I'm attaching a gzipped dwarf2out.ii Thanks. This is a nice proof of why clz_zero and ctz_zero were as bogus as claimed. It meant that the behaviour of floor_log2 depended on the target and would return the wrong value if clz (0) was anything other than the precision. This patch makes the wide-int functions behave like the double_int ones and pushes the target dependency back to the callers that care, which is where it belongs. The "new" *_DEFINED_VALUE_AT_ZERO checks are really reinstating what's already on trunk. There are other tree uses of ctz that I think relied on the double_int behaviour. Tests still ongoing, but could you check what the arm results are like with this? Thanks, Richard Index: gcc/builtins.c === --- gcc/builtins.c 2014-04-28 16:30:59.939239843 +0100 +++ gcc/builtins.c 2014-04-28 16:31:00.252238996 +0100 @@ -8080,6 +8080,7 @@ fold_builtin_bitop (tree fndecl, tree ar /* Optimize for constant argument. */ if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg)) { + tree type = TREE_TYPE (arg); int result; switch (DECL_FUNCTION_CODE (fndecl)) @@ -8089,11 +8090,17 @@ fold_builtin_bitop (tree fndecl, tree ar break; CASE_INT_FN (BUILT_IN_CLZ): - result = wi::clz (arg); + if (wi::ne_p (arg, 0)) + result = wi::clz (arg); + else if (! CLZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result)) + result = TYPE_PRECISION (type); break; CASE_INT_FN (BUILT_IN_CTZ): - result = wi::ctz (arg); + if (wi::ne_p (arg, 0)) + result = wi::ctz (arg); + else if (! CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result)) + result = TYPE_PRECISION (type); break; CASE_INT_FN (BUILT_IN_CLRSB): Index: gcc/simplify-rtx.c === --- gcc/simplify-rtx.c 2014-04-28 16:30:59.941239838 +0100 +++ gcc/simplify-rtx.c 2014-04-28 16:31:00.254238990 +0100 @@ -1656,6 +1656,7 @@ simplify_const_unary_operation (enum rtx wide_int result; enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode; rtx_mode_t op0 = std::make_pair (op, imode); + int int_value; #if TARGET_SUPPORTS_WIDE_INT == 0 /* This assert keeps the simplification from producing a result @@ -1686,7 +1687,11 @@ simplify_const_unary_operation (enum rtx break; case CLZ: - result = wi::shwi (wi::clz (op0), mode); + if (wi::ne_p (op0, 0)) + int_value = wi::clz (op0); + else if (! CLZ_DEFINED_VALUE_AT_ZERO (mode, int_value)) + int_value = GET_MODE_PRECISION (mode); + result = wi::shwi (int_value, mode); break; case CLRSB: @@ -1694,7 +1699,11 @@ simplify_const_unary_operation (enum rtx break; case CTZ: - result = wi::shwi (wi::ctz (op0), mode); + if (wi::ne_p (op0, 0)) + int_value = wi::ctz (op0); + else if (! CTZ_DEFINED_VALUE_AT_ZERO (mode, int_value)) + int_value = GET_MODE_PRECISION (mode); + result = wi::shwi (int_value, mode); break; case POPCOUNT: Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-28 16:30:59.941239838 +0100 +++ gcc/wide-int.cc 2014-04-28 16:31:00.254238990 +0100 @@ -1137,46 +1137,6 @@ wi::add_large (HOST_WIDE_INT *val, const return canonize (val, len, prec); } -/* This is bogus. We should always return the precision and leave the - caller to handle target dependencies. */ -static int -clz_zero (unsigned int precision) -{ - unsigned int count; - - enum machine_mode mode = mode_for_size (precision, MODE_INT, 0); - if (mode == BLKmode) -mode_for_size (precision, MODE_PARTIAL_INT, 0); - - /* Even if the value at zero is undefined, we have to come up - with some replacement. Seems good enough. */ - if (mode == BLKmode) -count = precision; - else if (!CLZ_DEFINED_VALUE_AT_ZERO (mode, count)) -count = precision; - return count; -} - -/* This is bogus. We should always return the precision and leave the - caller to handle target dependencies. */ -static int -ctz_zero (unsigned int precision) -{ - unsigned int count; - - enum machine_mode mode = mode_for_size (precision, MODE_INT, 0); - if (mode == BLKmode) -mode_for_size (precision, MODE_PARTIAL_INT, 0); - - /* Even if the value at zero is undefined, we have to come up - with some replacement. Seems good enough. */ - if (mode == BLKmode) -count = precision; - else if (!CTZ_DEFINED_VALUE_AT_ZERO (mode, count)) -count = precision; - return count; -} - /* Subrout
Re: [wide-int 4/5] Fix canonize handling of "small_prec" case
this is fine. kenny On 04/25/2014 09:44 AM, Richard Sandiford wrote: We should write back the sign-extended value. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 09:15:14.297359380 +0100 +++ gcc/wide-int.cc 2014-04-25 09:15:50.755637670 +0100 @@ -81,7 +81,7 @@ canonize (HOST_WIDE_INT *val, unsigned i top = val[len - 1]; if (len * HOST_BITS_PER_WIDE_INT > precision) -top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT); +val[len - 1] = top = sext_hwi (top, precision % HOST_BITS_PER_WIDE_INT); if (top != 0 && top != (HOST_WIDE_INT)-1) return len;
Re: [wide-int 5/5] Add dump () method for gdb debugging
i am sorry, i missed the fact that the loop counts up but you were reversing the order in the indexes. kenny On 04/26/2014 04:26 AM, Richard Sandiford wrote: Kenneth Zadeck writes: don't you think that it would be easier to understand the number if you printed it largest index first, as in the routines in wide-int-print.cc? Yeah, that's what the patch does. E.g. (for 32-bit HWI): [...,0x3,0x8000] is 7 << 31. Thanks, Richard kenny On 04/25/2014 09:58 AM, Richard Sandiford wrote: This patch adds a dump () method so that it's easier to read the contents of the various wide-int types in gdb. I've deliberately not done any extension for "small_prec" cases because I think here we want to see the raw values as much as possible. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 14:48:03.263213403 +0100 +++ gcc/wide-int.cc 2014-04-25 14:48:25.186333842 +0100 @@ -2130,3 +2130,9 @@ wi::only_sign_bit_p (const wide_int_ref void gt_ggc_mx (widest_int *) { } void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { } void gt_pch_nx (widest_int *) { } + +template void wide_int::dump () const; +template void generic_wide_int >::dump () const; +template void generic_wide_int >::dump () const; +template void offset_int::dump () const; +template void widest_int::dump () const; Index: gcc/wide-int.h === --- gcc/wide-int.h 2014-04-25 14:34:54.823652619 +0100 +++ gcc/wide-int.h 2014-04-25 14:48:06.218229309 +0100 @@ -709,6 +709,9 @@ #define INCDEC_OPERATOR(OP, DELTA) \ #undef ASSIGNMENT_OPERATOR #undef INCDEC_OPERATOR + /* Debugging functions. */ + void dump () const; + static const bool is_sign_extended = wi::int_traits >::is_sign_extended; }; @@ -859,6 +862,23 @@ generic_wide_int ::operator = ( return *this; } +/* Dump the contents of the integer to stderr, for debugging. */ +template +void +generic_wide_int ::dump () const +{ + unsigned int len = this->get_len (); + const HOST_WIDE_INT *val = this->get_val (); + unsigned int precision = this->get_precision (); + fprintf (stderr, "["); + if (len * HOST_BITS_PER_WIDE_INT < precision) +fprintf (stderr, "...,"); + for (unsigned int i = 0; i < len - 1; ++i) +fprintf (stderr, HOST_WIDE_INT_PRINT_HEX ",", val[len - 1 - i]); + fprintf (stderr, HOST_WIDE_INT_PRINT_HEX "], precision = %d\n", + val[0], precision); +} + namespace wi { template <>
Re: [wide-int 3/5] Fix large widths in shifted_mask
approved On 04/25/2014 09:40 AM, Richard Sandiford wrote: Very minor, but since shifted_mask copes with out-of-range widths, I think mask should too. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 09:26:57.025944460 +0100 +++ gcc/wide-int.cc 2014-04-25 09:37:16.873811137 +0100 @@ -716,7 +716,7 @@ wi::mask (HOST_WIDE_INT *val, unsigned i gcc_assert (width < 4 * MAX_BITSIZE_MODE_ANY_INT); gcc_assert (prec <= 4 * MAX_BITSIZE_MODE_ANY_INT); - if (width == prec) + if (width >= prec) { val[0] = negate ? 0 : -1; return 1;
Re: [wide-int 2/5] Fix large widths in shifted_mask
approved. On 04/25/2014 09:39 AM, Richard Sandiford wrote: shifted_mask would mishandle cases where the start bit is in the middle of a HWI and the end bit is in a different HWI. The "000111000" case needs to check that the start and end are in the same block. In the changed lines, "shift" is the position of the lowest mask bit in its containing HWI. The end is in the same HWI if shift + width < HOST_BITS_PER_WIDE_INT. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 09:50:37.850073188 +0100 +++ gcc/wide-int.cc 2014-04-25 09:51:05.992292099 +0100 @@ -768,8 +768,8 @@ wi::shifted_mask (HOST_WIDE_INT *val, un if (shift) { HOST_WIDE_INT block = ((unsigned HOST_WIDE_INT) 1 << shift) - 1; - shift = end & (HOST_BITS_PER_WIDE_INT - 1); - if (shift) + shift += width; + if (shift < HOST_BITS_PER_WIDE_INT) { /* case 000111000 */ block = ((unsigned HOST_WIDE_INT) 1 << shift) - block - 1;
Re: [wide-int 5/5] Add dump () method for gdb debugging
don't you think that it would be easier to understand the number if you printed it largest index first, as in the routines in wide-int-print.cc? kenny On 04/25/2014 09:58 AM, Richard Sandiford wrote: This patch adds a dump () method so that it's easier to read the contents of the various wide-int types in gdb. I've deliberately not done any extension for "small_prec" cases because I think here we want to see the raw values as much as possible. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 14:48:03.263213403 +0100 +++ gcc/wide-int.cc 2014-04-25 14:48:25.186333842 +0100 @@ -2130,3 +2130,9 @@ wi::only_sign_bit_p (const wide_int_ref void gt_ggc_mx (widest_int *) { } void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { } void gt_pch_nx (widest_int *) { } + +template void wide_int::dump () const; +template void generic_wide_int >::dump () const; +template void generic_wide_int >::dump () const; +template void offset_int::dump () const; +template void widest_int::dump () const; Index: gcc/wide-int.h === --- gcc/wide-int.h 2014-04-25 14:34:54.823652619 +0100 +++ gcc/wide-int.h 2014-04-25 14:48:06.218229309 +0100 @@ -709,6 +709,9 @@ #define INCDEC_OPERATOR(OP, DELTA) \ #undef ASSIGNMENT_OPERATOR #undef INCDEC_OPERATOR + /* Debugging functions. */ + void dump () const; + static const bool is_sign_extended = wi::int_traits >::is_sign_extended; }; @@ -859,6 +862,23 @@ generic_wide_int ::operator = ( return *this; } +/* Dump the contents of the integer to stderr, for debugging. */ +template +void +generic_wide_int ::dump () const +{ + unsigned int len = this->get_len (); + const HOST_WIDE_INT *val = this->get_val (); + unsigned int precision = this->get_precision (); + fprintf (stderr, "["); + if (len * HOST_BITS_PER_WIDE_INT < precision) +fprintf (stderr, "...,"); + for (unsigned int i = 0; i < len - 1; ++i) +fprintf (stderr, HOST_WIDE_INT_PRINT_HEX ",", val[len - 1 - i]); + fprintf (stderr, HOST_WIDE_INT_PRINT_HEX "], precision = %d\n", + val[0], precision); +} + namespace wi { template <>
Re: [wide-int 3/5] Fix large widths in shifted_mask
richard, I think that this patch is fine as is.but in looking at the surrounding code, i saw something that appears to be somewhat troubling. I am worried about the two asserts. Given that we now require that some users write code similar to the code in tree-vrp.c:2628, it seems that these asserts are only latent land mines. kenny On 04/25/2014 09:40 AM, Richard Sandiford wrote: Very minor, but since shifted_mask copes with out-of-range widths, I think mask should too. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 09:26:57.025944460 +0100 +++ gcc/wide-int.cc 2014-04-25 09:37:16.873811137 +0100 @@ -716,7 +716,7 @@ wi::mask (HOST_WIDE_INT *val, unsigned i gcc_assert (width < 4 * MAX_BITSIZE_MODE_ANY_INT); gcc_assert (prec <= 4 * MAX_BITSIZE_MODE_ANY_INT); - if (width == prec) + if (width >= prec) { val[0] = negate ? 0 : -1; return 1;
Re: [wide-int 1/5] Cosmetic fixes to wide-int.{cc,h}
i see nothing in this patch that requires a review. On 04/25/2014 09:35 AM, Richard Sandiford wrote: This series of patches is from a read-through of wide-int.h and wide-int.cc. (The series from earlier in the week was from a diff of preexisting files.) This first patch fixes some comments, typos and formatting. It also removes some dead code. I'll follow up with a gdb-friendly replacement for the old dump function. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-25 11:33:57.809153807 +0100 +++ gcc/wide-int.cc 2014-04-25 12:33:59.202021025 +0100 @@ -37,12 +37,6 @@ typedef unsigned int UDItype __attribute #include "longlong.h" #endif -/* This is the maximal size of the buffer needed for dump. */ -const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4 - + (MAX_BITSIZE_MODE_ANY_INT - / HOST_BITS_PER_WIDE_INT) - + 32)); - static const HOST_WIDE_INT zeros[WIDE_INT_MAX_ELTS] = {}; /* @@ -51,7 +45,7 @@ static const HOST_WIDE_INT zeros[WIDE_IN /* Quantities to deal with values that hold half of a wide int. Used in multiply and divide. */ -#define HALF_INT_MASK (((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1) +#define HALF_INT_MASK (((HOST_WIDE_INT) 1 << HOST_BITS_PER_HALF_WIDE_INT) - 1) #define BLOCK_OF(TARGET) ((TARGET) / HOST_BITS_PER_WIDE_INT) #define BLOCKS_NEEDED(PREC) \ @@ -129,8 +123,8 @@ wi::from_array (HOST_WIDE_INT *val, cons /* Construct a wide int from a buffer of length LEN. BUFFER will be read according to byte endianess and word endianess of the target. - Only the lower LEN bytes of the result are set; the remaining high - bytes are cleared. */ + Only the lower BUFFER_LEN bytes of the result are set; the remaining + high bytes are cleared. */ wide_int wi::from_buffer (const unsigned char *buffer, unsigned int buffer_len) { @@ -212,7 +206,7 @@ wi::to_mpz (const wide_int_ref &x, mpz_t mpz_import (result, len, -1, sizeof (HOST_WIDE_INT), 0, 0, v); } -/* Returns VAL converted to TYPE. If WRAP is true, then out-of-range +/* Returns X converted to TYPE. If WRAP is true, then out-of-range values of VAL will be wrapped; otherwise, they will be set to the appropriate minimum or maximum TYPE bound. */ wide_int @@ -243,8 +237,8 @@ wi::from_mpz (const_tree type, mpz_t x, for representing the value. The code to calculate count is extracted from the GMP manual, section "Integer Import and Export": http://gmplib.org/manual/Integer-Import-and-Export.html */ - numb = 8*sizeof(HOST_WIDE_INT); - count = (mpz_sizeinbase (x, 2) + numb-1) / numb; + numb = 8 * sizeof(HOST_WIDE_INT); + count = (mpz_sizeinbase (x, 2) + numb - 1) / numb; HOST_WIDE_INT *val = res.write_val (); mpz_export (val, &count, -1, sizeof (HOST_WIDE_INT), 0, 0, x); if (count < 1) @@ -352,9 +346,8 @@ wi::force_to_size (HOST_WIDE_INT *val, c where we do allow comparisons of values of different precisions. */ static inline HOST_WIDE_INT selt (const HOST_WIDE_INT *a, unsigned int len, - unsigned int blocks_needed, - unsigned int small_prec, - unsigned int index, signop sgn) + unsigned int blocks_needed, unsigned int small_prec, + unsigned int index, signop sgn) { HOST_WIDE_INT val; if (index < len) @@ -388,7 +381,7 @@ top_bit_of (const HOST_WIDE_INT *a, unsi /* * Comparisons, note that only equality is an operator. The other - * comparisons cannot be operators since they are inherently singed or + * comparisons cannot be operators since they are inherently signed or * unsigned and C++ has no such operators. */ @@ -401,7 +394,7 @@ wi::eq_p_large (const HOST_WIDE_INT *op0 int l0 = op0len - 1; unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); - while (op0len != op1len) + if (op0len != op1len) return false; if (op0len == BLOCKS_NEEDED (prec) && small_prec) @@ -741,7 +734,7 @@ wi::mask (HOST_WIDE_INT *val, unsigned i unsigned int shift = width & (HOST_BITS_PER_WIDE_INT - 1); if (shift != 0) { - HOST_WIDE_INT last = (((unsigned HOST_WIDE_INT) 1) << shift) - 1; + HOST_WIDE_INT last = ((unsigned HOST_WIDE_INT) 1 << shift) - 1; val[i++] = negate ? ~last : last; } else @@ -774,12 +767,12 @@ wi::shifted_mask (HOST_WIDE_INT *val, un unsigned int shift = start & (HOST_BITS_PER_WIDE_INT - 1); if (shift) { - HOST_WIDE_INT block = (((unsigned HOST_WIDE_INT) 1) << shift) - 1; + HOST_WIDE_INT block = ((unsigned HOST_WIDE_INT) 1 << shift) - 1; shift = end & (HOST_BITS_PER_WIDE_INT - 1); if (shift) { /* case 000111000 */ - block = (((unsigned HOST_WIDE_INT) 1) << shift) - bl
Re: [wide-int] Fix signed min / -1 quotient
This is fine with me. kenny On 04/24/2014 10:34 AM, Richard Sandiford wrote: For signed min / -1 we set the overflow flag (good) but also returned a quotient of 0. It should be 0x80...0 instead. Since that's also the value of the original dividend, we can just copy the representation over. The value for division by 0 is probably pretty arbitrary. double-int.c seems to treat it as division by one: if (hden == 0 && lden == 0) overflow = 1, lden = 1; and while not important, it has the nice feature that here too we could just copy the original dividend, rather than treat it differently from signed min / -1. This fixes libjava's Divide_1, which was the last remaining regression on x86_64-linux-gnu. I have a couple of other patches queued up, but I'm still running tests to compare the asm output for the gcc and g++ testsuites on a range of targets. (FWIW, the first round of these tests picked up the same bug as Divide_1, in the output for gcc.c-torture/compile/20010404-1.c.) OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2014-04-24 09:51:08.056774754 +0100 +++ gcc/wide-int.cc 2014-04-24 11:52:35.996541888 +0100 @@ -1726,8 +1726,10 @@ wi::divmod_internal (HOST_WIDE_INT *quot && wi::only_sign_bit_p (dividend)) overflow = true; - /* If overflow is set, just get out. There will only be grief by - continuing. */ + /* Handle the overflow cases. Viewed as unsigned value, the quotient of + (signed min / -1) has the same representation as the orignal dividend. + We have traditionally made division by zero act as division by one, + so there too we use the original dividend. */ if (overflow) { if (remainder) @@ -1738,8 +1741,9 @@ wi::divmod_internal (HOST_WIDE_INT *quot if (oflow != 0) *oflow = true; if (quotient) - quotient[0] = 0; - return 1; + for (unsigned int i = 0; i < dividend_len; ++i) + quotient[i] = dividend_val[i]; + return dividend_len; } if (oflow)
Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE
On 04/23/2014 10:36 AM, Richard Biener wrote: On Wed, Apr 23, 2014 at 4:29 PM, Kenneth Zadeck wrote: On 04/23/2014 05:47 AM, Richard Biener wrote: On Tue, Apr 22, 2014 at 6:04 PM, Mike Stump wrote: On Apr 22, 2014, at 8:33 AM, Richard Sandiford wrote: Kyrill Tkachov writes: Ping. http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00769.html Any ideas? I recall chatter on IRC that we want to merge wide-int into trunk soon. Bootstrap failure on arm would prevent that... Sorry for the late reply. I hadn't forgotten, but I wanted to wait until I had chance to look into the ICE before replying, which I haven't had chance to do yet. They are separable issues, so, I checked in the change. It's a shame we can't use C++ style casts, but I suppose that's the price to pay for being able to write "unsigned HOST_WIDE_INT”. unsigned_HOST_WIDE_INT isn’t horrible, but, yeah, my fingers were expecting a typedef or better. I slightly prefer the int (1) style, but I think we should go the direction of the patch. Well, on my list of things to try for 4.10 is to kill off HOST_WIDE_* and require a 64bit integer type on the host and force all targets to use a 64bit 'hwi'. Thus, s/HOST_WIDE_INT/int64_t/ (and the appropriate related changes). Richard. I should point out that there is a community that wants to go in the opposite direction here. They are the people with real 32 bit hosts who want to go back to a world where they are allowed to make hwi a 32 bit value.They have been waiting wide-int to be committed because they see this as a way to get back to world where most of the math is done natively. I am not part of this community but they feel that if the math that has the potential to be big to be is done in wide-ints, then they can go back to using a 32 bit hwi for everything else.For them, a wide-int built on 32 hwi's would be a win. That wide-int builds on HWI is an implementation detail. It can easily be changed to build on int32_t. Btw, what important target still supports a 32bit HWI? None for what I know. Look at config.gcc and what does _not_ set need_64bit_hwint. Even plain arm needs it. I think that originally, hwi was supposed to be a natural integer on the host machine and it was corrupted to always be a 64 bit integer. Right now, wide-int is built on hwis which are always 64 bits.On a 32 bit machine, this means that there are two levels of abstraction to get to the hardware, one to get from wide-int to 64 bits and one to get from 64 bits to 32 bits. The easy part of converting wide-int to run natively on a 32 bit machine is going to be the internals of wide-int. Of course until you test it you never know, but we did try very hard not to care about the internal size of the rep. The hard part will be the large number of places where wide-int converts to or from hwi. Some of those callers expect things to really be 64 bits and some of them deal with numbers that are always small enough to be implemented in the efficient native representation. I think that the push against you is that the latter case should not be converted to int64_t. Richard. kenny
Re: [PATCH][RFC][wide-int] Fix some build errors on arm in wide-int branch and report ICE
On 04/23/2014 05:47 AM, Richard Biener wrote: On Tue, Apr 22, 2014 at 6:04 PM, Mike Stump wrote: On Apr 22, 2014, at 8:33 AM, Richard Sandiford wrote: Kyrill Tkachov writes: Ping. http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00769.html Any ideas? I recall chatter on IRC that we want to merge wide-int into trunk soon. Bootstrap failure on arm would prevent that... Sorry for the late reply. I hadn't forgotten, but I wanted to wait until I had chance to look into the ICE before replying, which I haven't had chance to do yet. They are separable issues, so, I checked in the change. It's a shame we can't use C++ style casts, but I suppose that's the price to pay for being able to write "unsigned HOST_WIDE_INT”. unsigned_HOST_WIDE_INT isn’t horrible, but, yeah, my fingers were expecting a typedef or better. I slightly prefer the int (1) style, but I think we should go the direction of the patch. Well, on my list of things to try for 4.10 is to kill off HOST_WIDE_* and require a 64bit integer type on the host and force all targets to use a 64bit 'hwi'. Thus, s/HOST_WIDE_INT/int64_t/ (and the appropriate related changes). Richard. I should point out that there is a community that wants to go in the opposite direction here. They are the people with real 32 bit hosts who want to go back to a world where they are allowed to make hwi a 32 bit value.They have been waiting wide-int to be committed because they see this as a way to get back to world where most of the math is done natively. I am not part of this community but they feel that if the math that has the potential to be big to be is done in wide-ints, then they can go back to using a 32 bit hwi for everything else.For them, a wide-int built on 32 hwi's would be a win. kenny
Re: [wide-int 5/8] Use LOG2_BITS_PER_UNIT
On 04/22/2014 04:02 PM, Richard Sandiford wrote: Looks like a few uses of the old idiom: BITS_PER_UNIT == 8 ? 3 : exact_log2 (BITS_PER_UNIT) I do not think that these crept in as much as they were never squished out. have crept in. This patch replaces them with LOG2_BITS_PER_UNIT. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/expr.c === --- gcc/expr.c 2014-04-22 20:58:26.969683484 +0100 +++ gcc/expr.c 2014-04-22 21:00:26.377614881 +0100 @@ -6801,8 +6801,7 @@ get_inner_reference (tree exp, HOST_WIDE if (!integer_zerop (off)) { offset_int boff, coff = mem_ref_offset (exp); - boff = wi::lshift (coff, (BITS_PER_UNIT == 8 - ? 3 : exact_log2 (BITS_PER_UNIT))); + boff = wi::lshift (coff, LOG2_BITS_PER_UNIT); bit_offset += boff; } exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0); @@ -6828,8 +6827,7 @@ get_inner_reference (tree exp, HOST_WIDE { offset_int tem = wi::sext (wi::to_offset (offset), TYPE_PRECISION (sizetype)); - tem = wi::lshift (tem, (BITS_PER_UNIT == 8 - ? 3 : exact_log2 (BITS_PER_UNIT))); + tem = wi::lshift (tem, LOG2_BITS_PER_UNIT); tem += bit_offset; if (wi::fits_shwi_p (tem)) { @@ -6844,16 +6842,12 @@ get_inner_reference (tree exp, HOST_WIDE /* Avoid returning a negative bitpos as this may wreak havoc later. */ if (wi::neg_p (bit_offset)) { - offset_int mask - = wi::mask (BITS_PER_UNIT == 8 -? 3 : exact_log2 (BITS_PER_UNIT), -false); + offset_int mask = wi::mask (LOG2_BITS_PER_UNIT, false); offset_int tem = bit_offset.and_not (mask); /* TEM is the bitpos rounded to BITS_PER_UNIT towards -Inf. Subtract it to BIT_OFFSET and add it (scaled) to OFFSET. */ bit_offset -= tem; - tem = wi::arshift (tem, (BITS_PER_UNIT == 8 - ? 3 : exact_log2 (BITS_PER_UNIT))); + tem = wi::arshift (tem, LOG2_BITS_PER_UNIT); offset = size_binop (PLUS_EXPR, offset, wide_int_to_tree (sizetype, tem)); } Index: gcc/tree-dfa.c === --- gcc/tree-dfa.c 2014-04-22 20:58:27.020683881 +0100 +++ gcc/tree-dfa.c 2014-04-22 21:00:26.378614888 +0100 @@ -463,10 +463,7 @@ get_ref_base_and_extent (tree exp, HOST_ { offset_int tem = (wi::to_offset (ssize) - wi::to_offset (fsize)); - if (BITS_PER_UNIT == 8) - tem = wi::lshift (tem, 3); - else - tem *= BITS_PER_UNIT; + tem = wi::lshift (tem, LOG2_BITS_PER_UNIT); tem -= woffset; maxsize += tem; } @@ -583,8 +580,7 @@ get_ref_base_and_extent (tree exp, HOST_ else { offset_int off = mem_ref_offset (exp); - off = wi::lshift (off, (BITS_PER_UNIT == 8 - ? 3 : exact_log2 (BITS_PER_UNIT))); + off = wi::lshift (off, LOG2_BITS_PER_UNIT); off += bit_offset; if (wi::fits_shwi_p (off)) { Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c2014-04-22 20:58:26.969683484 +0100 +++ gcc/tree-ssa-alias.c2014-04-22 21:00:26.378614888 +0100 @@ -1041,8 +1041,7 @@ indirect_ref_may_alias_decl_p (tree ref1 /* The offset embedded in MEM_REFs can be negative. Bias them so that the resulting offset adjustment is positive. */ offset_int moff = mem_ref_offset (base1); - moff = wi::lshift (moff, (BITS_PER_UNIT == 8 - ? 3 : exact_log2 (BITS_PER_UNIT))); + moff = wi::lshift (moff, LOG2_BITS_PER_UNIT); if (wi::neg_p (moff)) offset2p += (-moff).to_short_addr (); else @@ -1118,8 +1117,7 @@ indirect_ref_may_alias_decl_p (tree ref1 || TREE_CODE (dbase2) == TARGET_MEM_REF) { offset_int moff = mem_ref_offset (dbase2); - moff = wi::lshift (moff, (BITS_PER_UNIT == 8 - ? 3 : exact_log2 (BITS_PER_UNIT))); + moff = wi::lshift (moff, LOG2_BITS_PER_UNIT); if (wi::neg_p (moff)) doffset1 -= (-moff).to_short_addr (); else @@ -1217,15 +1215,13 @@ indirect_refs_may_alias_p (tree ref1 ATT /* The offset embedded in MEM_REF
[wide-int] fixed vector testcases.
several test cases started failing as a result of making the size of the wide-int buffer smaller. this patch fixes them. This failure was unrelated to the wide-int buffer size directly, but a hard constant in the truck code was replaced by MAX_BITSIZE_MODE_ANY_INT when it should have been replaced by MAX_BITSIZE_MODE_ANY_MODE. committed as revision 206689. kenny Index: gcc/simplify-rtx.c === --- gcc/simplify-rtx.c (revision 206688) +++ gcc/simplify-rtx.c (working copy) @@ -5396,7 +5396,7 @@ simplify_immed_subreg (enum machine_mode case MODE_DECIMAL_FLOAT: { REAL_VALUE_TYPE r; - long tmp[MAX_BITSIZE_MODE_ANY_INT / 32]; + long tmp[MAX_BITSIZE_MODE_ANY_MODE / 32]; /* real_from_target wants its input in words affected by FLOAT_WORDS_BIG_ENDIAN. However, we ignore this,
[wide-int] fixed several regressions in branch.
This patch fixes what appears to have been a long standing failure in the conversion of tree-vect-generic.c:build_replicated_const. This failure caused several regressions on the branch. Committed as revision 206616 Index: gcc/tree-vect-generic.c === --- gcc/tree-vect-generic.c (revision 206609) +++ gcc/tree-vect-generic.c (working copy) @@ -57,7 +57,8 @@ static tree build_replicated_const (tree type, tree inner_type, HOST_WIDE_INT value) { int width = tree_to_uhwi (TYPE_SIZE (inner_type)); - int n = TYPE_PRECISION (type) / width; + int n = (TYPE_PRECISION (type) + HOST_BITS_PER_WIDE_INT - 1) +/ HOST_BITS_PER_WIDE_INT; unsigned HOST_WIDE_INT low, mask; HOST_WIDE_INT a[WIDE_INT_MAX_ELTS]; int i;
Re: wide-int, tree-ssa
On 11/26/2013 07:34 AM, Richard Biener wrote: --- a/gcc/tree-ssa-ccp.c +++ b/gcc/tree-ssa-ccp.c @@ -98,6 +98,15 @@ along with GCC; see the file COPYING3. If not see array CONST_VAL[i].VALUE. That is fed into substitute_and_fold for final substitution and folding. + This algorithm uses wide-ints at the max precision of the target. + This means that, with one uninteresting exception, variables with + UNSIGNED types never go to VARYING because the bits above the + precision of the type of the variable are always zero. The + uninteresting case is a variable of UNSIGNED type that has the + maximum precision of the target. Such variables can go to VARYING, + but this causes no loss of infomation since these variables will + never be extended. + I don't understand this. In CCP a SSA name drops to varying when it's not constant. How is that related to signedness or precision?! Richi, This is a bogosity that is inherited from the double-int code. Consider an unsigned int in double-int (or for that matter in widest-int). It has a rep of a bunch of leading zeros followed by 32 bits of real stuff.Even if you know nothing about the value, it still has the upper bits of zero that you know are constant zeros. Of course this is not true for signed numbers because the upper bits are smeared with the sign bits. This was what i thought you were talking about when you argued many months ago when you said that infinite precision was more natural and that i would have trouble with a fixed precision based on the size of the type or mode and is the reason that tree-ssa-ccp uses widest int rather than wide-int. I do actually believe that wide-int is more natural here. However this bogosity does buy you a few constants. My first cut at this pass used wide-int rather than widest int, but there are in fact constants that the pass finds that depend on this being done this way so to satisfy the bit for bit same code rule for values smaller than timode, so i left this the way that it was but decided to document it. The proper way to do this is in fact to use wide-int, not widest-int but that would require major surgery to the code that converts values from one type to another.I plan to do this during the next stage 1. At that time, i will also enhance the code to expand the sign bit if it is known for signed types. Kenny
Re: wide-int, rtl
On 01/02/2014 05:26 PM, Eric Botcazou wrote: So, I'd like to ping the original patch and Kenny's patch to resolve the issues you found. If you have any other concerns or thoughts, let us know. Almost OK, but remove the strange quotes in the comment for the INTEGER_CST case of expand_expr_real_1 (and optionally add the fast path for the common case written down by Richard). eric, the fast path is not any faster. that was why we did not do it. And by the time you add the tests it would be slower.The thing is that on the inside of the so called fast path, you are still converting the tree to wide-int first and that conversion still is checking the rest of these things. the quotes will be removed before committing. kenny
Re: wide-int more performance fixes for wide multiplication.
On 12/16/2013 01:37 PM, Richard Biener wrote: Kenneth Zadeck wrote: On 12/16/2013 09:37 AM, Richard Biener wrote: Kenneth Zadeck wrote: On 12/16/2013 06:19 AM, Richard Biener wrote: On 12/15/13 7:48 PM, Kenneth Zadeck wrote: On 12/15/2013 11:40 AM, Richard Sandiford wrote: Kenneth Zadeck writes: it is certainly true that in order to do an unbounded set of operations, you would have to check on every operation. so my suggestion that we should remove the checking from the infinite precision would not support this. but the reality is that there are currently no places in the compiler that do this. Currently all of the uses of widest-int are one or two operations, and the style of code writing is that you do these and then you deal with the overflow at the time that you convert the widest-int to a tree. I think that it is important to maintain the style of programming where for a small finite number of computations do not need to check until they convert back. The problem with making the buffer size so tight is that we do not have an adequate reserves to allow this style for any supportable type. I personally think that 2x + some small n is what we need to have. i am not as familiar with how this is used (or to be used when all of the offset math is converted to use wide-int), but there appear to be two uses of multiply.one is the "harmless" mult by 3" and the other is where people are trying to compute the size of arrays. These last operations do need to be checked for overflow.The question here is do you want to force those operations to overflow individually or do you want to check when you convert out.Again, i think 2x + some small number is what we might want to consider. It's a fair question, but personally I think checking for overflow on the operation is much more robust. Checking on conversion doesn't allow you to stop thinking about overflow, it just changes the way you think about it: rather than handling explicit overflow flags, you have to remember to ask "is the range of the unconverted result within the range of widest_int", which I bet it is something that would be easily forgotten once widest_int & co. are part of the furniture. E.g. the SPARC operation (picked only because I remember it): for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i) { tree e0 = VECTOR_CST_ELT (arg0, i); tree e1 = VECTOR_CST_ELT (arg1, i); bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf; tmp = wi::neg (e1, &neg1_ovf); tmp = wi::add (e0, tmp, SIGNED, &add1_ovf); if (wi::neg_p (tmp)) tmp = wi::neg (tmp, &neg2_ovf); else neg2_ovf = false; result = wi::add (result, tmp, SIGNED, &add2_ovf); overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf; } gcc_assert (!overflow); return wide_int_to_tree (rtype, result); seems pretty natural. If instead it was modelled as a widest_int chain without overflow then it would be less obviously correct. Thanks, Richard Let us for the sake of argument assume that this was common code rather than code in a particular port, because code in a particular port can know more about the environment than common code is allowed to. My main point is that this code is in wide-int not widest-int because at this level the writer of this code actually wants to model what the target wants to do. So doing the adds in precision and testing overflow is perfectly fine at every step.But this loop CANNOT be written in a style where you tested the overflow at the end because if this is common code you cannot make any assumptions about the largest mode on the machine. If the buffer was 2x + n in size, then it would be reasonably safe to assume that the number of elements in the vector could be represented in an integer and so you could wait till the end. I think that my point was that (and i feel a little uncomfortable putting words in richi's mouth but i believe that this was his point early on) was that he thinks of the widest int as an infinite precision representation.he was the one who was pushing for the entire rep to be done with a large internal (or perhaps unbounded) rep because he felt that this was more natural to not have to think about overflow. He wanted you to be able to chain a mult and a divide and not see the product get truncated before the divide was done.The rep that we have now really sucks with respect to this because widest int truncates if you are close to the largest precision on the machine and does not if you are small with respect to that. My other point is that while you think that the example above is nice, the experience with double-int is contrary to this. people will say (
Re: wide-int more performance fixes for wide multiplication.
On 12/16/2013 09:37 AM, Richard Biener wrote: Kenneth Zadeck wrote: On 12/16/2013 06:19 AM, Richard Biener wrote: On 12/15/13 7:48 PM, Kenneth Zadeck wrote: On 12/15/2013 11:40 AM, Richard Sandiford wrote: Kenneth Zadeck writes: it is certainly true that in order to do an unbounded set of operations, you would have to check on every operation. so my suggestion that we should remove the checking from the infinite precision would not support this. but the reality is that there are currently no places in the compiler that do this. Currently all of the uses of widest-int are one or two operations, and the style of code writing is that you do these and then you deal with the overflow at the time that you convert the widest-int to a tree. I think that it is important to maintain the style of programming where for a small finite number of computations do not need to check until they convert back. The problem with making the buffer size so tight is that we do not have an adequate reserves to allow this style for any supportable type. I personally think that 2x + some small n is what we need to have. i am not as familiar with how this is used (or to be used when all of the offset math is converted to use wide-int), but there appear to be two uses of multiply.one is the "harmless" mult by 3" and the other is where people are trying to compute the size of arrays.These last operations do need to be checked for overflow.The question here is do you want to force those operations to overflow individually or do you want to check when you convert out.Again, i think 2x + some small number is what we might want to consider. It's a fair question, but personally I think checking for overflow on the operation is much more robust. Checking on conversion doesn't allow you to stop thinking about overflow, it just changes the way you think about it: rather than handling explicit overflow flags, you have to remember to ask "is the range of the unconverted result within the range of widest_int", which I bet it is something that would be easily forgotten once widest_int & co. are part of the furniture. E.g. the SPARC operation (picked only because I remember it): for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i) { tree e0 = VECTOR_CST_ELT (arg0, i); tree e1 = VECTOR_CST_ELT (arg1, i); bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf; tmp = wi::neg (e1, &neg1_ovf); tmp = wi::add (e0, tmp, SIGNED, &add1_ovf); if (wi::neg_p (tmp)) tmp = wi::neg (tmp, &neg2_ovf); else neg2_ovf = false; result = wi::add (result, tmp, SIGNED, &add2_ovf); overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf; } gcc_assert (!overflow); return wide_int_to_tree (rtype, result); seems pretty natural. If instead it was modelled as a widest_int chain without overflow then it would be less obviously correct. Thanks, Richard Let us for the sake of argument assume that this was common code rather than code in a particular port, because code in a particular port can know more about the environment than common code is allowed to. My main point is that this code is in wide-int not widest-int because at this level the writer of this code actually wants to model what the target wants to do. So doing the adds in precision and testing overflow is perfectly fine at every step.But this loop CANNOT be written in a style where you tested the overflow at the end because if this is common code you cannot make any assumptions about the largest mode on the machine. If the buffer was 2x + n in size, then it would be reasonably safe to assume that the number of elements in the vector could be represented in an integer and so you could wait till the end. I think that my point was that (and i feel a little uncomfortable putting words in richi's mouth but i believe that this was his point early on) was that he thinks of the widest int as an infinite precision representation.he was the one who was pushing for the entire rep to be done with a large internal (or perhaps unbounded) rep because he felt that this was more natural to not have to think about overflow. He wanted you to be able to chain a mult and a divide and not see the product get truncated before the divide was done.The rep that we have now really sucks with respect to this because widest int truncates if you are close to the largest precision on the machine and does not if you are small with respect to that. My other point is that while you think that the example above is nice, the experience with double-int is contrary to this. people will say (and test) the normal modes and anyone trying to use large modes will die a terrible death of a thousand cuts.
Re: wide-int more performance fixes for wide multiplication.
On 12/16/2013 06:19 AM, Richard Biener wrote: On 12/15/13 7:48 PM, Kenneth Zadeck wrote: On 12/15/2013 11:40 AM, Richard Sandiford wrote: Kenneth Zadeck writes: it is certainly true that in order to do an unbounded set of operations, you would have to check on every operation. so my suggestion that we should remove the checking from the infinite precision would not support this. but the reality is that there are currently no places in the compiler that do this. Currently all of the uses of widest-int are one or two operations, and the style of code writing is that you do these and then you deal with the overflow at the time that you convert the widest-int to a tree. I think that it is important to maintain the style of programming where for a small finite number of computations do not need to check until they convert back. The problem with making the buffer size so tight is that we do not have an adequate reserves to allow this style for any supportable type. I personally think that 2x + some small n is what we need to have. i am not as familiar with how this is used (or to be used when all of the offset math is converted to use wide-int), but there appear to be two uses of multiply.one is the "harmless" mult by 3" and the other is where people are trying to compute the size of arrays.These last operations do need to be checked for overflow.The question here is do you want to force those operations to overflow individually or do you want to check when you convert out.Again, i think 2x + some small number is what we might want to consider. It's a fair question, but personally I think checking for overflow on the operation is much more robust. Checking on conversion doesn't allow you to stop thinking about overflow, it just changes the way you think about it: rather than handling explicit overflow flags, you have to remember to ask "is the range of the unconverted result within the range of widest_int", which I bet it is something that would be easily forgotten once widest_int & co. are part of the furniture. E.g. the SPARC operation (picked only because I remember it): for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i) { tree e0 = VECTOR_CST_ELT (arg0, i); tree e1 = VECTOR_CST_ELT (arg1, i); bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf; tmp = wi::neg (e1, &neg1_ovf); tmp = wi::add (e0, tmp, SIGNED, &add1_ovf); if (wi::neg_p (tmp)) tmp = wi::neg (tmp, &neg2_ovf); else neg2_ovf = false; result = wi::add (result, tmp, SIGNED, &add2_ovf); overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf; } gcc_assert (!overflow); return wide_int_to_tree (rtype, result); seems pretty natural. If instead it was modelled as a widest_int chain without overflow then it would be less obviously correct. Thanks, Richard Let us for the sake of argument assume that this was common code rather than code in a particular port, because code in a particular port can know more about the environment than common code is allowed to. My main point is that this code is in wide-int not widest-int because at this level the writer of this code actually wants to model what the target wants to do. So doing the adds in precision and testing overflow is perfectly fine at every step.But this loop CANNOT be written in a style where you tested the overflow at the end because if this is common code you cannot make any assumptions about the largest mode on the machine. If the buffer was 2x + n in size, then it would be reasonably safe to assume that the number of elements in the vector could be represented in an integer and so you could wait till the end. I think that my point was that (and i feel a little uncomfortable putting words in richi's mouth but i believe that this was his point early on) was that he thinks of the widest int as an infinite precision representation.he was the one who was pushing for the entire rep to be done with a large internal (or perhaps unbounded) rep because he felt that this was more natural to not have to think about overflow. He wanted you to be able to chain a mult and a divide and not see the product get truncated before the divide was done.The rep that we have now really sucks with respect to this because widest int truncates if you are close to the largest precision on the machine and does not if you are small with respect to that. My other point is that while you think that the example above is nice, the experience with double-int is contrary to this. people will say (and test) the normal modes and anyone trying to use large modes will die a terrible death of a thousand cuts. Well - the cases that matter in practice are 1) the things we have offset_int for - code that does bit vs. byte quantity calculations on addresses or address of
Re: wide-int more performance fixes for wide multiplication.
On 12/15/2013 11:40 AM, Richard Sandiford wrote: Kenneth Zadeck writes: it is certainly true that in order to do an unbounded set of operations, you would have to check on every operation. so my suggestion that we should remove the checking from the infinite precision would not support this. but the reality is that there are currently no places in the compiler that do this. Currently all of the uses of widest-int are one or two operations, and the style of code writing is that you do these and then you deal with the overflow at the time that you convert the widest-int to a tree. I think that it is important to maintain the style of programming where for a small finite number of computations do not need to check until they convert back. The problem with making the buffer size so tight is that we do not have an adequate reserves to allow this style for any supportable type. I personally think that 2x + some small n is what we need to have. i am not as familiar with how this is used (or to be used when all of the offset math is converted to use wide-int), but there appear to be two uses of multiply.one is the "harmless" mult by 3" and the other is where people are trying to compute the size of arrays.These last operations do need to be checked for overflow.The question here is do you want to force those operations to overflow individually or do you want to check when you convert out.Again, i think 2x + some small number is what we might want to consider. It's a fair question, but personally I think checking for overflow on the operation is much more robust. Checking on conversion doesn't allow you to stop thinking about overflow, it just changes the way you think about it: rather than handling explicit overflow flags, you have to remember to ask "is the range of the unconverted result within the range of widest_int", which I bet it is something that would be easily forgotten once widest_int & co. are part of the furniture. E.g. the SPARC operation (picked only because I remember it): for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i) { tree e0 = VECTOR_CST_ELT (arg0, i); tree e1 = VECTOR_CST_ELT (arg1, i); bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf; tmp = wi::neg (e1, &neg1_ovf); tmp = wi::add (e0, tmp, SIGNED, &add1_ovf); if (wi::neg_p (tmp)) tmp = wi::neg (tmp, &neg2_ovf); else neg2_ovf = false; result = wi::add (result, tmp, SIGNED, &add2_ovf); overflow |= neg1_ovf | neg2_ovf | add1_ovf | add2_ovf; } gcc_assert (!overflow); return wide_int_to_tree (rtype, result); seems pretty natural. If instead it was modelled as a widest_int chain without overflow then it would be less obviously correct. Thanks, Richard Let us for the sake of argument assume that this was common code rather than code in a particular port, because code in a particular port can know more about the environment than common code is allowed to. My main point is that this code is in wide-int not widest-int because at this level the writer of this code actually wants to model what the target wants to do. So doing the adds in precision and testing overflow is perfectly fine at every step.But this loop CANNOT be written in a style where you tested the overflow at the end because if this is common code you cannot make any assumptions about the largest mode on the machine. If the buffer was 2x + n in size, then it would be reasonably safe to assume that the number of elements in the vector could be represented in an integer and so you could wait till the end. I think that my point was that (and i feel a little uncomfortable putting words in richi's mouth but i believe that this was his point early on) was that he thinks of the widest int as an infinite precision representation.he was the one who was pushing for the entire rep to be done with a large internal (or perhaps unbounded) rep because he felt that this was more natural to not have to think about overflow. He wanted you to be able to chain a mult and a divide and not see the product get truncated before the divide was done.The rep that we have now really sucks with respect to this because widest int truncates if you are close to the largest precision on the machine and does not if you are small with respect to that. My other point is that while you think that the example above is nice, the experience with double-int is contrary to this. people will say (and test) the normal modes and anyone trying to use large modes will die a terrible death of a thousand cuts.
Re: wide-int more performance fixes for wide multiplication.
On 12/15/2013 03:54 AM, Richard Sandiford wrote: Kenneth Zadeck writes: The current world is actually structured so that we never ask about overflow for the two larger classes because the reason that you used those classes was that you never wanted to have this discussion. So if you never ask about overflow, then it really does not matter because we are not going to return enough bits for you to care what happened on the inside. Of course that could change and someone could say that they wanted overflow on widest-int. Then the comment makes sense, with revisions, unless your review of the code that wants overflow on widest int suggests that they are just being stupid. But widest_int is now supposed to be at least 1 bit wider than widest input type (unlike previously where it was double the widest input type). So I can definitely see cases where we'd want to know whether a widest_int * widest_int result overflows. My point is that the widest_int * widest_int would normally be a signed multiplication rather than an unsigned multiplication, since the extra 1 bit of precision allows every operation to be signed. So it isn't a case of whether the top bit of a widest_int will be set, but whether we ever reach here for widest_int in the first place. We should be going down the sgn == SIGNED path rather than the sgn == UNSIGNED path. widest_int can represent an all-1s value, usually interpreted as -1. If we do go down this sgn == UNSIGNED path for widest_int then we will instead treat the all-1s value as the maximum unsigned number, just like for any other kind of wide int. As far as this function goes there really is no difference between wide_int, offset_int and widest_int. Which is good, because offset_int and widest_int should just be wide_ints that are optimised for a specific and fixed precision. Thanks, Richard I am now seriously regretting letting richi talk me into changing the size of the wide int buffer from being 2x of the largest mode on the machine. It was a terrible mistake AND i would guess making it smaller does not provide any real benefit. The problem is that when you use widest-int (and by analogy offset int) it should NEVER EVER overflow. Furthermore we need to change the interfaces for these two so that you cannot even ask!!(i do not believe that anyone does ask so the change would be small.) offset_int * offset_int could overflow too, at least in the sense that there are combinations of valid offset_ints whose product can't be represented in an offset_int. E.g. (1ULL << 67) * (1ULL << 67). I think that was always the case. see answer below. There are a huge set of bugs on the trunk that are "fixed" with wide-int because people wrote code for double-int thinking that it was infinite precision.So they never tested the cases of what happens when the size of the variable needed two HWIs. Most of those cases were resolved by making passes like tree-vrp use wide-int and then being explicit about the overflow on every operation, because with wide-int the issue is in your face since things overflow even for 32 bit numbers. However, with the current widest-int, we will only be safe for add and subtract by adding the extra bit. In multiply we are exposed. The perception is that widest-int is a good as infinite precision and no one will ever write the code to check if it overflowed because it only rarely happens. All operations can overflow. We would need 2 extra bits rather than 1 extra bit to stop addition overflowing, because the 1 extra bit we already have is to allow unsigned values to be treated as signed. But 2 extra bits is only good for one addition, not a chain of two additions. That's why ignoring overflow seems dangerous to me. The old wide-int way might have allowed any x * y to be represented, but if nothing checked whether x * y was bigger than expected then x * y + z could overflow. Thanks, Richard it is certainly true that in order to do an unbounded set of operations, you would have to check on every operation. so my suggestion that we should remove the checking from the infinite precision would not support this. but the reality is that there are currently no places in the compiler that do this. Currently all of the uses of widest-int are one or two operations, and the style of code writing is that you do these and then you deal with the overflow at the time that you convert the widest-int to a tree. I think that it is important to maintain the style of programming where for a small finite number of computations do not need to check until they convert back. The problem with making the buffer size so tight is that we do not have an adequate reserves to allow this style for any supportable type. I personally think that 2x + some small n is what we need to have. i am not as familiar with how this is used (or to be used when all of the offset math is converte
Re: wide-int more performance fixes for wide multiplication.
+ vallen = canonize (val, (uvlen + 1) >> 1, prec); + + /* Shift is not always safe to write over one of the +operands, so we must copy. */ + HOST_WIDE_INT tval[2 * WIDE_INT_MAX_ELTS]; + memcpy (tval, val, vallen * CHAR_BIT / HOST_BITS_PER_WIDE_INT); vallen * sizeof (HOST_WIDE_INT) would be more typical. But why not unpack into tval directly and avoid the copy? I could special case this, but the old code was not correct for odd precisions.
Re: wide-int more performance fixes for wide multiplication.
On 12/14/2013 09:30 AM, Richard Sandiford wrote: + /* True if the result needs to be negated. */ + bool is_neg = false; /* If the top level routine did not really pass in an overflow, then just make sure that we never attempt to set it. */ bool needs_overflow = (overflow != 0); + const HOST_WIDE_INT zero = 0; if (needs_overflow) *overflow = false; @@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co return 1; } - /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, -half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, -half_blocks_needed, prec, SIGNED); - - /* The 2 is for a full mult. */ - memset (r, 0, half_blocks_needed * 2 - * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT); + ulen = op1len * 2; + vlen = op2len * 2; + + if (sgn == SIGNED) +{ + if (wi::neg_p (op1)) + { + HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS]; + E.g. following on from the above: /* Add an extra HWI so that we can represent the negative of the minimum. */ HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1); + int nop1len + = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0); Not sure about the "prec + 1" here, since it could cause nop1len to be greater than the maximum length for "prec". And you go on to unpack nop1 in "prec" rather than "prec + 1". the unpacking with prec is the wrong part. That should have been prec + 1 Hmm, I still think it's the opposite, because... AIUI, once you've taken the absolutes, you've got two unsigned numbers of precision "prec" and create a "prec * 2" result, so naybe: sub_large (..., prec, UNSIGNED, 0) instead? the signed and unsigned does not matter here.the prec + 1 means we never overflow. ...my point is that after this we treat the number as unsigned, so it isn't a question of whether the absolute value overflows the precision as a signed number but whether it overflows as an unsigned number. And it never will. It might need 1 extra HWI than the original input if op1len < blocks_needed, but it shouldn't need extra precision. E.g. say for 8-bit HWIs we start out with a signed -0x80 * -0x80. We convert that into an unsigned 0x80 * 0x80, which is still precision 8 * precision 8 -> precision 16. And we'd need to be able to handle those precision-8 values correctly (with no extra zero HWI) if we had an unsigned 0x80 * 0x80 from the outset. it will not work like that. You are thinking that i really can do an 8 bit X 8 bit multiply.I cannot. The case that i am worried about is 115 bits x 115 bits.in this case i have to make sure that the top bits of the top block are provisioned properly.If you think about your case above where bit 114 is set and the bits below it are zeros, then i need to invert it so that the bit 115 and above are zero. otherwise i get the top of the block polluted with the 1 bits which would mess up the top block multiplied by any of the lower blocks. however, you are, at a deeper level correct.if i change the parms to wi_unpack to UNSIGNED then it is likely you are correct.I will think this through today and see if i can find a counter example.but it may be true that if i invert the number, i always want the top bits of the block to be 0 which is what the sign parameter controls. + is_neg = !is_neg; + ulen = nop1len * 2; + wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len, +ulen, prec, SIGNED); Back on the alloca theme, we'd have: u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen); before this and other unpacks. Note that there's supposed to be space after ")" in casts. Lots of instances in the patch :-) i actually did not know this. i will fix. + /* UNSIGNED mul. */ + /* For large positive integers inside regular wide-ints, we must +explictly expand out the 1s to full width for the mul to be +correct. Note that the top bit will never be set for a +unsigned number in offset_int of widest-int. */ s/of/or/. But I think the comment is confusing because the unsignedness is a property of the multiplication rather than the inputs. We should never be doing an unsigned operation on widest_int to begin with. And if offset_ints are being treated as having a sign then the same is true there. If we ever do have an unsigned multiplication on those types for some reason then -1 will (rightly) be treated as the maximum unsigned value. The real question here is "do you want to do an assert or do you want to explain what happens if the input comes in that way?"
Re: wide-int more performance fixes for wide multiplication.
The current world is actually structured so that we never ask about overflow for the two larger classes because the reason that you used those classes was that you never wanted to have this discussion. So if you never ask about overflow, then it really does not matter because we are not going to return enough bits for you to care what happened on the inside. Of course that could change and someone could say that they wanted overflow on widest-int. Then the comment makes sense, with revisions, unless your review of the code that wants overflow on widest int suggests that they are just being stupid. But widest_int is now supposed to be at least 1 bit wider than widest input type (unlike previously where it was double the widest input type). So I can definitely see cases where we'd want to know whether a widest_int * widest_int result overflows. My point is that the widest_int * widest_int would normally be a signed multiplication rather than an unsigned multiplication, since the extra 1 bit of precision allows every operation to be signed. So it isn't a case of whether the top bit of a widest_int will be set, but whether we ever reach here for widest_int in the first place. We should be going down the sgn == SIGNED path rather than the sgn == UNSIGNED path. widest_int can represent an all-1s value, usually interpreted as -1. If we do go down this sgn == UNSIGNED path for widest_int then we will instead treat the all-1s value as the maximum unsigned number, just like for any other kind of wide int. As far as this function goes there really is no difference between wide_int, offset_int and widest_int. Which is good, because offset_int and widest_int should just be wide_ints that are optimised for a specific and fixed precision. Thanks, Richard I am now seriously regretting letting richi talk me into changing the size of the wide int buffer from being 2x of the largest mode on the machine. It was a terrible mistake AND i would guess making it smaller does not provide any real benefit. The problem is that when you use widest-int (and by analogy offset int) it should NEVER EVER overflow. Furthermore we need to change the interfaces for these two so that you cannot even ask!!(i do not believe that anyone does ask so the change would be small.) There are a huge set of bugs on the trunk that are "fixed" with wide-int because people wrote code for double-int thinking that it was infinite precision.So they never tested the cases of what happens when the size of the variable needed two HWIs. Most of those cases were resolved by making passes like tree-vrp use wide-int and then being explicit about the overflow on every operation, because with wide-int the issue is in your face since things overflow even for 32 bit numbers. However, with the current widest-int, we will only be safe for add and subtract by adding the extra bit. In multiply we are exposed. The perception is that widest-int is a good as infinite precision and no one will ever write the code to check if it overflowed because it only rarely happens.
Re: wide-int more performance fixes for wide multiplication.
On 12/14/2013 06:40 AM, Richard Sandiford wrote: Hi Kenny, Sorry for the slow response. Kenneth Zadeck writes: Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 205765) +++ gcc/wide-int.cc (working copy) @@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co unsigned int j; unsigned int blocks_needed = BLOCKS_NEEDED (prec); unsigned int half_blocks_needed = blocks_needed * 2; - /* The sizes here are scaled to support a 2x largest mode by 2x - largest mode yielding a 4x largest mode result. This is what is - needed by vpn. */ + /* The sizes here are scaled to support a 2*largest mode + a little + by 2*largest mode + a little yielding a 4*largest mode + a + little. This is what is needed by vpn. */ unsigned HOST_HALF_WIDE_INT -u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; +u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT]; + unsigned int ulen; unsigned HOST_HALF_WIDE_INT -v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; - /* The '2' in 'R' is because we are internally doing a full +v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT]; + unsigned int vlen; + unsigned int uvlen; + unsigned int vallen; + + /* The '4' in 'R' is because we are internally doing a wide multiply. */ unsigned HOST_HALF_WIDE_INT -r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; - HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1; +r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT]; While we're here I think we should convert these arrays into allocas, so that we're not hard-coding the specialness of vpn in wide-int.cc. I.e. rather than having arrays of maximum size, use XALLOCAVEC to allocate a specific number of stack HWIs, once we know how many we actually need. That includes the new arrays added in the patch. I had thought of doing this but there is a part of me that has always thought of this as being more expensive, but in the scheme of things it is just noise here, so i will do it. + /* True if the result needs to be negated. */ + bool is_neg = false; /* If the top level routine did not really pass in an overflow, then just make sure that we never attempt to set it. */ bool needs_overflow = (overflow != 0); + const HOST_WIDE_INT zero = 0; if (needs_overflow) *overflow = false; @@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co return 1; } - /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, -half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, -half_blocks_needed, prec, SIGNED); - - /* The 2 is for a full mult. */ - memset (r, 0, half_blocks_needed * 2 - * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT); + ulen = op1len * 2; + vlen = op2len * 2; + + if (sgn == SIGNED) +{ + if (wi::neg_p (op1)) + { + HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS]; + E.g. following on from the above: /* Add an extra HWI so that we can represent the negative of the minimum. */ HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1); + int nop1len + = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0); Not sure about the "prec + 1" here, since it could cause nop1len to be greater than the maximum length for "prec". And you go on to unpack nop1 in "prec" rather than "prec + 1". the unpacking with prec is the wrong part. That should have been prec + 1 AIUI, once you've taken the absolutes, you've got two unsigned numbers of precision "prec" and create a "prec * 2" result, so naybe: sub_large (..., prec, UNSIGNED, 0) instead? the signed and unsigned does not matter here.the prec + 1 means we never overflow. + is_neg = !is_neg; + ulen = nop1len * 2; + wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len, +ulen, prec, SIGNED); Back on the alloca theme, we'd have: u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen); before this and other unpacks. Note that there's supposed to be space after ")" in casts. Lots of instances in the patch :-) + /* UNSIGNED mul. */ + /* For large positive integers inside regular wide-ints, we must +explictly expand out the 1s to full width for the mul to be +correct. Note that the top bit will never be set for a +unsigned number in offset_int of widest-int. */ s/of/or/. But I think the comment is confusing because the unsignedness is a property of the multiplication
[trunk] added missing parenthesis to #defines generated by gen-modes.c
committed as revision 205970 as obvious. kenny Index: gcc/genmodes.c === --- gcc/genmodes.c (revision 205967) +++ gcc/genmodes.c (working copy) @@ -891,7 +891,7 @@ emit_max_int (void) max = i->bytesize; if (max > mmax) mmax = max; - printf ("#define MAX_BITSIZE_MODE_ANY_INT %d*BITS_PER_UNIT\n", mmax); + printf ("#define MAX_BITSIZE_MODE_ANY_INT (%d*BITS_PER_UNIT)\n", mmax); } else printf ("#define MAX_BITSIZE_MODE_ANY_INT %d\n", max_bitsize_mode_any_int); @@ -901,7 +901,7 @@ emit_max_int (void) for (i = modes[j]; i; i = i->next) if (mmax < i->bytesize) mmax = i->bytesize; - printf ("#define MAX_BITSIZE_MODE_ANY_MODE %d*BITS_PER_UNIT\n", mmax); + printf ("#define MAX_BITSIZE_MODE_ANY_MODE (%d*BITS_PER_UNIT)\n", mmax); } static void
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
committed as revision 205964 with updated comment. kenny 2013-12-13 Kenneth Zadeck * config/arc/arc.h (BITS_PER_UNIT): Removed. * config/bfin/bfin.h (BITS_PER_UNIT): Removed. * config/lm32/lm32.h (BITS_PER_UNIT): Removed. * config/m32c/m32c.h (BITS_PER_UNIT): Removed. * config/microblaze/microblaze.h (BITS_PER_UNIT): Removed. * config/picochip/picochip.h (BITS_PER_UNIT): Removed. * config/spu/spu.h (BITS_PER_UNIT): Removed. * defaults.h (BITS_PER_UNIT): Removed. * config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): New. * doc/rtl (BITS_PER_UNIT): Moved from tm.texi. (MAX_BITSIZE_MODE_ANY_INT): Updated. * doc/tm.texi (BITS_PER_UNIT): Removed. * doc/tm.texi.in (BITS_PER_UNIT): Removed. * genmodes.c (bits_per_unit, max_bitsize_mode_any_int): New. (create_modes): Added code to set bits_per_unit and max_bitsize_mode_any_int. (emit_max_int): Changed code generation. * mkconfig.sh: Added insn-modes.h. On 12/13/2013 06:11 AM, Richard Biener wrote: On Fri, 13 Dec 2013, Uros Bizjak wrote: Hello! In addition, this target also changes the way that MAX_BITSIZE_MODE_ANY_INT is calculated. The value is heavily used on the wide-int branch to allocate buffers that are used to hold large integer values. The change in the way it is computed was motivated by the i386 port, but there may be other ports that have the same problem. The i386 port defines two very large integer modes that are only used as containers for large vectors. They are never used for large integers. The new way of computing this allows a port to say (very early) that some of these integer modes are never used to hold numbers and so smaller buffers can be used for integer calculations. Other ports that play the same game should follow suit. Index: gcc/config/i386/i386-modes.def === --- gcc/config/i386/i386-modes.def (revision 205895) +++ gcc/config/i386/i386-modes.def (working copy) @@ -90,5 +90,10 @@ VECTOR_MODE (INT, QI, 2); /* INT_MODE (OI, 32); INT_MODE (XI, 64); +/* Keep the OI and XI modes from confusing the compiler into thinking + that these modes could actually be used for computation. They are + only holders for vectors during data movement. */ +#define MAX_BITSIZE_MODE_ANY_INT (128) + /* The symbol Pmode stands for one of the above machine modes (usually SImode). The tm.h file specifies which one. It is not a distinct mode. */ __int128 is avaialble only for x86_64 - 64bit targets, so: #define MAX_BITSIZE_MODE_ANY_INT (TARGET_64BIT ? 128 : 64) It needs to be a compile-time constant. Richard. Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 205962) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,24 @@ +2013-12-13 Kenneth Zadeck + + * config/arc/arc.h (BITS_PER_UNIT): Removed. + * config/bfin/bfin.h (BITS_PER_UNIT): Removed. + * config/lm32/lm32.h (BITS_PER_UNIT): Removed. + * config/m32c/m32c.h (BITS_PER_UNIT): Removed. + * config/microblaze/microblaze.h (BITS_PER_UNIT): Removed. + * config/picochip/picochip.h (BITS_PER_UNIT): Removed. + * config/spu/spu.h (BITS_PER_UNIT): Removed. + * defaults.h (BITS_PER_UNIT): Removed. + * config/i386/i386-modes.def (MAX_BITSIZE_MODE_ANY_INT): New. + * doc/rtl (BITS_PER_UNIT): Moved from tm.texi. + (MAX_BITSIZE_MODE_ANY_INT): Updated. + * doc/tm.texi (BITS_PER_UNIT): Removed. + * doc/tm.texi.in (BITS_PER_UNIT): Removed. + * genmodes.c (bits_per_unit, max_bitsize_mode_any_int): New. + (create_modes): Added code to set bits_per_unit and + max_bitsize_mode_any_int. + (emit_max_int): Changed code generation. + * mkconfig.sh: Added insn-modes.h. + 2013-12-13 Yuri Rumyantsev * config/i386/i386.c (slm_cost): Fix imul cost for HI. Index: gcc/config/arc/arc.h === --- gcc/config/arc/arc.h (revision 205962) +++ gcc/config/arc/arc.h (working copy) @@ -303,9 +303,6 @@ along with GCC; see the file COPYING3. numbered. */ #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN) -/* Number of bits in an addressable storage unit. */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/bfin/bfin.h === --- gcc/config/bfin/bfin.h (revision 205962) +++ gcc/config/bfin/bfin.h (working copy) @@ -859,9 +859,6 @@ typedef struct { /* Define this if most significant word of a multiword number is numbered. */ #define WORDS_BIG_ENDIAN 0 -/* number of bits in an addressable storage unit */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily th
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
On Dec 13, 2013, at 5:11 AM, Richard Biener wrote: > On Fri, 13 Dec 2013, Uros Bizjak wrote: > >> Hello! >> >>> In addition, this target also changes the way that MAX_BITSIZE_MODE_ANY_INT >>> is calculated. >>> The value is heavily used on the wide-int branch to allocate buffers that >>> are used to hold large >>> integer values. The change in the way it is computed was motivated by the >>> i386 port, but there >>> may be other ports that have the same problem. The i386 port defines two >>> very large integer >>> modes that are only used as containers for large vectors. They are never >>> used for large integers. >>> The new way of computing this allows a port to say (very early) that some >>> of these integer modes >>> are never used to hold numbers and so smaller buffers can be used for >>> integer calculations. Other >>> ports that play the same game should follow suit. >> >> Index: gcc/config/i386/i386-modes.def >> === >> --- gcc/config/i386/i386-modes.def (revision 205895) >> +++ gcc/config/i386/i386-modes.def (working copy) >> @@ -90,5 +90,10 @@ VECTOR_MODE (INT, QI, 2); /* >> INT_MODE (OI, 32); >> INT_MODE (XI, 64); >> >> +/* Keep the OI and XI modes from confusing the compiler into thinking >> + that these modes could actually be used for computation. They are >> + only holders for vectors during data movement. */ >> +#define MAX_BITSIZE_MODE_ANY_INT (128) >> + >> /* The symbol Pmode stands for one of the above machine modes (usually >> SImode). >>The tm.h file specifies which one. It is not a distinct mode. */ >> >> __int128 is avaialble only for x86_64 - 64bit targets, so: >> >> #define MAX_BITSIZE_MODE_ANY_INT (TARGET_64BIT ? 128 : 64) > > It needs to be a compile-time constant. > > Richard. I will add a line to the effect to the doc before checking it in.
Re: [trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
On 12/11/2013 08:42 PM, DJ Delorie wrote: The m32c part is OK. thanks for the fast reply. kenny
[trunk]: Patch to move BITS_PER_UNIT to be available for genmodes.c
This patch is for the trunk, but it solves a problem that comes up for wide-int. For wide-int we need to have the BITS_PER_UNIT available earlier.So this patch sets the default value (8) in genmodes.c so that it is available by anyone who includes insn-modes.h. The generator for tm.h was modified to include insn-modes.h.The value for BITS_PER_UNIT can be overridden by any port by placing a define for it in their target modes file. This patch removes the definition of BITS_PER_UNIT from 7 platform .h files. All of those platforms initialized it to the default value so there was no need for additions to their target modes file. In addition, this target also changes the way that MAX_BITSIZE_MODE_ANY_INT is calculated.The value is heavily used on the wide-int branch to allocate buffers that are used to hold large integer values. The change in the way it is computed was motivated by the i386 port, but there may be other ports that have the same problem. The i386 port defines two very large integer modes that are only used as containers for large vectors. They are never used for large integers. The new way of computing this allows a port to say (very early) that some of these integer modes are never used to hold numbers and so smaller buffers can be used for integer calculations. Other ports that play the same game should follow suit. This patch has been bootstrapped and regression tested on x86-64. Ok to commit? Kenny Index: gcc/config/arc/arc.h === --- gcc/config/arc/arc.h (revision 205895) +++ gcc/config/arc/arc.h (working copy) @@ -303,9 +303,6 @@ along with GCC; see the file COPYING3. numbered. */ #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN) -/* Number of bits in an addressable storage unit. */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/bfin/bfin.h === --- gcc/config/bfin/bfin.h (revision 205895) +++ gcc/config/bfin/bfin.h (working copy) @@ -859,9 +859,6 @@ typedef struct { /* Define this if most significant word of a multiword number is numbered. */ #define WORDS_BIG_ENDIAN 0 -/* number of bits in an addressable storage unit */ -#define BITS_PER_UNIT 8 - /* Width in bits of a "word", which is the contents of a machine register. Note that this is not necessarily the width of data type `int'; if using 16-bit ints on a 68000, this would still be 32. Index: gcc/config/lm32/lm32.h === --- gcc/config/lm32/lm32.h (revision 205895) +++ gcc/config/lm32/lm32.h (working copy) @@ -73,7 +73,6 @@ #define BYTES_BIG_ENDIAN 1 #define WORDS_BIG_ENDIAN 1 -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 Index: gcc/config/m32c/m32c.h === --- gcc/config/m32c/m32c.h (revision 205895) +++ gcc/config/m32c/m32c.h (working copy) @@ -140,7 +140,6 @@ machine_function; matches "int". Pointers are 16 bits for R8C/M16C (when TARGET_A16 is true) and 24 bits for M32CM/M32C (when TARGET_A24 is true), but 24-bit pointers are stored in 32-bit words. */ -#define BITS_PER_UNIT 8 #define UNITS_PER_WORD 2 #define POINTER_SIZE (TARGET_A16 ? 16 : 32) #define POINTERS_EXTEND_UNSIGNED 1 Index: gcc/config/microblaze/microblaze.h === --- gcc/config/microblaze/microblaze.h (revision 205895) +++ gcc/config/microblaze/microblaze.h (working copy) @@ -193,7 +193,6 @@ extern enum pipeline_type microblaze_pip #define BITS_BIG_ENDIAN 0 #define BYTES_BIG_ENDIAN (TARGET_LITTLE_ENDIAN == 0) #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN) -#define BITS_PER_UNIT 8 #define BITS_PER_WORD 32 #define UNITS_PER_WORD 4 #define MIN_UNITS_PER_WORD 4 Index: gcc/config/picochip/picochip.h === --- gcc/config/picochip/picochip.h (revision 205895) +++ gcc/config/picochip/picochip.h (working copy) @@ -92,8 +92,6 @@ extern enum picochip_dfa_type picochip_s #define BYTES_BIG_ENDIAN 0 #define WORDS_BIG_ENDIAN 0 -#define BITS_PER_UNIT 8 - #define BITS_PER_WORD 16 #define UNITS_PER_WORD (BITS_PER_WORD / BITS_PER_UNIT) Index: gcc/config/spu/spu.h === --- gcc/config/spu/spu.h (revision 205895) +++ gcc/config/spu/spu.h (working copy) @@ -54,8 +54,6 @@ extern GTY(()) int spu_tune; #define WORDS_BIG_ENDIAN 1 -#define BITS_PER_UNIT 8 - /* GCC uses word_mode in many places, assuming that it is the fastest integer mode. That is not the case for SPU though. We can't us
Re: [wide-int] small cleanup in wide-int.*
On 12/09/2013 10:01 AM, Richard Biener wrote: On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck wrote: On 12/08/2013 05:35 AM, Richard Biener wrote: Richard Sandiford wrote: Kenneth Zadeck writes: #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we are) I think we should change it to something that makes sense. All we want is 1 extra bit. We don't need to round up the size and then add a whole HWI on top. Doing that will potentially pessimise targets that don't need anything beyond SImode. I agree. Richard. Done, ok to commit? + early examination of the target's mode file. The WIDE_INT_MAX_ELTS + can accomodate at least 1 more bit so that unsigned numbers of that + mode can be represented. Note that it is still possible to create + fixed_wide_ints that have precisions greater than + MAX_BITSIZE_MODE_ANY_INT. This can be useful when representing a + double-width multiplication result, for example. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) + that doesn't reserve 1 more bit, it just rounds up. For one more bit you need to add 1, so #define WIDE_INT_MAX_ELTS \ + (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) no? Otherwise ok. Thanks, Richard. kenny It's not like I can approve the patch anyway though, so I'll leave it there. Thanks, Richard committed as revision 205900 after rebootstrap. kenny Index: gcc/tree.h === --- gcc/tree.h (revision 205897) +++ gcc/tree.h (working copy) @@ -4545,7 +4545,7 @@ namespace wi static const unsigned int precision = N; }; - generic_wide_int > + generic_wide_int > to_widest (const_tree); generic_wide_int > to_offset (const_tree); @@ -4566,7 +4566,7 @@ wi::int_traits ::decompose ( precision); } -inline generic_wide_int > +inline generic_wide_int > wi::to_widest (const_tree t) { return t; @@ -4605,7 +4605,7 @@ wi::extended_tree ::get_len () const { if (N == ADDR_MAX_PRECISION) return TREE_INT_CST_OFFSET_NUNITS (m_t); - else if (N == MAX_BITSIZE_MODE_ANY_INT) + else if (N >= WIDE_INT_MAX_PRECISION) return TREE_INT_CST_EXT_NUNITS (m_t); else /* This class is designed to be used for specific output precisions Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205897) +++ gcc/tree-vrp.c (working copy) @@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; + typedef generic_wide_int + > vrp_int_cst; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); - + vrp_int min0 = vrp_int_cst (vr0.min); + vrp_int max0 = vrp_int_cst (vr0.max); + vrp_int min1 = vrp_int_cst (vr1.min); + vrp_in
wide-int more performance fixes for wide multiplication.
This patch is the last performance patch that i have for wide-int. This patch changes large multiply from taking precision/hbpwi * precision/hbpwi multiplies to taking #significant_bits1/hbpwi * #significant_bits2/hbpwi multiplications. That was a significant number of multiplies on machines that had a large largest integer. This patch was mostly tested before richard sandifords patch which uses the longlong stuff. That patch gets many of the cases that this would get. but if either of the operands does not fit into a hwi, then things got really expensive without this patch. This patch works a lot differently from the version in the branch. In the branch, the inputs were assumed to unsigned numbers and the multiply happened with what ever came in. After the multiply happened and if a signed multiply was requested, the product was adjusted. in this version, the adjustment happens first if we are doing signed multiply. Thus the operands are always unsigned numbers to the actual multiply loops. We may need an extra bit for this, but we have it. The advantage of this is that negative numbers have a lot of leading 1s and compensating for these is complex. The easy solution is to convert this to a positive number and then count many hwi's are needed to represent that.Of course, if exactly one of the operands was negative, then we end up negating the product at the end. This patch was tested on x86-64 with richards longlong short cut commented out to get it to hit a lot of times. And for this it makes a lot of difference in the number of multiplies that are performed. Without commenting that out, the number of hits is small when one looks only at the bootstrap and the regression tests. However, this patch will also get more hit more often for ports that decide to revert back to having a HWI be 32 bits because any time one of the operands is larger than 32 bits, this patch will take control. Also, this patch hits if any of the operands really do not fit in a hwi. This patch only works after that patch to change tree-vrp has been applied. http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00877.html kenny Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 205765) +++ gcc/wide-int.cc (working copy) @@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co unsigned int j; unsigned int blocks_needed = BLOCKS_NEEDED (prec); unsigned int half_blocks_needed = blocks_needed * 2; - /* The sizes here are scaled to support a 2x largest mode by 2x - largest mode yielding a 4x largest mode result. This is what is - needed by vpn. */ + /* The sizes here are scaled to support a 2*largest mode + a little + by 2*largest mode + a little yielding a 4*largest mode + a + little. This is what is needed by vpn. */ unsigned HOST_HALF_WIDE_INT -u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; +u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT]; + unsigned int ulen; unsigned HOST_HALF_WIDE_INT -v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; - /* The '2' in 'R' is because we are internally doing a full +v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT]; + unsigned int vlen; + unsigned int uvlen; + unsigned int vallen; + + /* The '4' in 'R' is because we are internally doing a wide multiply. */ unsigned HOST_HALF_WIDE_INT -r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT]; - HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1; +r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT]; + + /* True if the result needs to be negated. */ + bool is_neg = false; /* If the top level routine did not really pass in an overflow, then just make sure that we never attempt to set it. */ bool needs_overflow = (overflow != 0); + const HOST_WIDE_INT zero = 0; if (needs_overflow) *overflow = false; @@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co return 1; } - /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, - half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, - half_blocks_needed, prec, SIGNED); - - /* The 2 is for a full mult. */ - memset (r, 0, half_blocks_needed * 2 - * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT); + ulen = op1len * 2; + vlen = op2len * 2; + + if (sgn == SIGNED) +{ + if (wi::neg_p (op1)) + { + HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS]; + + int nop1len + = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0); + is_neg = !is_neg; + ulen = nop1len * 2; + wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len, + ulen, prec, SIGNED); + } + else + wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1val, o
Re: [wide-int] small cleanup in wide-int.*
On 12/09/2013 10:12 AM, Richard Sandiford wrote: Richard Biener writes: On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck wrote: On 12/08/2013 05:35 AM, Richard Biener wrote: Richard Sandiford wrote: Kenneth Zadeck writes: #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we are) I think we should change it to something that makes sense. All we want is 1 extra bit. We don't need to round up the size and then add a whole HWI on top. Doing that will potentially pessimise targets that don't need anything beyond SImode. I agree. Richard. Done, ok to commit? + early examination of the target's mode file. The WIDE_INT_MAX_ELTS + can accomodate at least 1 more bit so that unsigned numbers of that + mode can be represented. Note that it is still possible to create + fixed_wide_ints that have precisions greater than + MAX_BITSIZE_MODE_ANY_INT. This can be useful when representing a + double-width multiplication result, for example. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) + that doesn't reserve 1 more bit, it just rounds up. For one more bit you need to add 1, so #define WIDE_INT_MAX_ELTS \ + (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) no? Or just: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) as above. Richard This time for sure. Tested on x86-64 kenny Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205726) +++ gcc/tree-vrp.c (working copy) @@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; + typedef generic_wide_int + > vrp_int_cst; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); - + vrp_int min0 = vrp_int_cst (vr0.min); + vrp_int max0 = vrp_int_cst (vr0.max); + vrp_int min1 = vrp_int_cst (vr1.min); + vrp_int max1 = vrp_int_cst (vr1.max); /* Canonicalize the intervals. */ if (sign == UNSIGNED) { @@ -2653,17 +2654,17 @@ extract_range_from_binary_expr_1 (value_ } } - wide_int prod0 = min0 * min1; - wide_int prod1 = min0 * max1; - wide_int prod2 = max0 * min1; - wide_int prod3 = max0 * max1; + vrp_int prod0 = min0 * min1; + vrp_int prod1 = min0 * max1; + vrp_int prod2 = max0 * min1; + vrp_int prod3 = max0 * max1; /* Sort the 4 products so that min is in prod0 and max is in prod3. */ /* min0min1 > max0max1 */ if (wi::gts_p (prod0, prod3)) { - wide_int tmp = prod3; + vrp_int tmp = prod3; prod3 = prod0; prod0 = tmp; } @@ -2671,21 +2672,21 @@ extract_range_from_binary_expr_1 (value_ /* min0max1 > max0min1 */ if (wi::gts_p (prod1, prod2)) { - wide_int tmp = prod2; + vrp_int tmp = prod2; prod
Re: [wide-int] small cleanup in wide-int.*
On 12/09/2013 10:01 AM, Richard Biener wrote: On Mon, Dec 9, 2013 at 3:49 PM, Kenneth Zadeck wrote: On 12/08/2013 05:35 AM, Richard Biener wrote: Richard Sandiford wrote: Kenneth Zadeck writes: #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we are) I think we should change it to something that makes sense. All we want is 1 extra bit. We don't need to round up the size and then add a whole HWI on top. Doing that will potentially pessimise targets that don't need anything beyond SImode. I agree. Richard. Done, ok to commit? + early examination of the target's mode file. The WIDE_INT_MAX_ELTS + can accomodate at least 1 more bit so that unsigned numbers of that + mode can be represented. Note that it is still possible to create + fixed_wide_ints that have precisions greater than + MAX_BITSIZE_MODE_ANY_INT. This can be useful when representing a + double-width multiplication result, for example. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) + that doesn't reserve 1 more bit, it just rounds up. For one more bit you need to add 1, so #define WIDE_INT_MAX_ELTS \ + (((MAX_BITSIZE_MODE_ANY_INT + 1 + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT)) no? oops? i am testing it correctly this time. Will report back in an hour. kenny Otherwise ok. Thanks, Richard. kenny It's not like I can approve the patch anyway though, so I'll leave it there. Thanks, Richard
Re: [wide-int] small cleanup in wide-int.*
On 12/08/2013 05:35 AM, Richard Biener wrote: Richard Sandiford wrote: Kenneth Zadeck writes: #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. Well, if we're changing the definition of WIDE_INT_MAX_ELTS now (and we are) I think we should change it to something that makes sense. All we want is 1 extra bit. We don't need to round up the size and then add a whole HWI on top. Doing that will potentially pessimise targets that don't need anything beyond SImode. I agree. Richard. Done, ok to commit? kenny It's not like I can approve the patch anyway though, so I'll leave it there. Thanks, Richard Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205726) +++ gcc/tree-vrp.c (working copy) @@ -2620,23 +2620,24 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; + typedef generic_wide_int + > vrp_int_cst; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); - + vrp_int min0 = vrp_int_cst (vr0.min); + vrp_int max0 = vrp_int_cst (vr0.max); + vrp_int min1 = vrp_int_cst (vr1.min); + vrp_int max1 = vrp_int_cst (vr1.max); /* Canonicalize the intervals. */ if (sign == UNSIGNED) { @@ -2653,17 +2654,17 @@ extract_range_from_binary_expr_1 (value_ } } - wide_int prod0 = min0 * min1; - wide_int prod1 = min0 * max1; - wide_int prod2 = max0 * min1; - wide_int prod3 = max0 * max1; + vrp_int prod0 = min0 * min1; + vrp_int prod1 = min0 * max1; + vrp_int prod2 = max0 * min1; + vrp_int prod3 = max0 * max1; /* Sort the 4 products so that min is in prod0 and max is in prod3. */ /* min0min1 > max0max1 */ if (wi::gts_p (prod0, prod3)) { - wide_int tmp = prod3; + vrp_int tmp = prod3; prod3 = prod0; prod0 = tmp; } @@ -2671,21 +2672,21 @@ extract_range_from_binary_expr_1 (value_ /* min0max1 > max0min1 */ if (wi::gts_p (prod1, prod2)) { - wide_int tmp = prod2; + vrp_int tmp = prod2; prod2 = prod1; prod1 = tmp; } if (wi::gts_p (prod0, prod1)) { - wide_int tmp = prod1; + vrp_int tmp = prod1; prod1 = prod0; prod0 = tmp; } if (wi::gts_p (prod2, prod3)) { - wide_int tmp = prod3; + vrp_int tmp = prod3; prod3 = prod2; prod2 = tmp; } Index: gcc/tree.h === --- gcc/tree.h (revision 205726) +++ gcc/tree.h (working copy) @@ -4549,7 +4549,7 @@ namespace wi static const unsigned int precision = N; }; - generic_wide_int > + generic_wide_int > to_widest (const_tree); generic_wide_int > to_offset (const_tree); @@ -4570,7 +4570,7 @@ wi::int_traits ::decompose ( precision); } -inline generic_wide_int > +inline generic_wide_int > wi::to_widest (const_tree t) { return t; @@ -4609,7 +4609,7 @@ wi::extended_tree ::get_len () const { if (N == ADDR_MAX_PRECISION) return TR
Re: [wide-int] small cleanup in wide-int.*
On 12/06/2013 01:32 PM, Richard Sandiford wrote: Kenneth Zadeck writes: On 12/03/2013 11:52 AM, Richard Sandiford wrote: Kenneth Zadeck writes: Index: tree-vrp.c === --- tree-vrp.c (revision 205597) +++ tree-vrp.c (working copy) @@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + /* vrp_int is twice as wide as anything that the target +supports so it can support a full width multiply. No +need to add any more padding for an extra sign bit +because that comes with the way that WIDE_INT_MAX_ELTS is +defined. */ + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) + vrp_int; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); + vrp_int min0 = wi::to_vrp (vr0.min); + vrp_int max0 = wi::to_vrp (vr0.max); + vrp_int min1 = wi::to_vrp (vr1.min); + vrp_int max1 = wi::to_vrp (vr1.max); I think we should avoid putting to_vrp in tree.h if vrp_int is only local to this block. Instead you could have: typedef generic_wide_int > vrp_int_cst; ... vrp_int_cst min0 = vr0.min; vrp_int_cst max0 = vr0.max; vrp_int_cst min1 = vr1.min; vrp_int_cst max1 = vr1.max; i did this in a different way because i had trouble doing it as you suggested.the short answer is that all of the vrp_int code is now local to tree-vrp.c which i think was your primary goal Ah, so we later assign to these variables: /* Canonicalize the intervals. */ if (sign == UNSIGNED) { if (wi::ltu_p (size, min0 + max0)) { min0 -= size; max0 -= size; } if (wi::ltu_p (size, min1 + max1)) { min1 -= size; max1 -= size; } } OK, in that case I suppose a temporary is needed. But I'd prefer not to put local stuff in the wi:: namespace. You could just have: typedef generic_wide_int > vrp_int_cst; vrp_int min0 = vrp_int_cst (vr0.min); vrp_int max0 = vrp_int_cst (vr0.max); vrp_int min1 = vrp_int_cst (vr1.min); vrp_int max1 = vrp_int_cst (vr1.max); which removes the need for: +/* vrp_int is twice as wide as anything that the target supports so it + can support a full width multiply. No need to add any more padding + for an extra sign bit because that comes with the way that + WIDE_INT_MAX_ELTS is defined. */ +typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; +namespace wi +{ + generic_wide_int > to_vrp (const_tree); +} + +inline generic_wide_int > +wi::to_vrp (const_tree t) +{ + return t; +} + #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. I think you're talking about defining MAX_BITSIZE_MODE_ANY_INT as MAX_SIZE_MODE_ANY_INT * BITS_PER_UNIT, but that's orthogonal to the change above. IMO it doesn't make sense to both round up the division and also add 1 to the result. So I think we should make this change regardless of whatever follows. Looks good to me otherwise, thanks. Richard so this one works the way you want.While it is true the the problems are disjoint, the solution will likely evolving change the same lines of source in two different ways. ok to commit. kenny Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205726
Re: wide-int, rtl
On 11/27/2013 11:24 AM, Eric Botcazou wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the first half of the rtl code. --- a/gcc/cse.c +++ b/gcc/cse.c @@ -2336,15 +2336,23 @@ hash_rtx_cb (const_rtx x, enum machine_mode mode, + (unsigned int) INTVAL (x)); return hash; +case CONST_WIDE_INT: + { + int i; + for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++) + hash += CONST_WIDE_INT_ELT (x, i); + } + return hash; You can write "for (int i = 0; ..." now and remove the parentheses. done --- a/gcc/cselib.c +++ b/gcc/cselib.c @@ -1121,15 +1120,23 @@ cselib_hash_rtx (rtx x, int create, enum machine_mode memmode) hash += ((unsigned) CONST_INT << 7) + INTVAL (x); return hash ? hash : (unsigned int) CONST_INT; +case CONST_WIDE_INT: + { + int i; + for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++) + hash += CONST_WIDE_INT_ELT (x, i); + } + return hash; + Likewise. done --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -3298,23 +3268,13 @@ choose_multiplier (unsigned HOST_WIDE_INT d, int n, int precision, pow = n + lgup; pow2 = n + lgup - precision; - /* We could handle this with some effort, but this case is much - better handled directly with a scc insn, so rely on caller using - that. */ - gcc_assert (pow != HOST_BITS_PER_DOUBLE_INT); Why removing it? the code no longer has restrictions on the size/mode of the variables. --- a/gcc/expr.c +++ b/gcc/expr.c @@ -722,64 +722,33 @@ convert_modes (enum machine_mode mode, enum machine_mode oldmode, rtx x, int uns if (mode == oldmode) return x; - /* There is one case that we must handle specially: If we are converting - a CONST_INT into a mode whose size is twice HOST_BITS_PER_WIDE_INT and - we are to interpret the constant as unsigned, gen_lowpart will do - the wrong if the constant appears negative. What we want to do is - make the high-order word of the constant zero, not all ones. */ - - if (unsignedp && GET_MODE_CLASS (mode) == MODE_INT - && GET_MODE_BITSIZE (mode) == HOST_BITS_PER_DOUBLE_INT - && CONST_INT_P (x) && INTVAL (x) < 0) + if (CONST_SCALAR_INT_P (x) + && GET_MODE_CLASS (mode) == MODE_INT) On a single line. done { - double_int val = double_int::from_uhwi (INTVAL (x)); - - /* We need to zero extend VAL. */ - if (oldmode != VOIDmode) - val = val.zext (GET_MODE_BITSIZE (oldmode)); - - return immed_double_int_const (val, mode); + /* If the caller did not tell us the old mode, then there is +not much to do with respect to canonization. We have to assume +that all the bits are significant. */ + if (GET_MODE_CLASS (oldmode) != MODE_INT) + oldmode = MAX_MODE_INT; + wide_int w = wide_int::from (std::make_pair (x, oldmode), + GET_MODE_PRECISION (mode), + unsignedp ? UNSIGNED : SIGNED); + return immed_wide_int_const (w, mode); canonicalization? done @@ -5301,10 +5271,10 @@ store_expr (tree exp, rtx target, int call_param_p, bool nontemporal) &alt_rtl); } - /* If TEMP is a VOIDmode constant and the mode of the type of EXP is not - the same as that of TARGET, adjust the constant. This is needed, for - example, in case it is a CONST_DOUBLE and we want only a word-sized - value. */ + /* If TEMP is a VOIDmode constant and the mode of the type of EXP is + not the same as that of TARGET, adjust the constant. This is + needed, for example, in case it is a CONST_DOUBLE or + CONST_WIDE_INT and we want only a word-sized value. */ if (CONSTANT_P (temp) && GET_MODE (temp) == VOIDmode && TREE_CODE (exp) != ERROR_MARK && GET_MODE (target) != TYPE_MODE (TREE_TYPE (exp))) Why reformatting the whole comment? it is what emacs did. I have formatted it back. @@ -9481,11 +9459,19 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode, return decl_rtl; case INTEGER_CST: - temp = immed_double_const (TREE_INT_CST_LOW (exp), -TREE_INT_CST_HIGH (exp), mode); - - return temp; - + { + tree type = TREE_TYPE (exp); Redundant, see the beginning of the function. fixed + /* One could argue that GET_MODE_PRECISION (TYPE_MODE (type)) + should always be the same as TYPE_PRECISION (type). + However, it is not. Since we are converting from tree to + rtl, we have to expose this ugly truth here. */ + temp = immed_wide_int_const (wide_int::from + (exp, + GET_MODE_PRECISION (TYPE_MODE (type)), +
Re: [wide-int] small cleanup in wide-int.*
Richard asked to see the patch where i did the changes the way that he asked in his email. Doing it the way he wants potentially has advantages over the way that i did it, but his technique fails for non obvious reasons. The failure in building is: g++ -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gccBadMulVrp/gcc -I../../gccBadMulVrp/gcc/. -I../../gccBadMulVrp/gcc/../include -I../../gccBadMulVrp/gcc/../libcpp/include -I../../gccBadMulVrp/gcc/../libdecnumber -I../../gccBadMulVrp/gcc/../libdecnumber/bid -I../libdecnumber -I../../gccBadMulVrp/gcc/../libbacktrace -o tree-vrp.o -MT tree-vrp.o -MMD -MP -MF ./.deps/tree-vrp.TPo ../../gccBadMulVrp/gcc/tree-vrp.c In file included from ../../gccBadMulVrp/gcc/double-int.h:23:0, from ../../gccBadMulVrp/gcc/tree-core.h:28, from ../../gccBadMulVrp/gcc/tree.h:23, from ../../gccBadMulVrp/gcc/tree-vrp.c:26: ../../gccBadMulVrp/gcc/wide-int.h: In member function ‘generic_wide_int& generic_wide_int::operator=(const T&) [with T = generic_wide_int >, storage = wi::extended_tree<1152>]’: ../../gccBadMulVrp/gcc/wide-int.h:701:3: instantiated from ‘generic_wide_int& generic_wide_int::operator-=(const T&) [with T = generic_wide_int >, storage = wi::extended_tree<1152>, generic_wide_int = generic_wide_int >]’ ../../gccBadMulVrp/gcc/tree-vrp.c:2646:13: instantiated from here ../../gccBadMulVrp/gcc/wide-int.h:860:3: error: no matching function for call to ‘generic_wide_int >::operator=(const generic_wide_int >&)’ ../../gccBadMulVrp/gcc/wide-int.h:860:3: note: candidate is: ../../gccBadMulVrp/gcc/tree.h:4529:9: note: wi::extended_tree<1152>& wi::extended_tree<1152>::operator=(const wi::extended_tree<1152>&) ../../gccBadMulVrp/gcc/tree.h:4529:9: note: no known conversion for argument 1 from ‘const generic_wide_int >’ to ‘const wi::extended_tree<1152>&’ make[3]: *** [tree-vrp.o] Error 1 make[3]: Leaving directory `/home/zadeck/gcc/gbbBadMulVrp/gcc' make[2]: *** [all-stage1-gcc] Error 2 make[2]: Leaving directory `/home/zadeck/gcc/gbbBadMulVrp' make[1]: *** [stage1-bubble] Error 2 make[1]: Leaving directory `/home/zadeck/gcc/gbbBadMulVrp' make: *** [all] Error 2 heracles:~/gcc/gbbBadMulVrp(9) cd ../gccBadMulVrp/ On 12/06/2013 11:45 AM, Kenneth Zadeck wrote: On 12/03/2013 11:52 AM, Richard Sandiford wrote: Kenneth Zadeck writes: Index: tree-vrp.c === --- tree-vrp.c (revision 205597) +++ tree-vrp.c (working copy) @@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + /* vrp_int is twice as wide as anything that the target + supports so it can support a full width multiply. No + need to add any more padding for an extra sign bit + because that comes with the way that WIDE_INT_MAX_ELTS is + defined. */ + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) + vrp_int; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); + vrp_int min0 = wi::to_vrp (vr0.min); + vrp_int max0 = wi::to_vrp (vr0.max); + vrp_int min1 = wi::to_vrp (vr1.min); + vrp_int max1 = wi::to_vrp (vr1.max); I think we should avoid putting to_vrp in tree.h if vrp_int is only local to this block. Instead you could have: typedef generic_wide_int > vrp_int_cst; ... vrp_int_cst min0 = vr0.min; vrp_int_cst max0 = vr0.max; vrp_int_cst min1 = vr1.min; vrp_int_cst max1 = vr1.max; i did this in a different way because i had trouble doing it as you suggested. the short answer is that all of the vrp_int code is now local to tree-vrp.c which i think was your primary goal @@ -228,15 +228,16 @@ along with GCC; see the file COPYING3. #endif /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very - early examination of the target's mode file. Thus it is safe that - some small multiple of this number is easily larger than any number - that that target could compute. The place in the compiler that - currently needs the widest ints is the code that determines th
Re: [wide-int] small cleanup in wide-int.*
On 12/03/2013 11:52 AM, Richard Sandiford wrote: Kenneth Zadeck writes: Index: tree-vrp.c === --- tree-vrp.c (revision 205597) +++ tree-vrp.c (working copy) @@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + /* vrp_int is twice as wide as anything that the target +supports so it can support a full width multiply. No +need to add any more padding for an extra sign bit +because that comes with the way that WIDE_INT_MAX_ELTS is +defined. */ + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) + vrp_int; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); + vrp_int min0 = wi::to_vrp (vr0.min); + vrp_int max0 = wi::to_vrp (vr0.max); + vrp_int min1 = wi::to_vrp (vr1.min); + vrp_int max1 = wi::to_vrp (vr1.max); I think we should avoid putting to_vrp in tree.h if vrp_int is only local to this block. Instead you could have: typedef generic_wide_int > vrp_int_cst; ... vrp_int_cst min0 = vr0.min; vrp_int_cst max0 = vr0.max; vrp_int_cst min1 = vr1.min; vrp_int_cst max1 = vr1.max; i did this in a different way because i had trouble doing it as you suggested.the short answer is that all of the vrp_int code is now local to tree-vrp.c which i think was your primary goal @@ -228,15 +228,16 @@ along with GCC; see the file COPYING3. #endif /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very - early examination of the target's mode file. Thus it is safe that - some small multiple of this number is easily larger than any number - that that target could compute. The place in the compiler that - currently needs the widest ints is the code that determines the - range of a multiply. This code needs 2n + 2 bits. */ - + early examination of the target's mode file. The WIDE_INT_MAX_ELTS + can accomodate at least 1 more bit so that unsigned numbers of that + mode can be represented. This will accomodate every place in the + compiler except for a multiply routine in tree-vrp. That function + makes its own arrangements for larger wide-ints. */ I think we should drop the "This will accomodate..." bit, since it'll soon get out of date. Maybe something like: Note that it is still possible to create fixed_wide_ints that have precisions greater than MAX_BITSIZE_MODE_ANY_INT. This can be useful when representing a double-width multiplication result, for example. */ done #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)\ +/ HOST_BITS_PER_WIDE_INT) + 1) I think this should be: (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1) We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple of HOST_BITS_PER_WIDE_INT. we will do this later when some other issues that Eric B raised are settled. ok to commit to the branch? kenny Looks good to me otherwise FWIW. You probably already realise this, but for avoidance of doubt, Richard was also asking that we reduce MAX_BITSIZE_MODE_ANY_INT to 128 on x86_64, since that's the largest scalar_mode_supported_p mode. Thanks, Richard Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 205726) +++ gcc/tree-vrp.c (working copy) @@ -2213,6 +2213,22 @@ extract_range_from_multiplicative_op_1 ( set_value_range (vr, type, min, max, NULL); } +/* vrp_int is twice as wide as anything that the target supports so it + can support a full width multiply. No need to add any more padding + for an extra sign bit because that comes with the way that + WIDE_INT_MAX_ELTS is defined. */ +typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int; +namespace wi +{ + generic_wide_int > to_vr
Re: [patch] combine ICE fix
On 12/03/2013 02:38 PM, Jeff Law wrote: On 12/03/13 12:25, Kenneth Zadeck wrote: On 12/03/2013 01:52 PM, Mike Stump wrote: On Dec 2, 2013, at 10:26 PM, Jeff Law wrote: On 11/27/13 17:13, Cesar Philippidis wrote: I looked into adding support for incremental DF scanning from df*.[ch] in combine but there are a couple of problems. First of all, combine does its own DF analysis. It does so because its usage falls under this category (df-core.c): c) If the pass modifies insns several times, this incremental updating may be expensive. Furthermore, combine's DF relies on the DF scanning to be deferred, so the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED take place before it updates the notes for those insns. Also, combine has a tendency to undo its changes occasionally. I think at this stage of the release cycle, converting combine to incremental DF is probably a no-go. However, we should keep it in mind for the future -- while hairy I'd really like to see that happen in the long term. I think Kenny has some thoughts in this area. I'll cc him to ensure he sees it. it is the tendency to undo things (i would use the word "frequently" rather than) occasionally that kept me from doing this years ago. Shove a bunch of things together, simplify, then try to recognize the result. If that fails, undo everything. In theory, this could be replaced by making a copy of the original, doing the combination/simplification, then recognition. If successful, then update DF and remove the original I3, if not successful, drop the copy. That avoids the undo nonsense. jeff that could certainly work.
Re: [wide-int] Add fast path for hosts with HWI widening multiplication
On 12/04/2013 07:56 AM, Richard Sandiford wrote: Richard Sandiford writes: This patch handles multiplications using a single HWIxHWI->2HWI multiplication on hosts that have one. This removes all uses of the slow (half-HWI) path for insn-recog.ii. The slow path is still used 58 times for cp/parser.ii and 168 times for fold-const.ii, but at that kind of level it shouldn't matter much. I followed Joseph's suggestion and reused longlong.h. I copied it from libgcc rather than glibc since it seemed better for GCC to have a single version across both gcc/ and libgcc/. I can put it in include/ if that seems better. I've committed the patch to move longlong.h to trunk and merged back to the branch, so all that's left is the wide-int.cc patch. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2013-12-03 23:59:08.133658567 + +++ gcc/wide-int.cc 2013-12-04 12:55:28.466895358 + @@ -27,6 +27,16 @@ along with GCC; see the file COPYING3. #include "tree.h" #include "dumpfile.h" +#if GCC_VERSION >= 3000 +#define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT +typedef unsigned HOST_HALF_WIDE_INT UHWtype; +typedef unsigned HOST_WIDE_INT UWtype; +typedef unsigned int UQItype __attribute__ ((mode (QI))); +typedef unsigned int USItype __attribute__ ((mode (SI))); +typedef unsigned int UDItype __attribute__ ((mode (DI))); +#include "longlong.h" +#endif + /* This is the maximal size of the buffer needed for dump. */ const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4 + (MAX_BITSIZE_MODE_ANY_INT @@ -1255,8 +1265,8 @@ wi_pack (unsigned HOST_WIDE_INT *result, record in *OVERFLOW whether the result overflowed. SGN controls the signedness and is used to check overflow or if HIGH is set. */ unsigned int -wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1, - unsigned int op1len, const HOST_WIDE_INT *op2, +wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val, + unsigned int op1len, const HOST_WIDE_INT *op2val, unsigned int op2len, unsigned int prec, signop sgn, bool *overflow, bool high) { @@ -1285,24 +1295,53 @@ wi::mul_internal (HOST_WIDE_INT *val, co if (needs_overflow) *overflow = false; + wide_int_ref op1 = wi::storage_ref (op1val, op1len, prec); + wide_int_ref op2 = wi::storage_ref (op2val, op2len, prec); + /* This is a surprisingly common case, so do it first. */ - if ((op1len == 1 && op1[0] == 0) || (op2len == 1 && op2[0] == 0)) + if (op1 == 0 || op2 == 0) { val[0] = 0; return 1; } +#ifdef umul_ppmm + if (sgn == UNSIGNED) +{ + /* If the inputs are single HWIs and the output has room for at +least two HWIs, we can use umul_ppmm directly. */ + if (prec >= HOST_BITS_PER_WIDE_INT * 2 + && wi::fits_uhwi_p (op1) + && wi::fits_uhwi_p (op2)) + { + umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ()); + return 1 + (val[1] != 0 || val[0] < 0); + } + /* Likewise if the output is a full single HWI, except that the +upper HWI of the result is only used for determining overflow. +(We handle this case inline when overflow isn't needed.) */ + else if (prec == HOST_BITS_PER_WIDE_INT) + { + unsigned HOST_WIDE_INT upper; + umul_ppmm (upper, val[0], op1.ulow (), op2.ulow ()); + if (needs_overflow) + *overflow = (upper != 0); + return 1; + } +} +#endif + /* Handle multiplications by 1. */ - if (op1len == 1 && op1[0] == 1) + if (op1 == 1) { for (i = 0; i < op2len; i++) - val[i] = op2[i]; + val[i] = op2val[i]; return op2len; } - if (op2len == 1 && op2[0] == 1) + if (op2 == 1) { for (i = 0; i < op1len; i++) - val[i] = op1[i]; + val[i] = op1val[i]; return op1len; } @@ -1316,13 +1355,13 @@ wi::mul_internal (HOST_WIDE_INT *val, co if (sgn == SIGNED) { - o0 = sext_hwi (op1[0], prec); - o1 = sext_hwi (op2[0], prec); + o0 = op1.to_shwi (); + o1 = op2.to_shwi (); } else { - o0 = zext_hwi (op1[0], prec); - o1 = zext_hwi (op2[0], prec); + o0 = op1.to_uhwi (); + o1 = op2.to_uhwi (); } r = o0 * o1; @@ -1344,9 +1383,9 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1, op1len, + wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2, op2len, + wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, half_blocks_needed, prec, SIGNED);
Re: [patch] combine ICE fix
On 12/03/2013 01:52 PM, Mike Stump wrote: On Dec 2, 2013, at 10:26 PM, Jeff Law wrote: On 11/27/13 17:13, Cesar Philippidis wrote: I looked into adding support for incremental DF scanning from df*.[ch] in combine but there are a couple of problems. First of all, combine does its own DF analysis. It does so because its usage falls under this category (df-core.c): c) If the pass modifies insns several times, this incremental updating may be expensive. Furthermore, combine's DF relies on the DF scanning to be deferred, so the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED take place before it updates the notes for those insns. Also, combine has a tendency to undo its changes occasionally. I think at this stage of the release cycle, converting combine to incremental DF is probably a no-go. However, we should keep it in mind for the future -- while hairy I'd really like to see that happen in the long term. I think Kenny has some thoughts in this area. I'll cc him to ensure he sees it. it is the tendency to undo things (i would use the word "frequently" rather than) occasionally that kept me from doing this years ago. kenny
Re: [wide-int] small cleanup in wide-int.*
On 11/29/2013 05:24 AM, Richard Biener wrote: On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck wrote: This patch does three things in wide-int: 1) it cleans up some comments. 2) removes a small amount of trash. 3) it changes the max size of the wide int from being 4x of MAX_BITSIZE_MODE_ANY_INT to 2x +1. This should improve large muls and divs as well as perhaps help with some cache behavior. @@ -235,8 +233,8 @@ along with GCC; see the file COPYING3. range of a multiply. This code needs 2n + 2 bits. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ +/ HOST_BITS_PER_WIDE_INT) + 1) I always wondered why VRP (if that is the only reason we do 2*n+1) cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)? Other widest_int users should not suffer IMHO. widest_int should strictly cover all modes that the target can do any arithmetic on (thus not XImode or OImode on x86_64). Richard. ok to commit enclosed is the code to change the large multiplies in tree-vrp as well as sets the size of the WIDE_INT_MAX_ELTS as we have discussed. Bootstrapped and tested on x86-64. ok to commit? kenny Index: tree-vrp.c === --- tree-vrp.c (revision 205597) +++ tree-vrp.c (working copy) @@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_ signop sign = TYPE_SIGN (expr_type); unsigned int prec = TYPE_PRECISION (expr_type); - unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0); if (range_int_cst_p (&vr0) && range_int_cst_p (&vr1) && TYPE_OVERFLOW_WRAPS (expr_type)) { - wide_int sizem1 = wi::mask (prec, false, prec2); - wide_int size = sizem1 + 1; + /* vrp_int is twice as wide as anything that the target + supports so it can support a full width multiply. No + need to add any more padding for an extra sign bit + because that comes with the way that WIDE_INT_MAX_ELTS is + defined. */ + typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) + vrp_int; + vrp_int sizem1 = wi::mask (prec, false); + vrp_int size = sizem1 + 1; /* Extend the values using the sign of the result to PREC2. From here on out, everthing is just signed math no matter what the input types were. */ - wide_int min0 = wide_int::from (vr0.min, prec2, sign); - wide_int max0 = wide_int::from (vr0.max, prec2, sign); - wide_int min1 = wide_int::from (vr1.min, prec2, sign); - wide_int max1 = wide_int::from (vr1.max, prec2, sign); + vrp_int min0 = wi::to_vrp (vr0.min); + vrp_int max0 = wi::to_vrp (vr0.max); + vrp_int min1 = wi::to_vrp (vr1.min); + vrp_int max1 = wi::to_vrp (vr1.max); /* Canonicalize the intervals. */ if (sign == UNSIGNED) @@ -2644,17 +2650,17 @@ extract_range_from_binary_expr_1 (value_ } } - wide_int prod0 = min0 * min1; - wide_int prod1 = min0 * max1; - wide_int prod2 = max0 * min1; - wide_int prod3 = max0 * max1; + vrp_int prod0 = min0 * min1; + vrp_int prod1 = min0 * max1; + vrp_int prod2 = max0 * min1; + vrp_int prod3 = max0 * max1; /* Sort the 4 products so that min is in prod0 and max is in prod3. */ /* min0min1 > max0max1 */ if (wi::gts_p (prod0, prod3)) { - wide_int tmp = prod3; + vrp_int tmp = prod3; prod3 = prod0; prod0 = tmp; } @@ -2662,21 +2668,21 @@ extract_range_from_binary_expr_1 (value_ /* min0max1 > max0min1 */ if (wi::gts_p (prod1, prod2)) { - wide_int tmp = prod2; + vrp_int tmp = prod2; prod2 = prod1; prod1 = tmp; } if (wi::gts_p (prod0, prod1)) { - wide_int tmp = prod1; + vrp_int tmp = prod1; prod1 = prod0; prod0 = tmp; } if (wi::gts_p (prod2, prod3)) { - wide_int tmp = prod3; + vrp_int tmp = prod3; prod3 = prod2; prod2 = tmp; } Index: tree.h === --- tree.h (revision 205597) +++ tree.h (working copy) @@ -4565,10 +4565,12 @@ namespace wi static const unsigned int precision = N; }; - generic_wide_int > + generic_wide_int > to_widest (const_tree); generic_wide_int > to_offset (const_tree); + + generic_wide_int > to_vrp (const_tree); } inline unsigned int @@ -4586,7 +4588,7 @@ wi::int_traits ::decompose ( precision); } -inline generic_wide_int > +inline generic_wide_int > wi::to_widest (const_tree t) { return t; @@ -4597,6 +4599,12 @@ wi::to_offset (const_tree t) { return t; } + +inline generic_wide_int > +wi::to_vrp (const_tree t) +{ + return t; +} template inline wi::extended_tree ::extended_tree (const_tree t) Index: wide-int.h ===
Re: [wide-int] Drop some lingering uses of precision 0
if i did not already say so, this is fine. kenny On 12/02/2013 03:20 PM, Richard Sandiford wrote: I noticed that there were still a couple of tests for zero precision. This patch replaces them with asserts when handling separately-supplied precisions and simply drops them when handling existing wide_ints. (The idea is that most code would break for zero precision wide_ints and only asserting in some use sites would be inconsistent.) Also, to_shwi is called either with a nonzero precision argument or with no argument. I think it'd be clearer to split the two cases into separate (overloaded) functions. It's also more efficient, since the compiler doesn't know that a variable-precision argument must be nonzero. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2013-12-02 20:03:50.112581766 + +++ gcc/wide-int.cc 2013-12-02 20:12:22.178998274 + @@ -275,9 +275,8 @@ wi::from_mpz (const_tree type, mpz_t x, wide_int wi::max_value (unsigned int precision, signop sgn) { - if (precision == 0) -return shwi (0, precision); - else if (sgn == UNSIGNED) + gcc_checking_assert (precision != 0); + if (sgn == UNSIGNED) /* The unsigned max is just all ones. */ return shwi (-1, precision); else @@ -290,7 +289,8 @@ wi::max_value (unsigned int precision, s wide_int wi::min_value (unsigned int precision, signop sgn) { - if (precision == 0 || sgn == UNSIGNED) + gcc_checking_assert (precision != 0); + if (sgn == UNSIGNED) return uhwi (0, precision); else /* The signed min is all zeros except the top bit. This must be @@ -1487,9 +1487,6 @@ wi::popcount (const wide_int_ref &x) unsigned int i; int count; - if (x.precision == 0) -return 0; - /* The high order block is special if it is the last block and the precision is not an even multiple of HOST_BITS_PER_WIDE_INT. We have to clear out any ones above the precision before doing @@ -2082,10 +2079,6 @@ wi::ctz (const wide_int_ref &x) int wi::exact_log2 (const wide_int_ref &x) { - /* 0-precision values can only hold 0. */ - if (x.precision == 0) -return -1; - /* Reject cases where there are implicit -1 blocks above HIGH. */ if (x.len * HOST_BITS_PER_WIDE_INT < x.precision && x.sign_mask () < 0) return -1; Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-12-02 19:52:05.424989079 + +++ gcc/wide-int.h 2013-12-02 20:12:22.179998282 + @@ -644,8 +644,10 @@ class GTY(()) generic_wide_int : public generic_wide_int (const T &, unsigned int); /* Conversions. */ - HOST_WIDE_INT to_shwi (unsigned int = 0) const; - unsigned HOST_WIDE_INT to_uhwi (unsigned int = 0) const; + HOST_WIDE_INT to_shwi (unsigned int) const; + HOST_WIDE_INT to_shwi () const; + unsigned HOST_WIDE_INT to_uhwi (unsigned int) const; + unsigned HOST_WIDE_INT to_uhwi () const; HOST_WIDE_INT to_short_addr () const; /* Public accessors for the interior of a wide int. */ @@ -735,18 +737,23 @@ inline generic_wide_int ::gener inline HOST_WIDE_INT generic_wide_int ::to_shwi (unsigned int precision) const { - if (precision == 0) -{ - if (is_sign_extended) - return this->get_val ()[0]; - precision = this->get_precision (); -} if (precision < HOST_BITS_PER_WIDE_INT) return sext_hwi (this->get_val ()[0], precision); else return this->get_val ()[0]; } +/* Return THIS as a signed HOST_WIDE_INT, in its natural precision. */ +template +inline HOST_WIDE_INT +generic_wide_int ::to_shwi () const +{ + if (is_sign_extended) +return this->get_val ()[0]; + else +return to_shwi (this->get_precision ()); +} + /* Return THIS as an unsigned HOST_WIDE_INT, zero-extending from PRECISION. If THIS does not fit in PRECISION, the information is lost. */ @@ -754,14 +761,20 @@ generic_wide_int ::to_shwi (uns inline unsigned HOST_WIDE_INT generic_wide_int ::to_uhwi (unsigned int precision) const { - if (precision == 0) -precision = this->get_precision (); if (precision < HOST_BITS_PER_WIDE_INT) return zext_hwi (this->get_val ()[0], precision); else return this->get_val ()[0]; } +/* Return THIS as an signed HOST_WIDE_INT, in its natural precision. */ +template +inline unsigned HOST_WIDE_INT +generic_wide_int ::to_uhwi () const +{ + return to_uhwi (this->get_precision ()); +} + /* TODO: The compiler is half converted from using HOST_WIDE_INT to represent addresses to using offset_int to represent addresses. We use to_short_addr at the interface from new code to old, @@ -2289,9 +2302,7 @@ wi::add (const T1 &x, const T2 &y, signo unsigned HOST_WIDE_INT xl = xi.ulow (); unsigned HOST_WIDE_INT yl = yi.ulow (); unsigned HOST_WIDE_INT resultl
Re: [wide-int] i am concerned about the typedef for widest-int.
On 12/02/2013 03:34 PM, Richard Sandiford wrote: Kenneth Zadeck writes: see wide-int.h around line 290 the MAX_BITSIZE_MODE_ANY_INT is the largest mode on the machine. however if the value coming in is an unsigned number of the type the represents that mode, don't we loose a bit? That was the +1 mentioned here: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03745.html I.e. it should be "widest supported arithmetic input + 1". Thanks, Richard do we want 129 or do we want to round that up to the next hwi?
[wide-int] i am concerned about the typedef for widest-int.
see wide-int.h around line 290 the MAX_BITSIZE_MODE_ANY_INT is the largest mode on the machine. however if the value coming in is an unsigned number of the type the represents that mode, don't we loose a bit? kenny
Re: [wide-int] small cleanup in wide-int.*
committed as revision 205599 to wide-int branch. kenny On 12/02/2013 05:50 AM, Richard Biener wrote: On Sat, Nov 30, 2013 at 1:55 AM, Kenneth Zadeck wrote: Richi, this is the first of either 2 or 3 patches to fix this.There are two places that need be fixed for us to do 1X + 1 and this patch fixes the first one. There was an unnecessary call to mul_full and this was the only call to mul_full. So this patch removes the call and also the function itself. The other place is the tree-vpn that is discussed here and will be dealt with in the other patches. tested on x86-64. Ok to commit? Ok. Thanks, Richard. Kenny On 11/29/2013 05:24 AM, Richard Biener wrote: On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck wrote: This patch does three things in wide-int: 1) it cleans up some comments. 2) removes a small amount of trash. 3) it changes the max size of the wide int from being 4x of MAX_BITSIZE_MODE_ANY_INT to 2x +1. This should improve large muls and divs as well as perhaps help with some cache behavior. @@ -235,8 +233,8 @@ along with GCC; see the file COPYING3. range of a multiply. This code needs 2n + 2 bits. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ +/ HOST_BITS_PER_WIDE_INT) + 1) I always wondered why VRP (if that is the only reason we do 2*n+1) cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)? Other widest_int users should not suffer IMHO. widest_int should strictly cover all modes that the target can do any arithmetic on (thus not XImode or OImode on x86_64). Richard. ok to commit Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 205597) +++ gcc/wide-int.cc (working copy) @@ -1247,22 +1247,18 @@ wi_pack (unsigned HOST_WIDE_INT *result, } /* Multiply Op1 by Op2. If HIGH is set, only the upper half of the - result is returned. If FULL is set, the entire result is returned - in a mode that is twice the width of the inputs. However, that - mode needs to exist if the value is to be usable. Clients that use - FULL need to check for this. - - If HIGH or FULL are not set, throw away the upper half after the - check is made to see if it overflows. Unfortunately there is no - better way to check for overflow than to do this. If OVERFLOW is - nonnull, record in *OVERFLOW whether the result overflowed. SGN - controls the signedness and is used to check overflow or if HIGH or - FULL is set. */ + result is returned. + + If HIGH is not set, throw away the upper half after the check is + made to see if it overflows. Unfortunately there is no better way + to check for overflow than to do this. If OVERFLOW is nonnull, + record in *OVERFLOW whether the result overflowed. SGN controls + the signedness and is used to check overflow or if HIGH is set. */ unsigned int wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1, unsigned int op1len, const HOST_WIDE_INT *op2, unsigned int op2len, unsigned int prec, signop sgn, - bool *overflow, bool high, bool full) + bool *overflow, bool high) { unsigned HOST_WIDE_INT o0, o1, k, t; unsigned int i; @@ -1313,7 +1309,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co /* If we need to check for overflow, we can only do half wide multiplies quickly because we need to look at the top bits to check for the overflow. */ - if ((high || full || needs_overflow) + if ((high || needs_overflow) && (prec <= HOST_BITS_PER_HALF_WIDE_INT)) { unsigned HOST_WIDE_INT r; @@ -1372,7 +1368,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co /* We did unsigned math above. For signed we must adjust the product (assuming we need to see that). */ - if (sgn == SIGNED && (full || high || needs_overflow)) + if (sgn == SIGNED && (high || needs_overflow)) { unsigned HOST_WIDE_INT b; if (op1[op1len-1] < 0) @@ -1420,13 +1416,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co *overflow = true; } - if (full) -{ - /* compute [2prec] <- [prec] * [prec] */ - wi_pack ((unsigned HOST_WIDE_INT *) val, r, 2 * half_blocks_needed); - return canonize (val, blocks_needed * 2, prec * 2); -} - else if (high) + if (high) { /* compute [prec] <- ([prec] * [prec]) >> [prec] */ wi_pack ((unsigned HOST_WIDE_INT *) val, Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 205597) +++ gcc/fold-const.c (working copy) @@ -5962,11 +5962,12 @@ extract_muldiv_1 (tree t, tree c, enum t assuming no overflow. */ if (tcode == code) { - bool overflow_p; + bool overflow_p = false; + bool overflow_mul_p; signop sign = TYPE_SIGN (ctype); -
Re: [wide-int] small cleanup in wide-int.*
Richi, this is the first of either 2 or 3 patches to fix this.There are two places that need be fixed for us to do 1X + 1 and this patch fixes the first one. There was an unnecessary call to mul_full and this was the only call to mul_full. So this patch removes the call and also the function itself. The other place is the tree-vpn that is discussed here and will be dealt with in the other patches. tested on x86-64. Ok to commit? Kenny On 11/29/2013 05:24 AM, Richard Biener wrote: On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck wrote: This patch does three things in wide-int: 1) it cleans up some comments. 2) removes a small amount of trash. 3) it changes the max size of the wide int from being 4x of MAX_BITSIZE_MODE_ANY_INT to 2x +1. This should improve large muls and divs as well as perhaps help with some cache behavior. @@ -235,8 +233,8 @@ along with GCC; see the file COPYING3. range of a multiply. This code needs 2n + 2 bits. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ +/ HOST_BITS_PER_WIDE_INT) + 1) I always wondered why VRP (if that is the only reason we do 2*n+1) cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)? Other widest_int users should not suffer IMHO. widest_int should strictly cover all modes that the target can do any arithmetic on (thus not XImode or OImode on x86_64). Richard. ok to commit Index: fold-const.c === --- fold-const.c (revision 205496) +++ fold-const.c (working copy) @@ -5962,11 +5962,12 @@ extract_muldiv_1 (tree t, tree c, enum t assuming no overflow. */ if (tcode == code) { - bool overflow_p; + bool overflow_p = false; + bool overflow_mul_p; signop sign = TYPE_SIGN (ctype); - wide_int mul = wi::mul_full (op1, c, sign); + wide_int mul = wi::mul (op1, c, sign, &overflow_mul_p); overflow_p = TREE_OVERFLOW (c) | TREE_OVERFLOW (op1); - if (!wi::fits_to_tree_p (mul, ctype) + if (overflow_mul_p && ((sign == UNSIGNED && tcode != MULT_EXPR) || sign == SIGNED)) overflow_p = true; if (!overflow_p) Index: wide-int.cc === --- wide-int.cc (revision 205508) +++ wide-int.cc (working copy) @@ -1247,22 +1247,18 @@ wi_pack (unsigned HOST_WIDE_INT *result, } /* Multiply Op1 by Op2. If HIGH is set, only the upper half of the - result is returned. If FULL is set, the entire result is returned - in a mode that is twice the width of the inputs. However, that - mode needs to exist if the value is to be usable. Clients that use - FULL need to check for this. - - If HIGH or FULL are not set, throw away the upper half after the - check is made to see if it overflows. Unfortunately there is no - better way to check for overflow than to do this. If OVERFLOW is - nonnull, record in *OVERFLOW whether the result overflowed. SGN - controls the signedness and is used to check overflow or if HIGH or - FULL is set. */ + result is returned. + + If HIGH is not set, throw away the upper half after the check is + made to see if it overflows. Unfortunately there is no better way + to check for overflow than to do this. If OVERFLOW is nonnull, + record in *OVERFLOW whether the result overflowed. SGN controls + the signedness and is used to check overflow or if HIGH is set. */ unsigned int wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1, unsigned int op1len, const HOST_WIDE_INT *op2, unsigned int op2len, unsigned int prec, signop sgn, - bool *overflow, bool high, bool full) + bool *overflow, bool high) { unsigned HOST_WIDE_INT o0, o1, k, t; unsigned int i; @@ -1292,7 +1288,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co /* If we need to check for overflow, we can only do half wide multiplies quickly because we need to look at the top bits to check for the overflow. */ - if ((high || full || needs_overflow) + if ((high || needs_overflow) && (prec <= HOST_BITS_PER_HALF_WIDE_INT)) { unsigned HOST_WIDE_INT r; @@ -1351,7 +1347,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co /* We did unsigned math above. For signed we must adjust the product (assuming we need to see that). */ - if (sgn == SIGNED && (full || high || needs_overflow)) + if (sgn == SIGNED && (high || needs_overflow)) { unsigned HOST_WIDE_INT b; if (op1[op1len-1] < 0) @@ -1399,13 +1395,7 @@ wi::mul_internal (HOST_WIDE_INT *val, co *overflow = true; } - if (full) -{ - /* compute [2prec] <- [prec] * [prec] */ - wi_pack ((unsigned HOST_WIDE_INT *) val, r, 2 * half_blocks_needed); - return canonize (val, blocks_needed
Re: [wide-int] small cleanup in wide-int.*
could be. On 11/29/2013 06:57 AM, Richard Sandiford wrote: Looks good to me FWIW, except: Kenneth Zadeck writes: @@ -112,11 +114,11 @@ along with GCC; see the file COPYING3. two, the default is the prefered representation. All three flavors of wide_int are represented as a vector of - HOST_WIDE_INTs. The default and widest_int vectors contain enough elements - to hold a value of MAX_BITSIZE_MODE_ANY_INT bits. offset_int contains only - enough elements to hold ADDR_MAX_PRECISION bits. The values are stored - in the vector with the least significant HOST_BITS_PER_WIDE_INT bits - in element 0. + HOST_WIDE_INTs. The default and widest_int vectors contain enough + elements to hold a value of MAX_BITSIZE_MODE_ANY_INT bits. + offset_int contains only enough elements to hold ADDR_MAX_PRECISION + bits. The values are stored in the vector with the least + significant HOST_BITS_PER_WIDE_INT bits in element 0. The default wide_int contains three fields: the vector (VAL), the precision and a length (LEN). The length is the number of HWIs is this just changing the line breaks? Thanks, Richard
Re: [wide-int] small cleanup in wide-int.*
On Nov 29, 2013, at 4:24 AM, Richard Biener wrote: > On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck > wrote: >> This patch does three things in wide-int: >> >> 1) it cleans up some comments. >> 2) removes a small amount of trash. >> 3) it changes the max size of the wide int from being 4x of >> MAX_BITSIZE_MODE_ANY_INT to 2x +1. This should improve large muls and divs >> as well as perhaps help with some cache behavior. > > @@ -235,8 +233,8 @@ along with GCC; see the file COPYING3. >range of a multiply. This code needs 2n + 2 bits. */ > > #define WIDE_INT_MAX_ELTS \ > - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ > - / HOST_BITS_PER_WIDE_INT) > + (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ > +/ HOST_BITS_PER_WIDE_INT) + 1) > > I always wondered why VRP (if that is the only reason we do 2*n+1) > cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)? > Other widest_int users should not suffer IMHO. widest_int should > strictly cover all modes that the target can do any arithmetic on > (thus not XImode or OImode on x86_64). > > Richard. We did not used to be able to do that. This code is much older. So how do you detect if someone has a mode and does not do math it? I had thought that the only reason that people defined these mode and did not use them was because it took a long time to properly understand how broken gcc really is. Anyway there is I call to mul-full that would need reworking. > >> ok to commit
Re: [wide-int] Handle more cmps and cmpu cases inline
like the add/sub patch, enhance the comment so that it says that it is designed to hit the widestint and offset int common cases. kenny On 11/28/2013 12:34 PM, Richard Sandiford wrote: As Richi asked, this patch makes cmps use the same shortcuts as lts_p. It also makes cmpu use the shortcut that I justed added to ltu_p. On that same fold-const.ii testcase, this reduces the number of cmps_large calls from 66924 to 916. It reduces the number of cmpu_large calls from 3462 to 4. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2001-01-01 00:00:00.0 + +++ gcc/wide-int.h 2013-11-28 16:08:22.527681077 + @@ -1858,17 +1858,31 @@ wi::cmps (const T1 &x, const T2 &y) unsigned int precision = get_binary_precision (x, y); WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); - if (precision <= HOST_BITS_PER_WIDE_INT) + if (wi::fits_shwi_p (yi)) { - HOST_WIDE_INT xl = xi.to_shwi (); - HOST_WIDE_INT yl = yi.to_shwi (); - if (xl < yl) + /* Special case for comparisons with 0. */ + if (STATIC_CONSTANT_P (yi.val[0] == 0)) + return neg_p (xi) ? -1 : !(xi.len == 1 && xi.val[0] == 0); + /* If x fits into a signed HWI, we can compare directly. */ + if (wi::fits_shwi_p (xi)) + { + HOST_WIDE_INT xl = xi.to_shwi (); + HOST_WIDE_INT yl = yi.to_shwi (); + return xl < yl ? -1 : xl > yl; + } + /* If x doesn't fit and is negative, then it must be more +negative than any signed HWI, and hence smaller than y. */ + if (neg_p (xi)) return -1; - else if (xl > yl) - return 1; - else - return 0; + /* If x is positive, then it must be larger than any signed HWI, +and hence greater than y. */ + return 1; } + /* Optimize the opposite case, if it can be detected at compile time. */ + if (STATIC_CONSTANT_P (xi.len == 1)) +/* If YI is negative it is lower than the least HWI. + If YI is positive it is greater than the greatest HWI. */ +return neg_p (yi) ? 1 : -1; return cmps_large (xi.val, xi.len, precision, yi.val, yi.len); } @@ -1881,16 +1895,35 @@ wi::cmpu (const T1 &x, const T2 &y) unsigned int precision = get_binary_precision (x, y); WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); - if (precision <= HOST_BITS_PER_WIDE_INT) + /* Optimize comparisons with constants. */ + if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0)) { + /* If XI doesn't fit in a HWI then it must be larger than YI. */ + if (xi.len != 1) + return 1; + /* Otherwise compare directly. */ unsigned HOST_WIDE_INT xl = xi.to_uhwi (); - unsigned HOST_WIDE_INT yl = yi.to_uhwi (); - if (xl < yl) + unsigned HOST_WIDE_INT yl = yi.val[0]; + return xl < yl ? -1 : xl > yl; +} + if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0)) +{ + /* If YI doesn't fit in a HWI then it must be larger than XI. */ + if (yi.len != 1) return -1; - else if (xl == yl) - return 0; - else - return 1; + /* Otherwise compare directly. */ + unsigned HOST_WIDE_INT xl = xi.val[0]; + unsigned HOST_WIDE_INT yl = yi.to_uhwi (); + return xl < yl ? -1 : xl > yl; +} + /* Optimize the case of two HWIs. The HWIs are implicitly sign-extended + for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both + values does not change the result. */ + if (xi.len + yi.len == 2) +{ + unsigned HOST_WIDE_INT xl = xi.to_uhwi (); + unsigned HOST_WIDE_INT yl = yi.to_uhwi (); + return xl < yl ? -1 : xl > yl; } return cmpu_large (xi.val, xi.len, precision, yi.val, yi.len); }
Re: [wide-int] Handle more ltu_p cases inline
this is fine. kenny On 11/28/2013 12:29 PM, Richard Sandiford wrote: The existing ltu_p fast path can handle any pairs of single-HWI inputs, even for precision > HOST_BITS_PER_WIDE_INT. In that case both xl and yl are implicitly sign-extended to the larger precision, but with the extended values still being compared as unsigned. The extension doesn't change the result in that case. When compiling a recent fold-const.ii, this reduces the number of ltu_p_large calls from 23849 to 697. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/alias.c === --- gcc/alias.c 2013-11-20 12:12:49.393055063 + +++ gcc/alias.c 2013-11-28 12:24:23.307549245 + @@ -342,7 +342,7 @@ ao_ref_from_mem (ao_ref *ref, const_rtx || (DECL_P (ref->base) && (DECL_SIZE (ref->base) == NULL_TREE || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST - || wi::ltu_p (DECL_SIZE (ref->base), + || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)), ref->offset + ref->size) return false; Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-11-28 11:44:39.041731636 + +++ gcc/wide-int.h 2013-11-28 12:48:36.200764215 + @@ -1740,13 +1740,15 @@ wi::ltu_p (const T1 &x, const T2 &y) unsigned int precision = get_binary_precision (x, y); WIDE_INT_REF_FOR (T1) xi (x, precision); WIDE_INT_REF_FOR (T2) yi (y, precision); - /* Optimize comparisons with constants and with sub-HWI unsigned - integers. */ + /* Optimize comparisons with constants. */ if (STATIC_CONSTANT_P (yi.len == 1 && yi.val[0] >= 0)) return xi.len == 1 && xi.to_uhwi () < (unsigned HOST_WIDE_INT) yi.val[0]; if (STATIC_CONSTANT_P (xi.len == 1 && xi.val[0] >= 0)) return yi.len != 1 || yi.to_uhwi () > (unsigned HOST_WIDE_INT) xi.val[0]; - if (precision <= HOST_BITS_PER_WIDE_INT) + /* Optimize the case of two HWIs. The HWIs are implicitly sign-extended + for precisions greater than HOST_BITS_WIDE_INT, but sign-extending both + values does not change the result. */ + if (xi.len + yi.len == 2) { unsigned HOST_WIDE_INT xl = xi.to_uhwi (); unsigned HOST_WIDE_INT yl = yi.to_uhwi ();
Re: [wide-int] Handle more add and sub cases inline
I would like to see some comment to the effect that this to allow inlining for the common case for widest int and offset int without inlining the uncommon case for regular wide-int. On 11/28/2013 12:38 PM, Richard Sandiford wrote: Currently add and sub have no fast path for offset_int and widest_int, they just call the out-of-line version. This patch handles the single-HWI cases inline. At least on x86_64, this only adds one branch per call; the fast path itself is straight-line code. On the same fold-const.ii testcase, this reduces the number of add_large calls from 877507 to 42459. It reduces the number of sub_large calls from 25707 to 148. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-11-28 13:34:19.596839877 + +++ gcc/wide-int.h 2013-11-28 16:08:11.387731775 + @@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y) val[0] = xi.ulow () + yi.ulow (); result.set_len (1); } + else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT) + && xi.len + yi.len == 2) +{ + unsigned HOST_WIDE_INT xl = xi.ulow (); + unsigned HOST_WIDE_INT yl = yi.ulow (); + unsigned HOST_WIDE_INT resultl = xl + yl; + val[0] = resultl; + val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1; + result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl)) + >> (HOST_BITS_PER_WIDE_INT - 1))); +} else result.set_len (add_large (val, xi.val, xi.len, yi.val, yi.len, precision, @@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y) val[0] = xi.ulow () - yi.ulow (); result.set_len (1); } + else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT) + && xi.len + yi.len == 2) +{ + unsigned HOST_WIDE_INT xl = xi.ulow (); + unsigned HOST_WIDE_INT yl = yi.ulow (); + unsigned HOST_WIDE_INT resultl = xl - yl; + val[0] = resultl; + val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1; + result.set_len (1 + (((resultl ^ xl) & (xl ^ yl)) + >> (HOST_BITS_PER_WIDE_INT - 1))); +} else result.set_len (sub_large (val, xi.val, xi.len, yi.val, yi.len, precision,
[wide-int] small cleanup in wide-int.*
This patch does three things in wide-int: 1) it cleans up some comments. 2) removes a small amount of trash. 3) it changes the max size of the wide int from being 4x of MAX_BITSIZE_MODE_ANY_INT to 2x +1. This should improve large muls and divs as well as perhaps help with some cache behavior. ok to commit Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 205488) +++ gcc/wide-int.h (working copy) @@ -55,10 +55,12 @@ along with GCC; see the file COPYING3. 2) offset_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized - address calculation on the target. Currently the value is 64 + 4 - bits rounded up to the next number even multiple of + address calculation on the target. Currently the value is 64 + 3 + + 1 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first - port needs more than 64 bits for the size of a pointer). + port needs more than 64 bits for the size of a pointer). The 3 + bits allow the bits of byte to accessed, the 1 allows any + unsigned value to be converted to signed without overflowing. This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based @@ -112,11 +114,11 @@ along with GCC; see the file COPYING3. two, the default is the prefered representation. All three flavors of wide_int are represented as a vector of - HOST_WIDE_INTs. The default and widest_int vectors contain enough elements - to hold a value of MAX_BITSIZE_MODE_ANY_INT bits. offset_int contains only - enough elements to hold ADDR_MAX_PRECISION bits. The values are stored - in the vector with the least significant HOST_BITS_PER_WIDE_INT bits - in element 0. + HOST_WIDE_INTs. The default and widest_int vectors contain enough + elements to hold a value of MAX_BITSIZE_MODE_ANY_INT bits. + offset_int contains only enough elements to hold ADDR_MAX_PRECISION + bits. The values are stored in the vector with the least + significant HOST_BITS_PER_WIDE_INT bits in element 0. The default wide_int contains three fields: the vector (VAL), the precision and a length (LEN). The length is the number of HWIs @@ -223,10 +225,6 @@ along with GCC; see the file COPYING3. #include "signop.h" #include "insn-modes.h" -#if 0 -#define DEBUG_WIDE_INT -#endif - /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very early examination of the target's mode file. Thus it is safe that some small multiple of this number is easily larger than any number @@ -235,8 +233,8 @@ along with GCC; see the file COPYING3. range of a multiply. This code needs 2n + 2 bits. */ #define WIDE_INT_MAX_ELTS \ - ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ - / HOST_BITS_PER_WIDE_INT) + (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \ +/ HOST_BITS_PER_WIDE_INT) + 1) /* This is the max size of any pointer on any machine. It does not seem to be as easy to sniff this out of the machine description as Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 205488) +++ gcc/wide-int.cc (working copy) @@ -2090,59 +2090,5 @@ void gt_ggc_mx (widest_int *) { } void gt_pch_nx (widest_int *, void (*) (void *, void *), void *) { } void gt_pch_nx (widest_int *) { } -/* - * Private debug printing routines. - */ -#ifdef DEBUG_WIDE_INT -/* The debugging routines print results of wide operations into the - dump files of the respective passes in which they were called. */ -static char * -dumpa (const HOST_WIDE_INT *val, unsigned int len, unsigned int prec, char *buf) -{ - int i; - unsigned int l; - const char * sep = ""; - l = sprintf (buf, "[%d (", prec); - for (i = len - 1; i >= 0; i--) -{ - l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]); - sep = " "; -} - gcc_assert (len != 0); - - l += sprintf (&buf[l], ")]"); - - gcc_assert (l < MAX_SIZE); - return buf; - - -} -#endif - -#if 0 -/* The debugging routines print results of wide operations into the - dump files of the respective passes in which they were called. */ -char * -wide_int_ro::dump (char* buf) const -{ - int i; - unsigned int l; - const char * sep = ""; - - l = sprintf (buf, "[%d (", precision); - for (i = len - 1; i >= 0; i--) -{ - l += sprintf (&buf[l], "%s" HOST_WIDE_INT_PRINT_HEX, sep, val[i]); - sep = " "; -} - - gcc_assert (len != 0); - - l += sprintf (&buf[l], ")]"); - - gcc_assert (l < MAX_SIZE); - return buf; -} -#endif
Re: wide-int, rtl
Eric, Let me make one high level comment here and the low level comments will be responded to when i fix the patch. CONST_DOUBLE has two hwis in it. So in practice, you get 128 bits and that is it.a CONST_WIDE_INT has an array of HWIs that has as many elements as it needs to represent the value. If TARGET_SUPPORTS_WIDE_INT == 0 (the default and currently the value for every public port except the RS6000), then rtl integer constants that do not fit in a CONST_INT are put in a CONST_DOUBLE and there are never any CONST_WIDE_INTS created.A platform like this cannot correctly represent variables larger than 128 bits. i.e. this is like the current trunk except that there are fewer bugs for the 128 bit variables. if TARGET_SUPPORTS_WIDE_INT is not 0, then rtl integer constants are put in CONST_WIDE_INT and CONST_DOUBLES only hold floating point values. For the last comment that you made, if the target does not support wide int, then there is simply no way to represent that constant properly and if the target accepts types larger than TImode, then the maintainer needs to do the work to use CONST_WIDE_INTs. This is not a lot a work, but it requires target specific knowledge to convert the patterns, predicates and constraints to look for CONST_WIDE_INTs rather than CONST_DOUBLE for larger integers. As far as your comment about const_double_operand, i could add some conditional code to make this not accept integers if the TARGET... is not 0. Richard Sandiford wanted my to use that test as infrequently as possible. thanks for your comments. Kenny On 11/27/2013 11:24 AM, Eric Botcazou wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the first half of the rtl code. --- a/gcc/cse.c +++ b/gcc/cse.c @@ -2336,15 +2336,23 @@ hash_rtx_cb (const_rtx x, enum machine_mode mode, + (unsigned int) INTVAL (x)); return hash; +case CONST_WIDE_INT: + { + int i; + for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++) + hash += CONST_WIDE_INT_ELT (x, i); + } + return hash; You can write "for (int i = 0; ..." now and remove the parentheses. --- a/gcc/cselib.c +++ b/gcc/cselib.c @@ -1121,15 +1120,23 @@ cselib_hash_rtx (rtx x, int create, enum machine_mode memmode) hash += ((unsigned) CONST_INT << 7) + INTVAL (x); return hash ? hash : (unsigned int) CONST_INT; +case CONST_WIDE_INT: + { + int i; + for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++) + hash += CONST_WIDE_INT_ELT (x, i); + } + return hash; + Likewise. --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -3298,23 +3268,13 @@ choose_multiplier (unsigned HOST_WIDE_INT d, int n, int precision, pow = n + lgup; pow2 = n + lgup - precision; - /* We could handle this with some effort, but this case is much - better handled directly with a scc insn, so rely on caller using - that. */ - gcc_assert (pow != HOST_BITS_PER_DOUBLE_INT); Why removing it? --- a/gcc/expr.c +++ b/gcc/expr.c @@ -722,64 +722,33 @@ convert_modes (enum machine_mode mode, enum machine_mode oldmode, rtx x, int uns if (mode == oldmode) return x; - /* There is one case that we must handle specially: If we are converting - a CONST_INT into a mode whose size is twice HOST_BITS_PER_WIDE_INT and - we are to interpret the constant as unsigned, gen_lowpart will do - the wrong if the constant appears negative. What we want to do is - make the high-order word of the constant zero, not all ones. */ - - if (unsignedp && GET_MODE_CLASS (mode) == MODE_INT - && GET_MODE_BITSIZE (mode) == HOST_BITS_PER_DOUBLE_INT - && CONST_INT_P (x) && INTVAL (x) < 0) + if (CONST_SCALAR_INT_P (x) + && GET_MODE_CLASS (mode) == MODE_INT) On a single line. { - double_int val = double_int::from_uhwi (INTVAL (x)); - - /* We need to zero extend VAL. */ - if (oldmode != VOIDmode) - val = val.zext (GET_MODE_BITSIZE (oldmode)); - - return immed_double_int_const (val, mode); + /* If the caller did not tell us the old mode, then there is +not much to do with respect to canonization. We have to assume +that all the bits are significant. */ + if (GET_MODE_CLASS (oldmode) != MODE_INT) + oldmode = MAX_MODE_INT; + wide_int w = wide_int::from (std::make_pair (x, oldmode), + GET_MODE_PRECISION (mode), + unsignedp ? UNSIGNED : SIGNED); + return immed_wide_int_const (w, mode); canonicalization? @@ -5301,10 +5271,10 @@ store_expr (tree exp, rtx target, int call_param_p, bool nontemporal) &alt_rtl); } - /* If TEMP is a VOIDmode constant and the mode of the type of EXP is not - the same as that of TA
Re: Some wide-int review comments
committed as revision 205448 to trunk. committed as revision 205455 to wide-int branch. On 11/27/2013 05:50 AM, Richard Biener wrote: On Tue, Nov 26, 2013 at 5:33 PM, Kenneth Zadeck wrote: Richi, patch ping Ok. Thanks, Richard. also two more pieces of information.With further testing, this seems to fix Tests that now work, but didn't before: === ext/random/hypergeometric_distribution/operators/values.cc (test for excess errors) New tests that PASS: ext/random/hypergeometric_distribution/operators/values.cc execution test also, the corresponding frag for fold-const.c on the wide-int branch will look like Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 205224) +++ gcc/fold-const.c(working copy) @@ -1030,51 +1030,51 @@ int_const_binop_1 (enum tree_code code, case TRUNC_DIV_EXPR: case EXACT_DIV_EXPR: - res = wi::div_trunc (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_trunc (arg1, arg2, sign, &overflow); break; case FLOOR_DIV_EXPR: - res = wi::div_floor (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_floor (arg1, arg2, sign, &overflow); break; case CEIL_DIV_EXPR: - res = wi::div_ceil (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_ceil (arg1, arg2, sign, &overflow); break; case ROUND_DIV_EXPR: - res = wi::div_round (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_round (arg1, arg2, sign, &overflow); break; case TRUNC_MOD_EXPR: - res = wi::mod_trunc (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_trunc (arg1, arg2, sign, &overflow); break; case FLOOR_MOD_EXPR: - res = wi::mod_floor (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_floor (arg1, arg2, sign, &overflow); break; case CEIL_MOD_EXPR: - res = wi::mod_ceil (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_ceil (arg1, arg2, sign, &overflow); break; case ROUND_MOD_EXPR: - res = wi::mod_round (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_round (arg1, arg2, sign, &overflow); break; case MIN_EXPR: On 11/20/2013 06:31 PM, Kenneth Zadeck wrote: On 11/13/2013 04:59 AM, Richard Biener wrote: On Tue, Nov 12, 2013 at 5:43 PM, Kenneth Zadeck wrote: On 11/12/2013 11:27 AM, Joseph S. Myers wrote: On Tue, 12 Nov 2013, Kenneth Zadeck wrote: Richi, i am having a little trouble putting this back the way that you want. The issue is rem. what is supposed to happen for INT_MIN % -1? I would assume because i am failing the last case of gcc.dg/c90-const-expr-8.c that INT_MIN %-1 should not overflow even if INT_MIN / -1 does. however, Given the conclusion in C11 that a%b should be considered undefined if a/b is not representable, I think it's reasonable to say INT_MIN % -1 *should* be considered to overflow (for all C standard versions) (and bug 30484 is only a bug for -fwrapv). however, my local question is what do we want the api to be int-const-binop-1?The existing behavior seems to be that at least for common modes this function silently returns 0 and it is up to the front ends to put their own spin on it. For wide-int you create 1:1 the behavior of current trunk (if a change of behavior in TImode is not tested in the testsuite then you can ignore that). Whatever change you do to semantics of functions you do separately from wide-int (preferably first on trunk, or at your choice after the wide-int merge). For this case in question I'd say a % -1 should return 0, but for INT_MIN % -1 that 0 should have TREE_OVERFLOW set (and thus you need to adjust that c90-const-expr-8.c testcase). Richard. kenny richi, I have done this exactly as you suggested. bootstrapped and regression tested on x86-64. 2013-11-20 Kenneth Zadeck * fold-const.c (int_const_binop_1): Make INT_MIN % -1 return 0 with the overflow bit set. 2013-11-20 Kenneth Zadeck * gcc.dg/c90-const-expr-8.c: Look for overflow on INT_MIN % -1. * gcc.dg/c99-const-expr-8.c: Look for overflow on INT_MIN % -1. ok to commit? kenny Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 205447) +++ gcc/fold-const.c (working
Re: wide-int, rs6000
We will of course measure it but the only thing that is different because of the conversion is that timode integers are packaged differently > On Nov 26, 2013, at 6:17 PM, David Edelsohn wrote: > >> On Tue, Nov 26, 2013 at 5:46 PM, Mike Stump wrote: >> >> Ok? > > The revised version of the patch looks okay from a technical > perspective, but I still am concerned about the compile-time impact. I > thought that I saw a comment on IRC that wide-int seems to have a 3-5% > compile time impact, which is a concern. I am concerned about the > rs6000 port being further slowed with respect to other targets. > > Thanks, David
Re: Some wide-int review comments
Richi, patch ping also two more pieces of information.With further testing, this seems to fix Tests that now work, but didn't before: === ext/random/hypergeometric_distribution/operators/values.cc (test for excess errors) New tests that PASS: ext/random/hypergeometric_distribution/operators/values.cc execution test also, the corresponding frag for fold-const.c on the wide-int branch will look like Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 205224) +++ gcc/fold-const.c(working copy) @@ -1030,51 +1030,51 @@ int_const_binop_1 (enum tree_code code, case TRUNC_DIV_EXPR: case EXACT_DIV_EXPR: - res = wi::div_trunc (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_trunc (arg1, arg2, sign, &overflow); break; case FLOOR_DIV_EXPR: - res = wi::div_floor (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_floor (arg1, arg2, sign, &overflow); break; case CEIL_DIV_EXPR: - res = wi::div_ceil (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_ceil (arg1, arg2, sign, &overflow); break; case ROUND_DIV_EXPR: - res = wi::div_round (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::div_round (arg1, arg2, sign, &overflow); break; case TRUNC_MOD_EXPR: - res = wi::mod_trunc (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_trunc (arg1, arg2, sign, &overflow); break; case FLOOR_MOD_EXPR: - res = wi::mod_floor (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_floor (arg1, arg2, sign, &overflow); break; case CEIL_MOD_EXPR: - res = wi::mod_ceil (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_ceil (arg1, arg2, sign, &overflow); break; case ROUND_MOD_EXPR: - res = wi::mod_round (arg1, arg2, sign, &overflow); - if (overflow) + if (arg2 == 0) return NULL_TREE; + res = wi::mod_round (arg1, arg2, sign, &overflow); break; case MIN_EXPR: On 11/20/2013 06:31 PM, Kenneth Zadeck wrote: On 11/13/2013 04:59 AM, Richard Biener wrote: On Tue, Nov 12, 2013 at 5:43 PM, Kenneth Zadeck wrote: On 11/12/2013 11:27 AM, Joseph S. Myers wrote: On Tue, 12 Nov 2013, Kenneth Zadeck wrote: Richi, i am having a little trouble putting this back the way that you want. The issue is rem. what is supposed to happen for INT_MIN % -1? I would assume because i am failing the last case of gcc.dg/c90-const-expr-8.c that INT_MIN %-1 should not overflow even if INT_MIN / -1 does. however, Given the conclusion in C11 that a%b should be considered undefined if a/b is not representable, I think it's reasonable to say INT_MIN % -1 *should* be considered to overflow (for all C standard versions) (and bug 30484 is only a bug for -fwrapv). however, my local question is what do we want the api to be int-const-binop-1?The existing behavior seems to be that at least for common modes this function silently returns 0 and it is up to the front ends to put their own spin on it. For wide-int you create 1:1 the behavior of current trunk (if a change of behavior in TImode is not tested in the testsuite then you can ignore that). Whatever change you do to semantics of functions you do separately from wide-int (preferably first on trunk, or at your choice after the wide-int merge). For this case in question I'd say a % -1 should return 0, but for INT_MIN % -1 that 0 should have TREE_OVERFLOW set (and thus you need to adjust that c90-const-expr-8.c testcase). Richard. kenny richi, I have done this exactly as you suggested. bootstrapped and regression tested on x86-64. 2013-11-20 Kenneth Zadeck * fold-const.c (int_const_binop_1): Make INT_MIN % -1 return 0 with the overflow bit set. 2013-11-20 Kenneth Zadeck * gcc.dg/c90-const-expr-8.c: Look for overflow on INT_MIN % -1. * gcc.dg/c99-const-expr-8.c: Look for overflow on INT_MIN % -1. ok to commit? kenny
Re: wide-int, ada
On 11/26/2013 09:16 AM, Richard Biener wrote: On Tue, Nov 26, 2013 at 3:15 PM, H.J. Lu wrote: On Tue, Nov 26, 2013 at 6:03 AM, wrote: On Nov 26, 2013, at 6:00 AM, "H.J. Lu" wrote: On Tue, Nov 26, 2013 at 5:55 AM, Richard Biener wrote: On Tue, Nov 26, 2013 at 2:44 PM, Richard Earnshaw wrote: On 26/11/13 09:18, Eric Botcazou wrote: you are correct - this was an incorrect change. I believe that the patch below would be correct, but it is impossible to test it because (i believe) that gcc no longer works if the host_bits_per_wide_int is 32. I could be wrong about this but if i am correct, what do you want me to do? While you're right that most mainstream architectures now require a 64-bit HWI, not all of them do according to config.gcc, so I don't think that this path is entirely dead yet. I'll carry out the testing once we agree on the final change. I'm hoping, once this patch series is in that we might be able to migrate the ARM port back to supporting a 32-bit HWI. The driving factor behind the original switch was supporting 128-bit constants for Neon and these patches should resolve that. i?86 would be another candidate (if you don't build a compiler with -m64 support). Not true for x86 since we have Variable HOST_WIDE_INT ix86_isa_flags = TARGET_64BIT_DEFAULT | TARGET_SUBTARGET_ISA_DEFAULT in i386.opt. We need more than 32 bits for ix86_isa_flags. Then that should be HOST_WIDEST_INT instead. Also one function in i386.c generates less optimal results when HOST_WIDE_INT is 32-bit such that we were generating different outputs from the same input on x86 and on x86-64 with -m32. Well - we knew the code would bitrot once we decided to always default to 64bit HWI ... Richard. -- H.J. Just for the record, the straw the broke the camel's back for me was some bit map data structure in one of the gen functions that has a HWI worth of bits and i needed 35 of them for the port i was trying.It was all fixable, but i had already signed up for one task that seemed like it was going to take the rest of my like and i did not need another. kenny
Re: wide-int, ada
On 11/26/2013 09:12 AM, Richard Biener wrote: On Tue, Nov 26, 2013 at 3:00 PM, Kenneth Zadeck wrote: On 11/26/2013 08:44 AM, Richard Earnshaw wrote: On 26/11/13 09:18, Eric Botcazou wrote: you are correct - this was an incorrect change. I believe that the patch below would be correct, but it is impossible to test it because (i believe) that gcc no longer works if the host_bits_per_wide_int is 32. I could be wrong about this but if i am correct, what do you want me to do? While you're right that most mainstream architectures now require a 64-bit HWI, not all of them do according to config.gcc, so I don't think that this path is entirely dead yet. I'll carry out the testing once we agree on the final change. I'm hoping, once this patch series is in that we might be able to migrate the ARM port back to supporting a 32-bit HWI. The driving factor behind the original switch was supporting 128-bit constants for Neon and these patches should resolve that. R. Richard, I had not realized that you were into self abuse like that. you are going to have a bad time. I tried this as a way to test the wide-int branch because if we made hwi be 32bits, then it would trigger the long version of the implementation wide-int routines. What a disaster richard sandiford told me not to try and i ignored him. that was a wasted weekend!!!There are roadblocks everywhere. i certainly do not have a complete list, because i gave up after hacking a few things back. But very quickly you find places like real.[ch] which no longer work because the math that the dfp people hacked into it no longer works if the hwis are 32 bits.it is misleading because all of the tests are still there, it is just that the equations return the wrong answers.It is like this, one place after another. As far as your hope that once this patch goes in you would be able to support 128 bit constants is only partially true.it will/should be the case, that you always get the correct answer and the code will not ice. That is the good news.But the bad news is that there is still way to much code of the form, if var fits in hwi, do some optimization, else do not do the optimization. Richi did not want me to attack this code yet because it would have made the patches even bigger. For the rtl level a lot of this was removed because i did it before he told me not to, but the tree level still has tons of code like this. So at least for a long time, like the close of the next release, this is not going to get fixed. Note that in the long run we want wide-int to use HOST_WIDEST_FAST_INT and not HOST_WIDE_INT internally. I know I asked you to do that but it's of course too late now. Richard. i started to do it and it turned out to be harder than either of us thought it would. As i remember it, the problem came up of what were you going to populate the tree_csts and const_wide_ints. I assume that you wanted those to have HWFI in them also.Then i started looking at the *fits_*hwi to_*hwi which is used in a billion places and that was now starting to look expensive. perhaps this change could/should be done later after there are almost no calls to these functions. But while there are still a billion of them in the code, this seemed like a mistake. Kenny
Re: wide-int, ada
On 11/26/2013 08:44 AM, Richard Earnshaw wrote: On 26/11/13 09:18, Eric Botcazou wrote: you are correct - this was an incorrect change. I believe that the patch below would be correct, but it is impossible to test it because (i believe) that gcc no longer works if the host_bits_per_wide_int is 32. I could be wrong about this but if i am correct, what do you want me to do? While you're right that most mainstream architectures now require a 64-bit HWI, not all of them do according to config.gcc, so I don't think that this path is entirely dead yet. I'll carry out the testing once we agree on the final change. I'm hoping, once this patch series is in that we might be able to migrate the ARM port back to supporting a 32-bit HWI. The driving factor behind the original switch was supporting 128-bit constants for Neon and these patches should resolve that. R. Richard, I had not realized that you were into self abuse like that. you are going to have a bad time. I tried this as a way to test the wide-int branch because if we made hwi be 32bits, then it would trigger the long version of the implementation wide-int routines. What a disaster richard sandiford told me not to try and i ignored him. that was a wasted weekend!!!There are roadblocks everywhere. i certainly do not have a complete list, because i gave up after hacking a few things back. But very quickly you find places like real.[ch] which no longer work because the math that the dfp people hacked into it no longer works if the hwis are 32 bits.it is misleading because all of the tests are still there, it is just that the equations return the wrong answers.It is like this, one place after another. As far as your hope that once this patch goes in you would be able to support 128 bit constants is only partially true.it will/should be the case, that you always get the correct answer and the code will not ice. That is the good news.But the bad news is that there is still way to much code of the form, if var fits in hwi, do some optimization, else do not do the optimization. Richi did not want me to attack this code yet because it would have made the patches even bigger. For the rtl level a lot of this was removed because i did it before he told me not to, but the tree level still has tons of code like this. So at least for a long time, like the close of the next release, this is not going to get fixed. Kenny
Re: wide-int, ada
On 11/25/2013 03:46 AM, Eric Botcazou wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the ada front-end. I don't think that the mechanical change in UI_From_gnu is correct, see the comment just above. The annotate_value change is very likely correct, but please double check and, upon positive outcome, remove the last sentence of the comment just above. The rest looks good, thanks. Eric, you are correct - this was an incorrect change. I believe that the patch below would be correct, but it is impossible to test it because (i believe) that gcc no longer works if the host_bits_per_wide_int is 32. I could be wrong about this but if i am correct, what do you want me to do? Kenny Index: gcc/ada/gcc-interface/cuintp.c === --- gcc/ada/gcc-interface/cuintp.c(revision 205374) +++ gcc/ada/gcc-interface/cuintp.c(working copy) @@ -167,7 +167,7 @@ UI_From_gnu (tree Input) in a signed 64-bit integer. */ if (tree_fits_shwi_p (Input)) return UI_From_Int (tree_to_shwi (Input)); - else if (wi::lts_p (Input, 0) && TYPE_UNSIGNED (gnu_type)) + else if (wi::gtu_p (wi::lrshift (Input, 63), 0) && TYPE_UNSIGNED (gnu_type)) return No_Uint; #endif
Re: wide-int, dwarf
I replied to the wrong email when i sent the first version of this emal. sorry.This was the comment that was addressed by this fix. fixed on the wide-int branch 205363. On 11/24/2013 08:43 AM, Jason Merrill wrote: On 11/23/2013 09:55 PM, Kenneth Zadeck wrote: On 11/23/2013 08:47 PM, Jason Merrill wrote: So if the target supports wide ints you'll always use the scalar float code? Why does that do the right thing? if TARGET_SUPPORTS_WIDE_INT != 0, then integers are NEVER placed in const-doubles. large integers go into CONST_WIDE_INTS so this case always represents a floating point constant and nothing else. So yes. I think that you could argue that the comment above this frag needs to be rewritten because it is wrong if TARGET_SUPPORTS_WIDE_INT != 0. Please. OK with that change. Jason Index: dwarf2out.c === --- dwarf2out.c (revision 205360) +++ dwarf2out.c (working copy) @@ -12880,10 +12880,15 @@ mem_loc_descriptor (rtx rtl, enum machin { dw_die_ref type_die; - /* Note that a CONST_DOUBLE rtx could represent either an integer - or a floating-point constant. A CONST_DOUBLE is used whenever - the constant requires more than one word in order to be - adequately represented. We output CONST_DOUBLEs as blocks. */ + /* Note that if TARGET_SUPPORTS_WIDE_INT == 0, a + CONST_DOUBLE rtx could represent either an large integer + or a floating-point constant. If + TARGET_SUPPORTS_WIDE_INT != 0, the value is always a + floating point constant. + + When it is an integer, a CONST_DOUBLE is used whenever + the constant requires 2 HWIs to be adequately + represented. We output CONST_DOUBLEs as blocks. */ if (mode == VOIDmode || (GET_MODE (rtl) == VOIDmode && GET_MODE_BITSIZE (mode) != HOST_BITS_PER_DOUBLE_INT))
Re: wide-int, C++ front end
fixed on the wide-int branch 205363. On 11/23/2013 09:00 PM, Jason Merrill wrote: On 11/23/2013 02:20 PM, Mike Stump wrote: @@ -2605,8 +2606,7 @@ cp_tree_equal (tree t1, tree t2) switch (code1) { case INTEGER_CST: - return TREE_INT_CST_LOW (t1) == TREE_INT_CST_LOW (t2) -&& TREE_INT_CST_HIGH (t1) == TREE_INT_CST_HIGH (t2); + return wi::to_widest (t1) == wi::to_widest (t2); Why not use wi::eq_p like you do in the C front end? Jason Index: dwarf2out.c === --- dwarf2out.c (revision 205360) +++ dwarf2out.c (working copy) @@ -12880,10 +12880,15 @@ mem_loc_descriptor (rtx rtl, enum machin { dw_die_ref type_die; - /* Note that a CONST_DOUBLE rtx could represent either an integer - or a floating-point constant. A CONST_DOUBLE is used whenever - the constant requires more than one word in order to be - adequately represented. We output CONST_DOUBLEs as blocks. */ + /* Note that if TARGET_SUPPORTS_WIDE_INT == 0, a + CONST_DOUBLE rtx could represent either an large integer + or a floating-point constant. If + TARGET_SUPPORTS_WIDE_INT != 0, the value is always a + floating point constant. + + When it is an integer, a CONST_DOUBLE is used whenever + the constant requires 2 HWIs to be adequately + represented. We output CONST_DOUBLEs as blocks. */ if (mode == VOIDmode || (GET_MODE (rtl) == VOIDmode && GET_MODE_BITSIZE (mode) != HOST_BITS_PER_DOUBLE_INT))
Re: wide-int, loop
On 11/25/2013 06:04 AM, Richard Biener wrote: On Sat, Nov 23, 2013 at 8:22 PM, Mike Stump wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the loop code. Ok? @@ -2662,8 +2661,8 @@ iv_number_of_iterations (struct loop *loop, rtx insn, rtx condition, iv1.step = const0_rtx; if (INTVAL (iv0.step) < 0) { - iv0.step = simplify_gen_unary (NEG, comp_mode, iv0.step, mode); - iv1.base = simplify_gen_unary (NEG, comp_mode, iv1.base, mode); + iv0.step = simplify_gen_unary (NEG, comp_mode, iv0.step, comp_mode); + iv1.base = simplify_gen_unary (NEG, comp_mode, iv1.base, comp_mode); } iv0.step = lowpart_subreg (mode, iv0.step, comp_mode); separate bugfix? most likely.i will submit separately. @@ -1378,7 +1368,8 @@ decide_peel_simple (struct loop *loop, int flags) /* If we have realistic estimate on number of iterations, use it. */ if (get_estimated_loop_iterations (loop, &iterations)) { - if (double_int::from_shwi (npeel).ule (iterations)) + /* TODO: unsigned/signed confusion */ + if (wi::leu_p (npeel, iterations)) { if (dump_file) { what does this refer to? npeel is unsigned. it was the fact that they were doing the from_shwi and then using an unsigned test. Otherwise looks good to me. Thanks, Richard.
Re: wide-int, fortran
On 11/24/2013 08:38 AM, N.M. Maclaren wrote: On Nov 24 2013, Kenneth Zadeck wrote: Thank you for your posting. That certainly clears up my understanding. If there is a clear description of the subset of C++ that the front-end is allowed to use, a pointer to it for the benefit of Fortran/C/Ada/whatever people would be useful. But that's an aside from this thread. There is a saying in the US that "you can always tell the pioneers because they are the ones with the arrows in their back." I feel this way with respect to C++. When this branch first went public several months ago, the wide-int class made very modest use of C++. It has publicly evolved away from that and i would now describe the usage as somewhat aggressive. I do not know if i think that this is good or bad, but i expect that because we are the first big patch to use C++ aggressively , that we are going to take some arrows. I think that using C++ even slightly aggressively without first deciding the grounds rules is a seriously bad idea. The trouble about following random pioneers is that they are very likely to lead you into a swamp, with the loss of the whole expedition. And, with C++, that is more than just likely if you follow an adventurous pioneer rather than a cautious one - it's damn-near certain :-( i do not actually disagree with you.my point is that this was all done in the open. A useful way to think about this patch is that it is nothing but plumbing.Gcc has had a painful and long evolution from being a 32 bit compiler to a 64 bit compiler and beyond. ... Yes, but that's not my point. The main problem is that integer constant expressions in C are limited to the built-in operators, of which the only tricky ones are division and remainder (and, occasionally, multiplication) - see C11 6.6#3. Fortran is not so limited, and there are much wider requirements for expression evaluation at compile time. But at both ends of the compiler there are still limits. And my point is that this is NOT an area that separates cleanly into front and back end! However, from your description, this is one component of any solution for gfortran, and it doesn't sound as if it will cause trouble until and unless someone wants to extend gfortran to wider types. They will then have to implement the other components In any event, the problem is not with making the math work correctly, which is really the primary point of this patch. It is as you say how the various front ends decide how they wish to exploit that power. Regards, Nick Maclaren.
Re: wide-int, fortran
On 11/24/2013 05:50 AM, Tobias Burnus wrote: Mike Stump wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the fortran front end. Nice clean up. The new class looks much cleaner as it avoids the LO/HI handling. - hi = TREE_INT_CST_HIGH (bound); - low = TREE_INT_CST_LOW (bound); - if (hi || low < 0 - || ((!as || as->type != AS_ASSUMED_RANK) - && low >= GFC_TYPE_ARRAY_RANK (TREE_TYPE (desc))) - || low > GFC_MAX_DIMENSIONS) + if (((!as || as->type != AS_ASSUMED_RANK) + && wi::geu_p (bound, GFC_TYPE_ARRAY_RANK (TREE_TYPE (desc + || wi::gtu_p (bound, GFC_MAX_DIMENSIONS)) gfc_error ("'dim' argument of %s intrinsic at %L is not a valid " "dimension index", upper ? "UBOUND" : "LBOUND", &expr->where); I don't see what happened to the "low < 0" check. (Ditto for the next chunk in conv_intrinsic_cobound). Otherwise, it looks okay to me. Tobias This is the magic of using the correct representation. All this code really wanted to check is that bound is a small positive integer.
Re: wide-int, fortran
On 11/24/2013 05:16 AM, N.M. Maclaren wrote: On Nov 23 2013, Andrew Pinski wrote: On Sat, Nov 23, 2013 at 12:16 PM, Steve Kargl wrote: On Sat, Nov 23, 2013 at 11:21:21AM -0800, Mike Stump wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch. This patch covers the fortran front end. Ok? + *logical = wi::eq_p (t, 0) ? 0 : 1; I can't find the meaning of :: in n1256.pdf. What does this do? wi:: eq_p means the function eq_p inside the wi struct. But you can't tell that from the code - wi might be a namespace, and eq_p might be a class. If there is a clear description of the subset of C++ that the front-end is allowed to use, a pointer to it for the benefit of Fortran/C/Ada/whatever people would be useful. But that's an aside from this thread. There is a saying in the US that "you can always tell the pioneers because they are the ones with the arrows in their back." I feel this way with respect to C++. When this branch first went public several months ago, the wide-int class made very modest use of C++. It has publicly evolved away from that and i would now describe the usage as somewhat aggressive. I do not know if i think that this is good or bad, but i expect that because we are the first big patch to use C++ aggressively , that we are going to take some arrows. Also, given the complete lack of a description of what this patch does and no pointer to a discussion of what this patch does, and no description of its benefit to gfortran, I vote "no". The general description was in a different email: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02877.html The main benefit is it allows for targets to support wider integer than two times HOST_WIDE_INT. So gfortran, is that it connects to the rest of the middle-end of GCC. Hmm. Looking at that makes me none the wiser, and even a web search doesn't do more than tell me the same aspects. Given that Fortran has somewhat different constraints on type widths than C, it would be very useful to know exactly what you mean by that. C++ is almost entirely irrelevant here. A useful way to think about this patch is that it is nothing but plumbing.Gcc has had a painful and long evolution from being a 32 bit compiler to a 64 bit compiler and beyond.The primary goal of this patch is change the underlying data structures used to represent integer constants from being either single host wide ints (that were limited to 64 bits) and double-int (that worked reliably for 127 bits and mostly ok for 128 bits) to a data structure that would just work reliably for any precision integer that a back end supported. But at both ends of the compiler there are still limits.It is expected that with this patch in place, that a back end maintainer can now, for the first time, support architectures that support integers wider than 128 bits. We have several public ports that appear to have tried to support some operations on 256 bits but when we tried this on our private port, we found an intolerable number of ices and incorrect code issues.They are now gone. To the extent that this patch touches the front ends, this patch is strictly plumbing. The front ends now use the new wide-int class, or the new int-cst to pass integer constants around and do constant prop on them.Beyond that, it is up to the front end maintainers to decide how or even if they will expose this to the users of the language. C and C++ currently allow the back end to tell it that they can use wider types but actually specifying a large integer constant is at best painful. Now, obviously to an implementor, it doesn't mean unlimited-width types, but does it (a) allow implementations to use hardware/firmware/emulated built-in types however wide they are, (b) allow arbitrary-width types, or (c) something else? In all cases, I can see implementation issues in gfortran that do not arise in gcc, but they are different between those models. the patch sniffs the port (in particular, it looks at the extra modes specified by the port), and allows integers that are some multiple of size of the largest mode specified by the back end. In this way ports for small machines pay less of price than the big ports will. One could imagine that this is not good enough if a front end wanted to say that it supports everything up to 256 bits no matter what the hardware can do natively, much the way that java said that every host had to do 64 bit math, even if it was just a 32 bit host. However, that would require a lot more work than was done here. In particular, at some level someone would have to sniff the port and break the math into pieces that could be implemented on the port. So I agree that some clarification would be a good idea, to avoid introducing secondary problems by accident. Quite likely there will be
Re: wide-int, i386
On 11/24/2013 05:47 AM, Uros Bizjak wrote: On Sun, Nov 24, 2013 at 11:23 AM, Kenneth Zadeck wrote: We did not do this kind of transformation for any port beyond the minimum of having the port use wide-int rather than double-int. we did do a lot of this in the common code, especially in the code that was just not correct for types beyond 64 bits. Our motivation was that this is already a huge patch and going down that road for some of the crusty old ports would have made the patch just bigger. so we limited ourselves to the places in the common code that were obstructions to port writers to make large types work. I should point out that there are even a lot of places in the common code where we left the old code alone as long as it was correct for larger types. For testing purposes, we made no changes that were not bit for bit compatible for code of 64 bits or shorter. There are some places where the right transformation would yield better code, but we left those for later so we could test this patch in a sane way. I see. I was just wondering about those "obvious" places. The patch is fairly mechanical (BTW: there are some unnecessary whitespace changes that obscure real change), so it looks OK to me. thanks. Thanks, Uros.
Re: wide-int, i386
We did not do this kind of transformation for any port beyond the minimum of having the port use wide-int rather than double-int. we did do a lot of this in the common code, especially in the code that was just not correct for types beyond 64 bits. Our motivation was that this is already a huge patch and going down that road for some of the crusty old ports would have made the patch just bigger. so we limited ourselves to the places in the common code that were obstructions to port writers to make large types work. I should point out that there are even a lot of places in the common code where we left the old code alone as long as it was correct for larger types. For testing purposes, we made no changes that were not bit for bit compatible for code of 64 bits or shorter. There are some places where the right transformation would yield better code, but we left those for later so we could test this patch in a sane way. Kenny On Nov 24, 2013, at 3:34 AM, Uros Bizjak wrote: > On Sat, Nov 23, 2013 at 8:22 PM, Mike Stump wrote: >> Richi has asked the we break the wide-int patch so that the individual port >> and front end maintainers can review their parts without have to go through >> the entire patch.This patch covers the i386 port. > > Should this patch also address construction of 128bit immediates? One > example is in ix86_build_signbit_mask, where we have to use special > tricks to create them. > > You can find other interesting places by grepping config/i386/* > sources for HOST_BITS_PER_WIDE_INT checks. These remains from times > when wide_int was 32 for i686 host, and IMO should be addressed by > wide-int patch. > > Thanks, > Uros.
Re: wide-int, aarch64
On 11/23/2013 04:36 PM, Andrew Pinski wrote: On Sat, Nov 23, 2013 at 11:19 AM, Mike Stump wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the aarch64 port. + wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2); Should we have a gcc_assert (!fail); after the real_to_integer? Other than that this seems obvious. It actually is not obvious. In general we did not add checking where there was none before. the fact that this interface allows the checking the trunk interface did not does not mean that we are obligated to do that. Thanks, Andrew Ok?
Re: wide-int, arc
On 11/23/2013 08:19 PM, Joern Rennecke wrote: On 23 November 2013 19:19, Mike Stump wrote: Richi has asked the we break the wide-int patch so that the individual port and front end maintainers can review their parts without have to go through the entire patch.This patch covers the arc port. Ok? wide-int.h says widest_int is always signed, yet the iterations count in the doloop interface is now unsigned. So, which is right, the code or the documentation? I see that dfp.c and vax.c directly include wide-int.h, but aarch64.c and arc.c don't. Is that safe? if the compiler produced an error message about not finding something, we added the include. It is possible that some of these may be redundant as things have moved around, but safety is not an issue. Is there a rationale which files directly include wide-int.h and which ones don't?
Re: wide-int, dwarf
On 11/23/2013 08:47 PM, Jason Merrill wrote: On 11/23/2013 02:21 PM, Mike Stump wrote: - if (SCALAR_FLOAT_MODE_P (mode)) +#if TARGET_SUPPORTS_WIDE_INT == 0 + if (!SCALAR_FLOAT_MODE_P (mode)) +{ + mem_loc_result->dw_loc_oprnd2.val_class += dw_val_class_const_double; + mem_loc_result->dw_loc_oprnd2.v.val_double += rtx_to_double_int (rtl); +} + else +#endif { unsigned int length = GET_MODE_SIZE (mode); unsigned char *array So if the target supports wide ints you'll always use the scalar float code? Why does that do the right thing? Jason if TARGET_SUPPORTS_WIDE_INT != 0, then integers are NEVER placed in const-doubles. large integers go into CONST_WIDE_INTS so this case always represents a floating point constant and nothing else. So yes. I think that you could argue that the comment above this frag needs to be rewritten because it is wrong if TARGET_SUPPORTS_WIDE_INT != 0. kenny
Re: wide-int
On 11/22/2013 03:03 PM, Richard Biener wrote: On Thu, Nov 21, 2013 at 11:08 PM, Mike Stump wrote: This patch adds support for ints wider than double_int. Ok? Please split the patch into pieces. I suggest to separate changes to the various frontends (CC maintainers), the new wide-int files, the example rs6000 conversion and the rest. Otherwise I'm sure you'll get no attention. we are happy to break the patch any way that makes since. Otherwise I'm sure you need to do some teaching about the different wide-int kinds and their behavior. i will start writing some documentation. Is the patch an old revision? Because I see the int_const_binop_1 code that I asked Kenny to revert to trunk behavior (return TREE_OVERFLOW flagged constants, not NULL_TREE). this is a special case.what we did with the stor-layout patch was that we committed the fix the the branch simultaneously with the fix going on the trunk. i have the fix for the wide-int branch, i was waiting for you to approve the fix for trunk. (as a side point i think that their is a third test case that i am going to have fix for it. it is not c or c++ i have a build with every language going on now to check this. ) +#ifndef WIDE_INT_PRINT_H +#define WIDE_INT_PRINT_H + +#include +#include "wide-int.h" + system headers must be only included from [t]system.h For the merge you want to have trunk frozen and you want to be available during the following days (I do expect fallout). unless you want to go the route of approving the wide-int* first, the rtl second But yes, i think that we would request a sunday to do the commit and be around for the next few days. kenny Thanks, Richard.
Re: wide-int
I am sorry that in the haste of battle that mike did not have an opportunity to write a proper introduction to the is patch. The patch was submitted last night so that it could be formally submitted by the end of stage 1. This patch is the same as the top of the wide-int branch that has been publicly available for public view and comment for the last several months. During that time it has significantly evolved and improved. The wide-int interface has changed in significant ways as well as it has now been tested on at least one instance of every active architecture and has been tested on one or two architectures for every single language. The wide-int branch deals with a fundamental data structure problem in the current gcc trunk.Integer math larger than 64 bits is generally buggy and beyond 128 bits is completely wrong and generally causes the compiler to ICE. This patch fixes all of the 128 bits and below problems for every port, and if certain changes are made to the port (see the part of the diff relating to the RS6000 for an example) can support integers of any size. Currently there are a few public ports that have some code that uses OImode. The usage of OImode in those ports is currently limited. I personally do not know if it is because the architectures make very limited use of OImode, or if the port maintainers could not justify doing the work that we have done here. However, when this patch is accepted, GCC will be able support any length integer for any kind of operation. Kenny On 11/21/2013 05:08 PM, Mike Stump wrote: This patch adds support for ints wider than double_int. Ok?
Re: three problems with stor-layout.c.
committed as revision 205260. thanks kenny On 11/22/2013 03:58 AM, Richard Biener wrote: On Thu, 21 Nov 2013, Kenneth Zadeck wrote: Richi, Here is the patch. As you can see, i chose the unsigned option. It was bootstrapped and tested on x86 with all languages including ada. Ok to commit? Ok. Thanks, Richard. kenny 2013-11-21 zad...@naturalbridge.com * store-layout.c (place-field): Fix hwi test and accessor mismatch. On 11/21/2013 11:07 AM, Eric Botcazou wrote: I think most of these are because in the past (yes I have fixed that!!) all 'sizetype' constants were sign-extended (and the signedness, that is, TYPE_UNSIGNED (sizetype), was frontend dependend (ugh) and then later true, thus unsigned). So I think all _SIZE stuff should check fits_uhwi_p and be used as uhwi. But that may have ripple-down effects, so consistently using fits_shwi_p and using as shwi is also fine (it just restricts the maximum values we accept(?)). And please make sure to test Ada if you're tweaking this, it's fragile stuff. Index: gcc/stor-layout.c === --- gcc/stor-layout.c(revision 205259) +++ gcc/stor-layout.c(working copy) @@ -1204,7 +1204,7 @@ place_field (record_layout_info rli, tre unsigned int type_align = TYPE_ALIGN (type); tree dsize = DECL_SIZE (field); HOST_WIDE_INT field_size = tree_to_uhwi (dsize); - HOST_WIDE_INT offset = tree_to_shwi (rli->offset); + HOST_WIDE_INT offset = tree_to_uhwi (rli->offset); HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos); #ifdef ADJUST_FIELD_ALIGN @@ -1248,7 +1248,7 @@ place_field (record_layout_info rli, tre unsigned int type_align = TYPE_ALIGN (type); tree dsize = DECL_SIZE (field); HOST_WIDE_INT field_size = tree_to_uhwi (dsize); - HOST_WIDE_INT offset = tree_to_shwi (rli->offset); + HOST_WIDE_INT offset = tree_to_uhwi (rli->offset); HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos); #ifdef ADJUST_FIELD_ALIGN @@ -1304,7 +1304,7 @@ place_field (record_layout_info rli, tre && !integer_zerop (DECL_SIZE (field)) && !integer_zerop (DECL_SIZE (rli->prev_field)) && tree_fits_shwi_p (DECL_SIZE (rli->prev_field)) - && tree_fits_shwi_p (TYPE_SIZE (type)) + && tree_fits_uhwi_p (TYPE_SIZE (type)) && simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (prev_type))) { /* We're in the middle of a run of equal type size fields; make
Re: three problems with stor-layout.c.
Richi, Here is the patch. As you can see, i chose the unsigned option. It was bootstrapped and tested on x86 with all languages including ada. Ok to commit? kenny 2013-11-21 zad...@naturalbridge.com * store-layout.c (place-field): Fix hwi test and accessor mismatch. On 11/21/2013 11:07 AM, Eric Botcazou wrote: I think most of these are because in the past (yes I have fixed that!!) all 'sizetype' constants were sign-extended (and the signedness, that is, TYPE_UNSIGNED (sizetype), was frontend dependend (ugh) and then later true, thus unsigned). So I think all _SIZE stuff should check fits_uhwi_p and be used as uhwi. But that may have ripple-down effects, so consistently using fits_shwi_p and using as shwi is also fine (it just restricts the maximum values we accept(?)). And please make sure to test Ada if you're tweaking this, it's fragile stuff. Index: gcc/stor-layout.c === --- gcc/stor-layout.c (revision 205220) +++ gcc/stor-layout.c (working copy) @@ -1204,7 +1204,7 @@ place_field (record_layout_info rli, tre unsigned int type_align = TYPE_ALIGN (type); tree dsize = DECL_SIZE (field); HOST_WIDE_INT field_size = tree_to_uhwi (dsize); - HOST_WIDE_INT offset = tree_to_shwi (rli->offset); + HOST_WIDE_INT offset = tree_to_uhwi (rli->offset); HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos); #ifdef ADJUST_FIELD_ALIGN @@ -1248,7 +1248,7 @@ place_field (record_layout_info rli, tre unsigned int type_align = TYPE_ALIGN (type); tree dsize = DECL_SIZE (field); HOST_WIDE_INT field_size = tree_to_uhwi (dsize); - HOST_WIDE_INT offset = tree_to_shwi (rli->offset); + HOST_WIDE_INT offset = tree_to_uhwi (rli->offset); HOST_WIDE_INT bit_offset = tree_to_shwi (rli->bitpos); #ifdef ADJUST_FIELD_ALIGN @@ -1304,7 +1304,7 @@ place_field (record_layout_info rli, tre && !integer_zerop (DECL_SIZE (field)) && !integer_zerop (DECL_SIZE (rli->prev_field)) && tree_fits_shwi_p (DECL_SIZE (rli->prev_field)) - && tree_fits_shwi_p (TYPE_SIZE (type)) + && tree_fits_uhwi_p (TYPE_SIZE (type)) && simple_cst_equal (TYPE_SIZE (type), TYPE_SIZE (prev_type))) { /* We're in the middle of a run of equal type size fields; make
Re: Some wide-int review comments
On 11/13/2013 04:59 AM, Richard Biener wrote: On Tue, Nov 12, 2013 at 5:43 PM, Kenneth Zadeck wrote: On 11/12/2013 11:27 AM, Joseph S. Myers wrote: On Tue, 12 Nov 2013, Kenneth Zadeck wrote: Richi, i am having a little trouble putting this back the way that you want. The issue is rem. what is supposed to happen for INT_MIN % -1? I would assume because i am failing the last case of gcc.dg/c90-const-expr-8.c that INT_MIN %-1 should not overflow even if INT_MIN / -1 does. however, Given the conclusion in C11 that a%b should be considered undefined if a/b is not representable, I think it's reasonable to say INT_MIN % -1 *should* be considered to overflow (for all C standard versions) (and bug 30484 is only a bug for -fwrapv). however, my local question is what do we want the api to be int-const-binop-1?The existing behavior seems to be that at least for common modes this function silently returns 0 and it is up to the front ends to put their own spin on it. For wide-int you create 1:1 the behavior of current trunk (if a change of behavior in TImode is not tested in the testsuite then you can ignore that). Whatever change you do to semantics of functions you do separately from wide-int (preferably first on trunk, or at your choice after the wide-int merge). For this case in question I'd say a % -1 should return 0, but for INT_MIN % -1 that 0 should have TREE_OVERFLOW set (and thus you need to adjust that c90-const-expr-8.c testcase). Richard. kenny richi, I have done this exactly as you suggested. bootstrapped and regression tested on x86-64. 2013-11-20 Kenneth Zadeck * fold-const.c (int_const_binop_1): Make INT_MIN % -1 return 0 with the overflow bit set. 2013-11-20 Kenneth Zadeck * gcc.dg/c90-const-expr-8.c: Look for overflow on INT_MIN % -1. * gcc.dg/c99-const-expr-8.c: Look for overflow on INT_MIN % -1. ok to commit? kenny Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 205161) +++ gcc/fold-const.c (working copy) @@ -1105,7 +1105,22 @@ int_const_binop_1 (enum tree_code code, case ROUND_MOD_EXPR: if (op2.is_zero ()) return NULL_TREE; - tmp = op1.divmod_with_overflow (op2, uns, code, &res, &overflow); + + /* Check for the case the case of INT_MIN % -1 and return + overflow and result = 0. The TImode case is handled properly + in double-int. */ + if (TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT + && !uns + && op2.is_minus_one () + && op1.high == (HOST_WIDE_INT) -1 + && (HOST_WIDE_INT) op1.low + == (((HOST_WIDE_INT)-1) << (TYPE_PRECISION (type) - 1))) + { + overflow = 1; + res = double_int_zero; + } + else + tmp = op1.divmod_with_overflow (op2, uns, code, &res, &overflow); break; case MIN_EXPR: Index: gcc/testsuite/gcc.dg/c90-const-expr-8.c === --- gcc/testsuite/gcc.dg/c90-const-expr-8.c (revision 205161) +++ gcc/testsuite/gcc.dg/c90-const-expr-8.c (working copy) @@ -23,5 +23,6 @@ enum e { /* { dg-error "3:overflow in constant expression" "constant" { target *-*-* } 22 } */ E6 = 0 * !-INT_MIN, /* { dg-warning "13:integer overflow in expression" } */ /* { dg-error "8:not an integer constant" "constant" { target *-*-* } 24 } */ - E7 = INT_MIN % -1 /* Not an overflow. */ + E7 = INT_MIN % -1 /* { dg-warning "16:integer overflow in expression" } */ + /* { dg-error "1:overflow in constant expression" "constant" { target *-*-* } 28 } */ }; Index: gcc/testsuite/gcc.dg/c99-const-expr-8.c === --- gcc/testsuite/gcc.dg/c99-const-expr-8.c (revision 205161) +++ gcc/testsuite/gcc.dg/c99-const-expr-8.c (working copy) @@ -23,5 +23,6 @@ enum e { /* { dg-error "overflow in constant expression" "constant" { target *-*-* } 22 } */ E6 = 0 * !-INT_MIN, /* { dg-warning "integer overflow in expression" } */ /* { dg-error "not an integer constant" "constant" { target *-*-* } 24 } */ - E7 = INT_MIN % -1 /* Not an overflow. */ + E7 = INT_MIN % -1 /* { dg-warning "16:integer overflow in expression" } */ + /* { dg-error "1:overflow in constant expression" "constant" { target *-*-* } 28 } */ };