Re: [PATCH] Fix PR58775
On 18 October 2013 19:09, Jakub Jelinek wrote:
> On Fri, Oct 18, 2013 at 06:36:44PM +0800, Zhenqiang Chen wrote:
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -2861,6 +2861,19 @@ swap_ops_for_binary_stmt (vec ops,
> }
> }
>
> +/* Determine if stmt A is in th next list of stmt B. */
> +static inline bool
> +next_stmt_of (gimple a, gimple b)
> +{
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_for_stmt (b); !gsi_end_p (gsi); gsi_next (&gsi))
> +{
> + if (gsi_stmt (gsi) == a)
> + return true;
> +}
> + return false;
> +}
>
> How is that different from appears_later_in_bb? More importantly,
> not_dominated_by shouldn't be called with the same uid and basic block,
> unless the stmts are the same, except for the is_gimple_debug case
> (which looks like a bug). And more importantly, if a stmt has zero uid,
> we'd better set it from the previous or next non-zero uid stmt, so that
> we don't linearly traverse the whole bb many times.

Thanks for the comments.

The patch is updated to set the uid in update_range_test after force_gimple_operand_gsi. I am not sure the patch can cover all cases. If not, I think force_gimple_operand_gsi_1 is maybe the only place to set the uid for all newly generated statements.

Thanks!
-Zhenqiang

> That said, I've lost track of the bugfixes for this, don't know if there is
> any patch still pending or not.
>
> Jakub

pr58775-2.patch
Description: Binary data
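For context on the uid discussion above, here is a minimal sketch of the idea Jakub describes, written against GCC's gimple iterator API: a statement with a zero uid borrows the uid of the nearest neighbouring statement that already has one, so that uid-based ordering checks never have to walk the whole basic block. This is not the submitted patch, and set_uid_from_neighbor is a made-up name.

/* Sketch only: give STMT a uid copied from the closest neighbouring
   statement that has a non-zero uid, searching backwards first and
   then forwards within the same basic block.  */
static void
set_uid_from_neighbor (gimple stmt)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  /* Look backwards for the closest statement that already has a uid.  */
  for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
    if (gimple_uid (gsi_stmt (gsi)) != 0)
      {
        gimple_set_uid (stmt, gimple_uid (gsi_stmt (gsi)));
        return;
      }
  /* Otherwise look forwards.  */
  for (gsi = gsi_for_stmt (stmt), gsi_next (&gsi); !gsi_end_p (gsi);
       gsi_next (&gsi))
    if (gimple_uid (gsi_stmt (gsi)) != 0)
      {
        gimple_set_uid (stmt, gimple_uid (gsi_stmt (gsi)));
        return;
      }
}

The posted patch instead sets the uids in update_range_test, right after force_gimple_operand_gsi creates the new statements.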
Re: Honnor ix86_accumulate_outgoing_args again
On 13-10-19 4:30 PM, Jan Hubicka wrote:

Jan,

Does this seem reasonable to you?

Oops, sorry, I missed your email. (I was travelling and I am finishing a paper now).

Thanks,
Igor

-Original Message-
From: Zamyatin, Igor
Sent: Tuesday, October 15, 2013 3:48 PM
To: Jan Hubicka
Subject: RE: Honnor ix86_accumulate_outgoing_args again

Jan,

Now we have the following prologue in, say, the phi0 routine in equake:

0x804aa90  1  push    %ebp
0x804aa91  2  mov     %esp,%ebp
0x804aa93  3  sub     $0x18,%esp
0x804aa96  4  vmovsd  0x80ef7a8,%xmm0
0x804aa9e  5  vmovsd  0x8(%ebp),%xmm1
0x804aaa3  6  vcomisd %xmm1,%xmm0   <-- we see a big stall somewhere here or 1-2 instructions above

While earlier it was:

0x804abd0  1  sub     $0x2c,%esp
0x804abd3  2  vmovsd  0x30(%esp),%xmm1
0x804abd9  3  vmovsd  0x80efcc8,%xmm0
0x804abe1  4  vcomisd %xmm1,%xmm0

Thanks for the analysis! It is a different benchmark than for Bulldozer, but apparently the same case. Again, we used to eliminate the frame pointer here but IRA now doesn't.

Do you see the same regression with -fno-omit-frame-pointer -maccumulate-outgoing-args?

I suppose this is a conflict between the push instruction, handled by the stack engine, and the initialization of EBP that isn't. That would explain why Bulldozer doesn't seem to care about this particular benchmark (its stack engine seems to have quite a different design).

This is a bit of a sad situation - accumulate-outgoing-args is expensive code-size wise and it seems we don't really need esp with -mno-accumulate-outgoing-args. The non-accumulation code path was mistakenly disabled for too long ;(

Vladimir, how much effort do you think it will be to fix the frame pointer elimination here?

My guess is a week. The problem is that I am busy and having some problems with two small projects right now which I'd like to include into gcc-4.9. But I think this still can be fixed in stage2 as it is a PR.

I can imagine it is a quite tricky case. If so, I would suggest adding m_CORE_ALL to X86_TUNE_ACCUMULATE_OUTGOING_ARGS with a comment explaining the problem and mentioning the regression on equake on Core and mgrid on Bulldozer, and opening an enhancement request for this...

I also wonder, if direct ESP use and push/pop instructions are causing such noticeable issues, whether we can't "shrink wrap" this into the red zone in 64-bit compilation. It seems that even with -maccumulate-outgoing-args, pushing the frame allocation as late as possible in the function would be a good idea, so it is not close to the push/pop/call/ret.
Re: patch to enable LRA for ppc
On 13-10-18 11:26 AM, David Edelsohn wrote:

On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov wrote:

The following patch permits today's trunk to use LRA for ppc by default. To switch it off, -mno-lra can be used. The patch was bootstrapped on ppc64. The GCC testsuite does not have regressions either (in comparison with reload). The change in rs6000.md is to fix an LRA failure on a recently added ppc test.

Vlad,

I have not forgotten this patch. We are trying to figure out the right timeframe to make this change. The patch does affect performance -- both positively and negatively; most differences are in the noise, but not all. And there still are some SPEC benchmarks that fail to build with the patch, at least in Mike's tests. And Mike is implementing some patches to utilize reload to improve use of VSX registers, which would need to be mirrored in LRA for the equivalent functionality.

Thanks for informing me, David.

I am ready to work on any LRA ppc issues when it is in the trunk. It would be easier for me to work on LRA ppc if the patch is committed to the trunk and, of course, LRA is used as a non-default local RA. I don't know what Mike is doing on reload to use VSX registers. I guess it is usage of VSX regs as spill locations for GENERAL regs instead of memory. If so, it is 2 days of work to add this functionality in LRA (as it already has analogous functionality for Intel processors, and that gave a nice SPECFP2000 improvement for them) and probably more work on resolving issues, especially as I have no power8.
Re: FW: MAX_PATH problems with mingw gcc
Vladimir,

I found no more issues in the patch itself. But the ChangeLogs are missing. Please refer to the format in /include/ChangeLog, and generate yours in your next email for

/include
/libcpp
/libiberty

The maintainers of libiberty are in the TO list.

- Joey
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
On 10/18/2013 10:38 AM, Richard Biener wrote:

Sandra Loosemore wrote:

On 10/18/2013 04:50 AM, Richard Biener wrote:

On Sat, Sep 28, 2013 at 4:19 AM, Sandra Loosemore wrote:

This patch fixes various -fstrict-volatile-bitfields wrong-code bugs, including instances where bitfield values were being quietly truncated as well as bugs relating to using the wrong width. The code changes are identical to those in the previous version of this patch series (originally posted in June and re-pinged many times since then), but for this version I have cleaned up the test cases to remove dependencies on header files, as Bernd requested.

Ok.

Just to clarify, is this approval unconditional, independent of part 2 and other patches or changes still under active discussion?

Yes.

Hr. After some further testing, I'm afraid this patch is still buggy. :-(

I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new pr56997-1.c testcase, it got stuck in infinite recursion between store_split_bit_field/store_fixed_bit_field and/or extract_split_bit_field/extract_fixed_bit_field. This didn't show up in my previous mainline testing.

The difference between 4.8 and mainline head is the alignment of the incoming str_rtx passed to store_bit_field/extract_bit_field, due to the changes in r199898. The alignment is 8 bits on mainline, and 32 on 4.8. It seems to me that the bitfield code ought to handle store/extract from a more-aligned object, and it's probably possible to construct an example that fails in this way on mainline as well.

It looks like there are conflicting assumptions between get_best_mode, the places that call it in store_fixed_bit_field and extract_fixed_bit_field, and the code that actually does the splitting -- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather than its mode. In the case where it's failing, get_best_mode is deciding it can't do a HImode access without splitting, but then the split code is assuming SImode units because of the 32-bit alignment, but not actually changing the mode of the rtx to match that.

On top of that, this is one of the cases that strict_volatile_bitfield_p checks for and returns false, but the callers of store_bit_field and extract_bit_field in expr.c have already fiddled with the mode of the incoming rtx assuming that -fstrict-volatile-bitfields does apply.

It doesn't get into that infinite recursion if it's compiled with -fno-strict-volatile-bitfields instead; in that case the incoming rtx has BLKmode, get_best_mode chooses SImode, and it's able to do the access without splitting at all.

Anyway, I tried a couple of different bandaids that solved the infinite recursion problem but caused regressions elsewhere, and now I'm not sure of the right place to fix this. Given that there is also still ongoing discussion about making this a 3-way switch (etc.) I am going to hold off on committing this patch for now.

-Sandra
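To make the discussion concrete, here is a minimal example of the kind of access at issue. It is illustrative only -- it is not the pr56997-1.c testcase -- but it shows the shape of the problem: the bitfield's declared type is unsigned short, so under -fstrict-volatile-bitfields the load is supposed to be a single HImode access, even though the object itself is 32-bit aligned, which is exactly the get_best_mode vs. MEM_ALIGN mismatch described above.

/* Illustrative only.  The declared container type is unsigned short,
   so -fstrict-volatile-bitfields asks for a single 16-bit volatile
   load of x.a, regardless of the fact that x is 4-byte aligned.  */
struct S
{
  volatile unsigned short a : 4;
  unsigned short b : 12;
} __attribute__ ((aligned (4)));

struct S x;

int
get_a (void)
{
  return x.a;   /* Expected: one HImode volatile load, then masking.  */
}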
[PR libstdc++/58804][PR libstdc++/58729] tr2/dynamic_bitset issues.
Greetings. Here is a patch to correct tr2/dynamic_bitset to use __builtin_xxxll for long long instead of the long versions. Relevant bugs: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58804 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58729 Builds and *really* tests clean on x86_64-linux. OK? Index: include/tr2/dynamic_bitset === --- include/tr2/dynamic_bitset (revision 203841) +++ include/tr2/dynamic_bitset (working copy) @@ -287,7 +287,7 @@ if (_M_w[__i] != ~static_cast(0)) return 0; return ((this->_M_w.size() - 1) * _S_bits_per_block - + __builtin_popcountl(this->_M_hiword())); + + __builtin_popcountll(this->_M_hiword())); } bool @@ -332,7 +332,7 @@ { size_t __result = 0; for (size_t __i = 0; __i < this->_M_w.size(); ++__i) - __result += __builtin_popcountl(this->_M_w[__i]); + __result += __builtin_popcountll(this->_M_w[__i]); return __result; } Index: include/tr2/dynamic_bitset.tcc === --- include/tr2/dynamic_bitset.tcc (revision 203841) +++ include/tr2/dynamic_bitset.tcc (working copy) @@ -131,7 +131,7 @@ _WordT __thisword = this->_M_w[__i]; if (__thisword != static_cast<_WordT>(0)) return (__i * _S_bits_per_block - + __builtin_ctzl(__thisword)); + + __builtin_ctzll(__thisword)); } // not found, so return an indication of failure. return __not_found; @@ -158,7 +158,7 @@ if (__thisword != static_cast<_WordT>(0)) return (__i * _S_bits_per_block - + __builtin_ctzl(__thisword)); + + __builtin_ctzll(__thisword)); // check subsequent words for (++__i; __i < this->_M_w.size(); ++__i) @@ -166,7 +166,7 @@ __thisword = this->_M_w[__i]; if (__thisword != static_cast<_WordT>(0)) return (__i * _S_bits_per_block - + __builtin_ctzl(__thisword)); + + __builtin_ctzll(__thisword)); } // not found, so return an indication of failure. return __not_found; 2013-10-20 Edward Smith-Rowland <3dw...@verizon.net> PR libstdc++/58804 PR libstdc++/58729 * include/tr2/dynamic_bitset (__dynamic_bitset_base<_WordT, _Alloc>::_M_are_all_aux, __dynamic_bitset_base<_WordT, _Alloc>::_M_do_count): Use __builtin_popcountll() instead of __builtin_popcountl(). * include/tr2/dynamic_bitset.tcc (__dynamic_bitset_base<_WordT, _Alloc>::_M_do_find_first, __dynamic_bitset_base<_WordT, _Alloc>::_M_do_find_next): Use __builtin_ctzll() instead of __builtin_ctzl().
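For background on why the ll variants matter here: dynamic_bitset stores its bits in unsigned long long blocks, so on a target where long is only 32 bits wide the long-sized builtins silently drop the upper half of each block. A small illustration follows (the values are made up, not taken from the patch or its tests):

#include <cassert>

int main ()
{
  unsigned long long block = 0xffffffff00000000ULL;

  /* __builtin_popcountll operates on the full 64-bit block: 32 set bits.  */
  assert (__builtin_popcountll (block) == 32);

  /* __builtin_popcountl takes unsigned long; on a target where long is
     32 bits wide, the block would be truncated to 0 before counting,
     which is the kind of silent loss the patch fixes in _M_do_count and
     friends.  (On LP64 hosts long happens to be 64 bits wide, which is
     why the bug can go unnoticed in testing there.)  */
  return 0;
}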
Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)
Christian Bruel wrote: > thanks for having retested this, The tests are still not complete for > RTL generated functions, there are cases where no str/cmp wil be > emitted, because we can predict than the size is less than 4 and so have > a direct fallthru into the byte at a byte loop copying. > > Also I will consider a size optimized implementation, so we don't jump > to the library > > I will post examples for this shortly (and add them as a testcase) with > a strncmp implementation helper, that pertains to strcmp with constant > strings by the way. Please allow me some time to complete by benchmarking. > > thanks for the hints about removing empty "" is the expanders > > Kaz, before proceeding with the next patch, was your approval for 1/2 > only or 2/2 with the expander cleanup ? Only for 1/2. The revised 2/2 patch is pre-approved if the usual tests are Ok, though. One minor formatting problem: >+ if (! optimize_insn_for_size_p () && sh4_expand_cmpstr(operands)) >+ DONE; A space is needed just before (operands). Regards, kaz
Re: User-define literals for std::complex.
On 09/27/2013 05:39 AM, Jonathan Wakely wrote: On 27 September 2013 05:17, Ed Smith-Rowland wrote: The complex user-defined literals finally passed (n3779) with the resolution to DR1473 allowing the suffix id to touch the quotes (Can't find it but I put it in not too long ago). I think it's been approved by the LWG and looks like it will go to a vote by the full committee, but let's wait for that to pass before making any changes. Now that this is in the working paper can we go ahead?
Re: [PATCH]Fix computation of offset in ivopt
On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener wrote: > On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng wrote: >> Hi, >> As noted in previous messages, GCC forces offset to unsigned in middle end. >> It also gets offset value and stores it in HOST_WIDE_INT then uses it in >> various computation. In this scenario, It should use int_cst_value to do >> additional sign extension against the type of tree node, otherwise we might >> lose sign information. Take function fold_plusminus_mult_expr as an >> example, the sign information of arg01/arg11 might be lost because it uses >> TREE_INT_CST_LOW directly. I think this is the offender of the problem in >> this thread. I think we should fix the offender, rather the hacking >> strip_offset as in the original patch, so I split original patch into two as >> attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the >> other fixing the mentioned sign extension problem. > > Index: gcc/fold-const.c > === > --- gcc/fold-const.c (revision 203267) > +++ gcc/fold-const.c (working copy) > @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre >HOST_WIDE_INT int01, int11, tmp; >bool swap = false; >tree maybe_same; > - int01 = TREE_INT_CST_LOW (arg01); > - int11 = TREE_INT_CST_LOW (arg11); > + int01 = int_cst_value (arg01); > + int11 = int_cst_value (arg11); > > this is not correct - it will mishandle all large unsigned numbers. As far as the patch itself. I think the preceding host_integerp already checks for this case: int host_integerp (const_tree t, int pos) { if (t == NULL_TREE) return 0; return (TREE_CODE (t) == INTEGER_CST && ((TREE_INT_CST_HIGH (t) == 0 && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) >= 0) || (! pos && TREE_INT_CST_HIGH (t) == -1 && (HOST_WIDE_INT) TREE_INT_CST_LOW (t) < 0 && !TYPE_UNSIGNED (TREE_TYPE (t))) || (pos && TREE_INT_CST_HIGH (t) == 0))); } Since the call is host_integerp (xxx, 0), it returns 0 for large unsigned numbers, right? > > The real issue is that we rely on twos-complement arithmetic to work > when operating on pointer offsets (because we convert them all to > unsigned sizetype). That makes interpreting the offsets or expressions > that compute offsets hard (or impossible in some cases), as you can > see from the issues in IVOPTs. > > The above means that strip_offset_1 cannot reliably look through > MULT_EXPRs as that can make an offset X * CST that is > "effectively" signed "surely" unsigned in the process. Think of > this looking into X * CST as performing a division - whose result > is dependent on the sign of X which we lost with our unsigned > casting. Now, removing the MULT_EXPR look-through might > be too pessimizing ... but it may be worth trying to see if it fixes > your issue? In this context I also remember a new bug filed > recently of us not folding x /[ex] 4 * 4 to x. > > In the past I was working multiple times on stopping doing that > (make all offsets unsigned), but I never managed to finish ... > > Richard. > >> Bootstrap and test on x86/x86_64/arm. Is it OK? >> >> Thanks. >> bin >> >> Patch a: >> 2013-10-17 Bin Cheng >> >> * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. >> Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. >> >> Patch b: >> 2013-10-17 Bin Cheng >> >> * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead >> of TREE_INT_CST_LOW. >> >> gcc/testsuite/ChangeLog >> 2013-10-17 Bin Cheng >> >> * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. 
>> >>> -Original Message- >>> From: Richard Biener [mailto:richard.guent...@gmail.com] >>> Sent: Wednesday, October 02, 2013 4:32 PM >>> To: Bin.Cheng >>> Cc: Bin Cheng; GCC Patches >>> Subject: Re: [PATCH]Fix computation of offset in ivopt >>> >>> On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng wrote: >>> > On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener >>> > wrote: >>> >> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng >>> wrote: >>> >>> >>> >>> >>> >> >>> >> I don't think you need >>> >> >>> >> + /* Sign extend off if expr is in type which has lower precision >>> >> + than HOST_WIDE_INT. */ >>> >> + if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT) >>> >> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); >>> >> >>> >> at least it would be suspicious if you did ... >>> > There is a problem for example of the first message. The iv base if >> like: >>> > pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 I am not sure >>> > why but it seems (-4/0xFFFC) is represented by (1073741823*4). >>> > For each operand strip_offset_1 returns exactly the positive number >>> > and result of multiplication never get its chance of sign extend. >>> > That's why the sign extend is necessary to fix the problem. Or it >>> > shou
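As a small worked example of the wrapping arithmetic mentioned above (assuming a 32-bit sizetype; this is not part of either patch): 1073741823 * 4 and -4 are the same 32-bit pattern, which is why dropping the sign information when the offset is widened to HOST_WIDE_INT changes its meaning.

#include <cassert>
#include <cstdint>

int main ()
{
  /* With a 32-bit unsigned sizetype, offsets are reduced mod 2^32, so the
     "offset" 1073741823 * 4 is really the two's-complement encoding of -4
     -- which is why the sign matters when the value is later pulled into
     a HOST_WIDE_INT.  */
  uint32_t off = 1073741823u * 4u;      /* wraps to 0xFFFFFFFC */
  assert (off == 0xFFFFFFFCu);
  assert ((int32_t) off == -4);         /* same bits, signed view */
  return 0;
}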
[wwwdocs] steering.html -- update list of members
This updates the list of steering committee members after Joe stepped down a bit ago. Committed. Gerald Index: steering.html === RCS file: /cvs/gcc/wwwdocs/htdocs/steering.html,v retrieving revision 1.35 diff -u -3 -p -r1.35 steering.html --- steering.html 12 Feb 2012 21:15:16 - 1.35 +++ steering.html 20 Oct 2013 19:42:32 - @@ -26,7 +26,6 @@ committee. place to reach them is the gcc mailinglist. -Joe Buck (Synopsys) David Edelsohn (IBM) Kaveh R. Ghazi Jeffrey A. Law (Red Hat)
[C++ Patch] PR 58647
Hi, this ICE on valid, [4.7/4.8/4.9 Regression], happens only with -std=c++11 and is quite simple to analyze: cxx_eval_constant_expression sees a COMPONENT_REF and forwards to cxx_eval_component_reference, but the latter, evidently, only handles "fields" not functions (per the comment) and an ICE soon happens when it tries to use DECL_MUTABLE_P. If, for comparison, instead of a static member function, we have a static free function, like in, eg static void foo(); template void bar() { foo; } what happens is that cxx_eval_constant_expression sees a FUNCTION_DECL and returns it as-is. Thus I believe that doing the same in the case at issue should be fine. Tested x86_64-linux. Thanks, Paolo. / /cp 2013-10-20 Paolo Carlini PR c++/58647 * semantics.c (cxx_eval_constant_expression, [COMPONENT_REF]): Handle functions as TREE_OPERAND (t, 1). /testsuite 2013-10-20 Paolo Carlini PR c++/58647 * g++.dg/parse/crash66.C: New. Index: cp/semantics.c === --- cp/semantics.c (revision 203876) +++ cp/semantics.c (working copy) @@ -9553,6 +9558,8 @@ cxx_eval_constant_expression (const constexpr_call break; case COMPONENT_REF: + if (is_overloaded_fn (TREE_OPERAND (t, 1))) + return t; r = cxx_eval_component_reference (call, t, allow_non_constant, addr, non_constant_p, overflow_p); break; Index: testsuite/g++.dg/parse/crash66.C === --- testsuite/g++.dg/parse/crash66.C(revision 0) +++ testsuite/g++.dg/parse/crash66.C(working copy) @@ -0,0 +1,11 @@ +// PR c++/58647 + +struct A +{ + static void foo(); +}; + +template void bar() +{ + A().foo; +}
Re: [patch] Cleanup tree-ssa-ter.c exports
> Unfortunately, tree-ssa-ter.c also has 2 functions > (find_replaceable_exprs() and dump_replaceable_exprs()) which are > exported and utilized by tree-outof-ssa.c (the file is a part of the > out-of-ssa module). So I moved the prototypes from tree-ssa-live.h into > a newly created tree-ssa-ter.h file, and included it directly from > tree-outof-ssa.c, the only consumer. Apparently something went wrong when tree-ssa-ter.h was created. -- Eric Botcazou
Re: [wide-int] Reduce the size of the scratch arrays
Kenneth Zadeck writes: > On 10/20/2013 06:30 AM, Richard Sandiford wrote: >> If yesterday's patch goes in, we'll only need the scratch array for >> HWI inputs. We can therefore shrink it to 2 HWIs. >> >> Tested on x86_64-linux-gnu. OK for wide-int? >> >> Thanks, >> Richard >> >> >> Index: gcc/wide-int.h >> === >> --- gcc/wide-int.h 2013-10-19 20:08:15.571987579 +0100 >> +++ gcc/wide-int.h 2013-10-19 20:09:12.299436972 +0100 >> @@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi: >> private: >> /* Scratch space that can be used when decomposing the original integer. >>It must live as long as this object. */ >> - HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS]; >> + HOST_WIDE_INT scratch[2]; >> >> public: >> template > we now allow partial int precisions that are any size. this will break > that. Note that this was only "if yesterday's patch goes in". That patch removes the use of the scratch array from the tree decompose routine. We already removed the scratch array from the rtx decompose routine. So the only decompose routines left that need scratch space are those for pure HWI inputs. Thanks, Richard
Re: [wide-int] Reduce the size of the scratch arrays
On 10/20/2013 06:30 AM, Richard Sandiford wrote: If yesterday's patch goes in, we'll only need the scratch array for HWI inputs. We can therefore shrink it to 2 HWIs. Tested on x86_64-linux-gnu. OK for wide-int? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-10-19 20:08:15.571987579 +0100 +++ gcc/wide-int.h 2013-10-19 20:09:12.299436972 +0100 @@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi: private: /* Scratch space that can be used when decomposing the original integer. It must live as long as this object. */ - HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS]; + HOST_WIDE_INT scratch[2]; public: template we now allow partial int precisions that are any size. this will break that. kenny
Re: [PATCH, rs6000] Adjust vec_unpacku patterns for little endian
On Sun, Oct 20, 2013 at 12:00 AM, Bill Schmidt wrote: > Hi, > > For little endian, the permute control vector for unpacking high and low > halves of a vector register must be reversed from the one used for big > endian. Fixing this corrects 27 failing tests for > powerpc64le-unknown-linux-gnu. > > Bootstrapped and tested for powerpc64{,le}-unknown-linux-gnu with no new > regressions. Is this ok for trunk? > > Thanks, > Bill > > > 2013-10-19 Bill Schmidt > > * altivec.md (vec_unpacku_hi_v16qi): Adjust for little endian. > (vec_unpacku_hi_v8hi): Likewise. > (vec_unpacku_lo_v16qi): Likewise. > (vec_unpacku_lo_v8hi): Likewise. Okay. Thanks, David
Re: changing a collision resolution strategy of the symbol table of identifiers
On Sun, Oct 20, 2013 at 06:55:40PM +0600, Roman Gareev wrote: > Dear gcc contributors, > > Recently I came across the list of "ideas for speeding up GCC" > (http://gcc.gnu.org/wiki/Speedup_areas). Among others, there was > suggested to replace identifier hash table with other data structure. > > Please find attached a patch, that switches the resolution strategy of > a hash table used in the symbol table of identifiers from open > addressing to separate chaining with linked list. I would be very > grateful for your comments, feedback and ideas about further > enhancements to gcc, in which I could take a part. I am pursuing > master of science and looking for thesis theme, that is compiler > related. > > Below are results of testing and the patch. > > During testing of the linux kernel (3.8.8) compilation time, the > acquired results were the following: increasing by 0.17% for the > version 4.8.0, increasing by 1.12% for the version 4.8.1, decreasing > by 0.598% for trunk (this are average values). > > > Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk. > > > diff --git a/gcc/stringpool.c b/gcc/stringpool.c > > index f4d0dae..cc696f7 100644 > > --- a/gcc/stringpool.c > > +++ b/gcc/stringpool.c > > @@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator > op, void *cookie) > > /* SPD is saved in the PCH file and holds the information needed > > to restore the string pool. */ > > -struct GTY(()) string_pool_data { > > - ht_identifier_ptr * > > - GTY((length ("%h.nslots"), > Mail client broke it?
changing a collision resolution strategy of the symbol table of identifiers
Dear gcc contributors, Recently I came across the list of "ideas for speeding up GCC" (http://gcc.gnu.org/wiki/Speedup_areas). Among others, there was suggested to replace identifier hash table with other data structure. Please find attached a patch, that switches the resolution strategy of a hash table used in the symbol table of identifiers from open addressing to separate chaining with linked list. I would be very grateful for your comments, feedback and ideas about further enhancements to gcc, in which I could take a part. I am pursuing master of science and looking for thesis theme, that is compiler related. Below are results of testing and the patch. During testing of the linux kernel (3.8.8) compilation time, the acquired results were the following: increasing by 0.17% for the version 4.8.0, increasing by 1.12% for the version 4.8.1, decreasing by 0.598% for trunk (this are average values). Tested x86_64-unknown-linux-gnu, applying to 4.8.0, 4.8.1 and trunk. diff --git a/gcc/stringpool.c b/gcc/stringpool.c index f4d0dae..cc696f7 100644 --- a/gcc/stringpool.c +++ b/gcc/stringpool.c @@ -241,12 +241,8 @@ gt_pch_nx (unsigned char *x, gt_pointer_operator op, void *cookie) /* SPD is saved in the PCH file and holds the information needed to restore the string pool. */ -struct GTY(()) string_pool_data { - ht_identifier_ptr * - GTY((length ("%h.nslots"), - nested_ptr (union tree_node, "%h ? GCC_IDENT_TO_HT_IDENT (%h) : NULL", - "%h ? HT_IDENT_TO_GCC_IDENT (%h) : NULL"))) - entries; +struct GTY((user)) string_pool_data { + ht_identifier_ptr * entries; unsigned int nslots; unsigned int nelements; }; @@ -284,4 +280,113 @@ gt_pch_restore_stringpool (void) spd = NULL; } +/* User-provided GC marking routines for the struct ht_identifier. They are using + only in GC marking routines for the struct string_pool_data. */ + +extern void ht_identifier_ggc_mx (void *x_p); +extern void ht_identifier_pch_nx (void *x_p); +extern void ht_identifier_pch_nx (void *this_obj, void *x_p, gt_pointer_operator op, void *cookie); + +void +ht_identifier_ggc_mx (void *x_p) +{ + struct ht_identifier * x = (struct ht_identifier *)x_p; + struct ht_identifier * xlimit = x; + while (ggc_test_and_set_mark (xlimit)) + xlimit = ((*xlimit).next); + while (x != xlimit) + { + union tree_node * const x0 = ((*x).next) ? HT_IDENT_TO_GCC_IDENT (((*x).next)) : NULL; + gt_ggc_m_9tree_node (x0); + x = ((*x).next); + } +} + +void +ht_identifier_pch_nx (void *x_p) +{ + struct ht_identifier * x = (struct ht_identifier *)x_p; + struct ht_identifier * xlimit = x; + while (gt_pch_note_object (xlimit, xlimit, ht_identifier_pch_nx)) + xlimit = ((*xlimit).next); + while (x != xlimit) + { + union tree_node * const x0 = ((*x).next) ? HT_IDENT_TO_GCC_IDENT (((*x).next)) : NULL; + gt_pch_n_9tree_node (x0); + x = ((*x).next); + } +} + +void +ht_identifier_pch_nx (void *this_obj, void *x_p, gt_pointer_operator op, void *cookie) +{ + struct ht_identifier * x = (struct ht_identifier *)x_p; + op (&((*x).str), cookie); + union tree_node * x0 = ((*x).next) ? HT_IDENT_TO_GCC_IDENT (((*x).next)) : NULL; + op (&(x0), cookie); + (*x).next = (x0) ? (GCC_IDENT_TO_HT_IDENT ((x0))) : NULL; +} + +/* A pointer walker for entries of the struct string_pool_data. 
*/ + +void +string_pool_data_pch_entries_walker (void *this_obj, void *x_p, gt_pointer_operator op, void *cookie) +{ + struct string_pool_data * x ATTRIBUTE_UNUSED = (struct string_pool_data *)x_p; + size_t l0 = (size_t)(((*x)).nslots); + if ((*x).entries != NULL) + { + size_t i0; + for (i0 = 0; i0 != (size_t)(l0); i0++) + { + union tree_node * x0 = ((*x).entries[i0]) ? HT_IDENT_TO_GCC_IDENT (((*x).entries[i0])) : NULL; + op (&(x0), cookie); + (*x).entries[i0] = (x0) ? (GCC_IDENT_TO_HT_IDENT ((x0))) : NULL; + } + } +} + +/* User-provided GC marking routines for the struct string_pool_data. */ + +void +gt_ggc_mx (string_pool_data *x) +{ + size_t l0 = (size_t)(((*x)).nslots); + if ((*x).entries != NULL) + { + size_t i0; + for (i0 = 0; i0 != (size_t)(l0); i0++) + { + ht_identifier_ggc_mx ((*x).entries[i0]); + union tree_node * const x0 = ((*x).entries[i0]) ? HT_IDENT_TO_GCC_IDENT (((*x).entries[i0])) : NULL; + gt_ggc_m_9tree_node (x0); + } + ggc_mark ((*x).entries); + } +} + +void +gt_pch_nx (string_pool_data *x) +{ + size_t l0 = (size_t)(((*x)).nslots); + if ((*x).entries != NULL) + { + size_t i0; + for (i0 = 0; i0 != (size_t)(l0); i0++) + { + union tree_node * const x1 = ((*x).entries[i0]) ? HT_IDENT_TO_GCC_IDENT (((*x).entries[i0])) : NULL; + gt_pch_n_9tree_node (x1); + ht_identifier_pch_nx ((*x).entries[i0]); + } + gt_pch_note_object ((*x).entries, x, string_pool_data_pch_entries_walker); + } +} + +void +gt_pch_nx (string_pool_data *x, gt_pointer_operator op, void *cookie) +{ + if ((*x).entries != NULL) + op (&((*x).entries), cookie); +} + #include "gt-stringpool.h" diff --git a/libcpp/inc
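For readers unfamiliar with the two strategies: the patch threads a next pointer through ht_identifier so that all identifiers hashing to the same slot form a linked list, instead of probing other slots as open addressing does. Below is a rough, self-contained sketch of what a lookup looks like under that scheme. The struct is a stand-in for libcpp's ht_identifier plus the new next field, and chained_lookup is a hypothetical name -- the real lookup lives in libcpp's ht_lookup.

#include <string.h>
#include <stddef.h>

/* Minimal stand-in for libcpp's ht_identifier, with the 'next' field the
   patch adds for chaining.  */
struct ht_identifier
{
  const unsigned char *str;
  unsigned int len;
  unsigned int hash_value;
  struct ht_identifier *next;
};

/* Sketch of separate-chaining lookup: walk only the chain hanging off
   the one slot the hash selects, instead of probing other slots.  */
static struct ht_identifier *
chained_lookup (struct ht_identifier **slots, unsigned int nslots,
                const unsigned char *str, size_t len, unsigned int hash)
{
  struct ht_identifier *node = slots[hash % nslots];
  for (; node; node = node->next)
    if (node->hash_value == hash
        && node->len == len
        && memcmp (node->str, str, len) == 0)
      return node;              /* found in this slot's chain */
  return NULL;                  /* not present; the caller would insert at the head */
}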
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
Bernd Edlinger wrote:
>Hi,
>
>>> What I would suggest is to have a -fgnu-strict-volatile-bit-fields
>>
>> Why a new option? The -fstrict-volatile-bitfields option is already
>> GCC-specific, and I think it can do what you want anyway.
>
>As I understand Richard's comment, he proposes to
>have an option for true AAPCS compliance, which will
>be allowed to break the C++11 memory model and
>which will _not_ be the default on any target.
>Name it -fstrict-volatile-bitfields.
>
>And an option that addresses your requirements,
>which will _not_ break the C++11 memory model
>and which will be the default on some targets,
>dependent on the respective ABI requirements.
>Name it -fgnu-strict-volatile-bit-fields.

Yes. You could also make it -fstrict-volatile-bitfields={off,gnu,aapcs} if you think that's better.

Richard.

>
>Bernd.
Re: [wide-int] Add a copy helper routine
Richard Sandiford wrote: >When the wi:: patch went in, Kenny rightly complained that there were >too many copy loops. This patch gets rid of an unnecessary loop in >wi::abs and consolidates the rest into a single function. Again, it >helps >the ::is_sign_extended patch. > >Tested on x86_64-linux-gnu. OK to install? Ok. Thanks, Richard. >Thanks, >Richard > > >Index: gcc/wide-int.h >=== >--- gcc/wide-int.h 2013-10-20 09:38:41.175501696 +0100 >+++ gcc/wide-int.h 2013-10-20 11:32:47.425353384 +0100 >@@ -347,6 +347,9 @@ typedef generic_wide_inttemplate > unsigned int get_binary_precision (const T1 &, const T2 &); > >+ template >+ void copy (T1 &, const T2 &); >+ > #define UNARY_PREDICATE \ > template bool > #define UNARY_FUNCTION \ >@@ -870,10 +873,7 @@ inline wide_int_storage::wide_int_storag > STATIC_ASSERT (!wi::int_traits::host_dependent_precision); > wide_int_ref xi (x); > precision = xi.precision; >- unsigned int l = xi.len; >- for (unsigned int i = 0; i < l; ++i) >-val[i] = xi.val[i]; >- set_len (l); >+ wi::copy (*this, xi); > } > > inline unsigned int >@@ -969,6 +969,21 @@ wi::int_traits ::get_b > return wide_int::create (wi::get_precision (x)); > } > >+/* Copy the contents of Y to X, but keeping X's current precision. */ >+template >+void >+wi::copy (T1 &x, const T2 &y) >+{ >+ HOST_WIDE_INT *xval = x.write_val (); >+ const HOST_WIDE_INT *yval = y.get_val (); >+ unsigned int len = y.get_len (); >+ unsigned int i = 0; >+ do >+xval[i] = yval[i]; >+ while (++i < len); >+ x.set_len (len); >+} >+ >/* An N-bit integer. Until we can use typedef templates, use this >instead. */ > #define FIXED_WIDE_INT(N) \ > generic_wide_int < fixed_wide_int_storage > >@@ -1012,10 +1027,7 @@ inline fixed_wide_int_storage ::fixed > /* Check for type compatibility. We don't want to initialize a > fixed-width integer from something like a wide_int. */ > WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; >- wide_int_ref xi (x, N); >- len = xi.len; >- for (unsigned int i = 0; i < len; ++i) >-val[i] = xi.val[i]; >+ wi::copy (*this, wide_int_ref (x, N)); > } > > template >@@ -1716,16 +1728,7 @@ wi::neg (const T &x, bool *overflow) > inline WI_UNARY_RESULT (T) > wi::abs (const T &x) > { >- if (neg_p (x)) >-return neg (x); >- >- WI_UNARY_RESULT_VAR (result, val, T, x); >- wide_int_ref xi (x, get_precision (result)); >- for (unsigned int i = 0; i < xi.len; ++i) >-val[i] = xi.val[i]; >- result.set_len (xi.len); >- >- return result; >+ return neg_p (x) ? neg (x) : x; > } > > /* Return the result of sign-extending the low OFFSET bits of X. */ >@@ -1760,9 +1763,7 @@ wi::zext (const T &x, unsigned int offse > maintain canonization. 
*/ > if (offset >= precision) > { >- for (unsigned int i = 0; i < xi.len; ++i) >- val[i] = xi.val[i]; >- result.set_len (xi.len); >+ wi::copy (result, xi); > return result; > } > >@@ -1809,22 +1810,12 @@ wi::set_bit (const T &x, unsigned int bi > inline WI_BINARY_RESULT (T1, T2) > wi::min (const T1 &x, const T2 &y, signop sgn) > { >- WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y); >+ WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y); > unsigned int precision = get_precision (result); > if (wi::le_p (x, y, sgn)) >-{ >- wide_int_ref xi (x, precision); >- for (unsigned int i = 0; i < xi.len; ++i) >- val[i] = xi.val[i]; >- result.set_len (xi.len); >-} >+wi::copy (result, wide_int_ref (x, precision)); > else >-{ >- wide_int_ref yi (y, precision); >- for (unsigned int i = 0; i < yi.len; ++i) >- val[i] = yi.val[i]; >- result.set_len (yi.len); >-} >+wi::copy (result, wide_int_ref (y, precision)); > return result; > } > >@@ -1850,22 +1841,12 @@ wi::umin (const T1 &x, const T2 &y) > inline WI_BINARY_RESULT (T1, T2) > wi::max (const T1 &x, const T2 &y, signop sgn) > { >- WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y); >+ WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y); > unsigned int precision = get_precision (result); > if (wi::ge_p (x, y, sgn)) >-{ >- wide_int_ref xi (x, precision); >- for (unsigned int i = 0; i < xi.len; ++i) >- val[i] = xi.val[i]; >- result.set_len (xi.len); >-} >+wi::copy (result, wide_int_ref (x, precision)); > else >-{ >- wide_int_ref yi (y, precision); >- for (unsigned int i = 0; i < yi.len; ++i) >- val[i] = yi.val[i]; >- result.set_len (yi.len); >-} >+wi::copy (result, wide_int_ref (y, precision)); > return result; > } >
Re: [wide-int] Reduce the size of the scratch arrays
Richard Sandiford wrote: >If yesterday's patch goes in, we'll only need the scratch array for >HWI inputs. We can therefore shrink it to 2 HWIs. >Tested on x86_64-linux-gnu. OK for wide-int? > Ok Thanks, Richard. >Thanks, >Richard > > >Index: gcc/wide-int.h >=== >--- gcc/wide-int.h 2013-10-19 20:08:15.571987579 +0100 >+++ gcc/wide-int.h 2013-10-19 20:09:12.299436972 +0100 >@@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi: > private: >/* Scratch space that can be used when decomposing the original >integer. > It must live as long as this object. */ >- HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS]; >+ HOST_WIDE_INT scratch[2]; > > public: > template
Re: [wide-int] Turn lts_p & co. back into template functions
Richard Sandiford wrote: >This patch just converts some functions back to template functions, >for the benefit of the upcoming ::is_sign_extended patch. There's no >behavioural change. > >Tested on x86_64-linux-gnu. OK for wide-int? Ok. Thanks, Richard. >Thanks, >Richard > > >Index: gcc/gcc/wide-int.h >=== >--- gcc.orig/gcc/wide-int.h >+++ gcc/gcc/wide-int.h >@@ -347,42 +347,48 @@ namespace wi > template > unsigned int get_binary_precision (const T1 &, const T2 &); > >- bool fits_shwi_p (const wide_int_ref &); >- bool fits_uhwi_p (const wide_int_ref &); >- bool neg_p (const wide_int_ref &, signop = SIGNED); >- bool only_sign_bit_p (const wide_int_ref &, unsigned int); >- bool only_sign_bit_p (const wide_int_ref &); >- HOST_WIDE_INT sign_mask (const wide_int_ref &); >- >- template >- bool eq_p (const T1 &, const T2 &); >- >- template >- bool ne_p (const T1 &, const T2 &); >- >- bool lt_p (const wide_int_ref &, const wide_int_ref &, signop); >- bool lts_p (const wide_int_ref &, const wide_int_ref &); >- bool ltu_p (const wide_int_ref &, const wide_int_ref &); >- bool le_p (const wide_int_ref &, const wide_int_ref &, signop); >- bool les_p (const wide_int_ref &, const wide_int_ref &); >- bool leu_p (const wide_int_ref &, const wide_int_ref &); >- bool gt_p (const wide_int_ref &, const wide_int_ref &, signop); >- bool gts_p (const wide_int_ref &, const wide_int_ref &); >- bool gtu_p (const wide_int_ref &, const wide_int_ref &); >- bool ge_p (const wide_int_ref &, const wide_int_ref &, signop); >- bool ges_p (const wide_int_ref &, const wide_int_ref &); >- bool geu_p (const wide_int_ref &, const wide_int_ref &); >- int cmp (const wide_int_ref &, const wide_int_ref &, signop); >- int cmps (const wide_int_ref &, const wide_int_ref &); >- int cmpu (const wide_int_ref &, const wide_int_ref &); >- >+#define UNARY_PREDICATE \ >+ template bool > #define UNARY_FUNCTION \ > template WI_UNARY_RESULT (T) >+#define BINARY_PREDICATE \ >+ template bool > #define BINARY_FUNCTION \ > template WI_BINARY_RESULT (T1, T2) > #define SHIFT_FUNCTION \ > template WI_UNARY_RESULT (T) > >+ UNARY_PREDICATE fits_shwi_p (const T &); >+ UNARY_PREDICATE fits_uhwi_p (const T &); >+ UNARY_PREDICATE neg_p (const T &, signop = SIGNED); >+ >+ template >+ HOST_WIDE_INT sign_mask (const T &); >+ >+ BINARY_PREDICATE eq_p (const T1 &, const T2 &); >+ BINARY_PREDICATE ne_p (const T1 &, const T2 &); >+ BINARY_PREDICATE lt_p (const T1 &, const T2 &, signop); >+ BINARY_PREDICATE lts_p (const T1 &, const T2 &); >+ BINARY_PREDICATE ltu_p (const T1 &, const T2 &); >+ BINARY_PREDICATE le_p (const T1 &, const T2 &, signop); >+ BINARY_PREDICATE les_p (const T1 &, const T2 &); >+ BINARY_PREDICATE leu_p (const T1 &, const T2 &); >+ BINARY_PREDICATE gt_p (const T1 &, const T2 &, signop); >+ BINARY_PREDICATE gts_p (const T1 &, const T2 &); >+ BINARY_PREDICATE gtu_p (const T1 &, const T2 &); >+ BINARY_PREDICATE ge_p (const T1 &, const T2 &, signop); >+ BINARY_PREDICATE ges_p (const T1 &, const T2 &); >+ BINARY_PREDICATE geu_p (const T1 &, const T2 &); >+ >+ template >+ int cmp (const T1 &, const T2 &, signop); >+ >+ template >+ int cmps (const T1 &, const T2 &); >+ >+ template >+ int cmpu (const T1 &, const T2 &); >+ > UNARY_FUNCTION bit_not (const T &); > UNARY_FUNCTION neg (const T &); > UNARY_FUNCTION neg (const T &, bool *); >@@ -446,9 +452,13 @@ namespace wi >SHIFT_FUNCTION rrotate (const T &, const wide_int_ref &, unsigned int = >0); > > #undef SHIFT_FUNCTION >+#undef BINARY_PREDICATE > #undef BINARY_FUNCTION >+#undef UNARY_PREDICATE > #undef 
UNARY_FUNCTION > >+ bool only_sign_bit_p (const wide_int_ref &, unsigned int); >+ bool only_sign_bit_p (const wide_int_ref &); > int clz (const wide_int_ref &); > int clrsb (const wide_int_ref &); > int ctz (const wide_int_ref &); >@@ -1401,40 +1411,48 @@ wi::get_binary_precision (const T1 &x, c > > /* Return true if X fits in a HOST_WIDE_INT with no loss of >precision. */ >+template > inline bool >-wi::fits_shwi_p (const wide_int_ref &x) >+wi::fits_shwi_p (const T &x) > { >- return x.len == 1; >+ wide_int_ref xi (x); >+ return xi.len == 1; > } > > /* Return true if X fits in an unsigned HOST_WIDE_INT with no loss of >precision. */ >+template > inline bool >-wi::fits_uhwi_p (const wide_int_ref &x) >+wi::fits_uhwi_p (const T &x) > { >- if (x.precision <= HOST_BITS_PER_WIDE_INT) >+ wide_int_ref xi (x); >+ if (xi.precision <= HOST_BITS_PER_WIDE_INT) > return true; >- if (x.len == 1) >-return x.slow () >= 0; >- return x.len == 2 && x.uhigh () == 0; >+ if (xi.len == 1) >+return xi.slow () >= 0; >+ return xi.len == 2 && xi.uhigh () == 0; > } > > /* Return true if X is negative based on the interpretation of SGN. >For UNSIGNED, this is always false. */ >+template > inline bool >-wi::neg_p (const wide_int_ref &x, signop sgn) >+
Re: Always treat sext_hwi and zext_hwi as inline
Richard Sandiford wrote: >At the moment there are two copies of sext_hwi and zext_hwi, one inline >for !ENABLE_CHECKING and one out-of-line for ENABLE_CHECKING. However, >there are several wide-int callers where it's obvious from context that >the precision is <= HOST_BITS_PER_WIDE_INT, so if the functions are >inline for ENABLE_CHECKING, the assert can often be compiled away. > >This improves the ENABLE_CHECKING code for some query functions on >wide-int branch. Tested on x86_64-linux-gnu. OK for mainline? Ok. >Thanks, >Richard > > >gcc/ > * system.h: Move hwint.h include further down. > * hwint.h (sext_hwi, zext_hwi): Define unconditionally. Add > gcc_checking_asserts. > * hwint.c (sext_hwi, zext_hwi): Delete ENABLE_CHECKING versions. > >Index: gcc/hwint.c >=== >--- gcc/hwint.c2013-08-21 19:28:49.560621645 +0100 >+++ gcc/hwint.c2013-10-19 20:05:43.399978400 +0100 >@@ -204,35 +204,3 @@ least_common_multiple (HOST_WIDE_INT a, > { > return mul_hwi (abs_hwi (a) / gcd (a, b), abs_hwi (b)); > } >- >-#ifdef ENABLE_CHECKING >-/* Sign extend SRC starting from PREC. */ >- >-HOST_WIDE_INT >-sext_hwi (HOST_WIDE_INT src, unsigned int prec) >-{ >- gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT); >- >- if (prec == HOST_BITS_PER_WIDE_INT) >-return src; >- else >-{ >- int shift = HOST_BITS_PER_WIDE_INT - prec; >- return (src << shift) >> shift; >-} >-} >- >-/* Zero extend SRC starting from PREC. */ >- >-unsigned HOST_WIDE_INT >-zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec) >-{ >- gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT); >- >- if (prec == HOST_BITS_PER_WIDE_INT) >-return src; >- else >-return src & (((HOST_WIDE_INT)1 << prec) - 1); >-} >- >-#endif >Index: gcc/hwint.h >=== >--- gcc/hwint.h2013-09-05 20:55:31.192518091 +0100 >+++ gcc/hwint.h2013-10-19 20:05:23.469855942 +0100 >@@ -322,9 +322,6 @@ extern HOST_WIDE_INT least_common_multip > > /* Sign extend SRC starting from PREC. */ > >-#ifdef ENABLE_CHECKING >-extern HOST_WIDE_INT sext_hwi (HOST_WIDE_INT, unsigned int); >-#else > static inline HOST_WIDE_INT > sext_hwi (HOST_WIDE_INT src, unsigned int prec) > { >@@ -332,24 +329,23 @@ sext_hwi (HOST_WIDE_INT src, unsigned in > return src; > else > { >+ gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT); > int shift = HOST_BITS_PER_WIDE_INT - prec; > return (src << shift) >> shift; > } > } >-#endif > > /* Zero extend SRC starting from PREC. */ >-#ifdef ENABLE_CHECKING >-extern unsigned HOST_WIDE_INT zext_hwi (unsigned HOST_WIDE_INT, >unsigned int); >-#else > static inline unsigned HOST_WIDE_INT > zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec) > { > if (prec == HOST_BITS_PER_WIDE_INT) > return src; > else >-return src & (((HOST_WIDE_INT)1 << prec) - 1); >+{ >+ gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT); >+ return src & (((HOST_WIDE_INT) 1 << prec) - 1); >+} > } >-#endif > > #endif /* ! GCC_HWINT_H */ >Index: gcc/system.h >=== >--- gcc/system.h 2013-09-07 10:19:22.797432289 +0100 >+++ gcc/system.h 2013-10-19 20:06:14.417170260 +0100 >@@ -272,9 +272,6 @@ #define ICE_EXIT_CODE 4 > # include > #endif > >-/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT. */ >-#include "hwint.h" >- > /* A macro to determine whether a VALUE lies inclusively within a >certain range without evaluating the VALUE more than once. This >macro won't warn if the VALUE is unsigned and the LOWER bound is >@@ -1066,4 +1063,7 @@ #define DEBUG_FUNCTION > #define DEBUG_VARIABLE > #endif > >+/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT. */ >+#include "hwint.h" >+ > #endif /* ! 
GCC_SYSTEM_H */ Abs_hwi might be treated the same. IIRC it is out of line as the assert macros were not available in hwi.h. Thanks, Richard.
Re: [PATCH, SH] Add support for inlined builtin-strcmp (2/2)
Hi Oleg,

On 10/19/2013 11:30 AM, Oleg Endo wrote:
>
> I've attached two test cases, tested with
> make -k check-gcc RUNTESTFLAGS="sh.exp=strcmp* --target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
>
> Could you please include them?
>
> Cheers,
> Oleg
>

Thanks for having retested this. The tests are still not complete for RTL-generated functions: there are cases where no str/cmp will be emitted, because we can predict that the size is less than 4 and so have a direct fallthru into the byte-by-byte copying loop.

Also, I will consider a size-optimized implementation, so we don't jump to the library.

I will post examples for this shortly (and add them as a testcase) with a strncmp implementation helper, which pertains to strcmp with constant strings, by the way. Please allow me some time to complete the benchmarking.

Thanks for the hints about removing the empty "" in the expanders.

Kaz, before proceeding with the next patch, was your approval for 1/2 only, or for 2/2 with the expander cleanup as well?

Many thanks,

Christian
[wide-int] Add is_sign_extended optimisation
Another follow-up to yesterday's patch. This one implements Richard's suggestion of having an is_sign_extended trait to optimise cases where excess upper bits are known to be signs rather than undefined. The uses so far are: * make to_shwi () equivalent to slow () * turn eq_p into a simple loop * avoid extensions in sign_mask () * avoid extensions in set_len if the input was already sign-extended The first two are new (compared to wide-int svn) while the second two partially undo the negative effects of yesterday's patch on is_sign_extended values. E.g. bool f (wide_int x, HOST_WIDE_INT y) { return x == y; } now gives: xorl%eax, %eax cmpl$1, 264(%rsp) je .L27 ret .p2align 4,,10 .p2align 3 .L27: cmpq8(%rsp), %rdi sete%al ret And: wide_int f (wide_int x, wide_int y) { return x == y; } gives: movl264(%rsp), %ecx xorl%eax, %eax cmpl528(%rsp), %ecx je .L42 rep ret .p2align 4,,10 .p2align 3 .L42: xorl%eax, %eax jmp .L38 .p2align 4,,10 .p2align 3 .L44: addl$1, %eax cmpl%eax, %ecx je .L43 .L38: movl%eax, %edx movq272(%rsp,%rdx,8), %rsi cmpq%rsi, 8(%rsp,%rdx,8) je .L44 xorl%eax, %eax ret .p2align 4,,10 .p2align 3 .L43: movl$1, %eax ret (which is a bit poor -- "je .L42" trivially threads to "je. L38", although that's probably only true after RA). The code for: bool f (wide_int x, unsigned HOST_WIDE_INT y) { return x == y; } still needs some work though... The lts_p sequences of wide_int are similar to the ones Mike posted. Tested on x86_64-linux-gnu. OK for wide-int? Thanks, Richard Index: gcc/rtl.h === --- gcc/rtl.h 2013-10-20 09:38:40.254493991 +0100 +++ gcc/rtl.h 2013-10-20 09:39:28.169894855 +0100 @@ -1410,6 +1410,7 @@ typedef std::pair wide_int; +template struct wide_int_ref_storage; -typedef generic_wide_int wide_int_ref; + +typedef generic_wide_int > wide_int_ref; + +/* This can be used instead of wide_int_ref if the referenced value is + known to have type T. It carries across properties of T's representation, + such as whether excess upper bits in a HWI are defined, and can therefore + help avoid redundant work. + + The macro could be replaced with a template typedef, once we're able + to use those. */ +#define WIDE_INT_REF_FOR(T) \ + generic_wide_int \ +::is_sign_extended> > /* Public functions for querying and operating on integers. */ namespace wi @@ -520,18 +533,6 @@ wi::storage_ref::get_val () const return val; } -namespace wi -{ - template <> - struct int_traits - { -static const enum precision_type precision_type = VAR_PRECISION; -/* wi::storage_ref can be a reference to a primitive type, - so this is the conservatively-correct setting. */ -static const bool host_dependent_precision = true; - }; -} - /* This class defines an integer type using the storage provided by the template argument. 
The storage class must provide the following functions: @@ -626,6 +627,9 @@ #define INCDEC_OPERATOR(OP, DELTA) \ #undef INCDEC_OPERATOR char *dump (char *) const; + + static const bool is_sign_extended += wi::int_traits >::is_sign_extended; }; template @@ -653,7 +657,11 @@ inline generic_wide_int ::gener generic_wide_int ::to_shwi (unsigned int precision) const { if (precision == 0) -precision = this->get_precision (); +{ + if (is_sign_extended) + return this->get_val ()[0]; + precision = this->get_precision (); +} if (precision < HOST_BITS_PER_WIDE_INT) return sext_hwi (this->get_val ()[0], precision); else @@ -692,11 +700,14 @@ generic_wide_int ::to_short_add generic_wide_int ::sign_mask () const { unsigned int len = this->get_len (); - unsigned int precision = this->get_precision (); unsigned HOST_WIDE_INT high = this->get_val ()[len - 1]; - int excess = len * HOST_BITS_PER_WIDE_INT - precision; - if (excess > 0) -high <<= excess; + if (!is_sign_extended) +{ + unsigned int precision = this->get_precision (); + int excess = len * HOST_BITS_PER_WIDE_INT - precision; + if (excess > 0) + high <<= excess; +} return HOST_WIDE_INT (high) < 0 ? -1 : 0; } @@ -781,6 +792,7 @@ decompose (HOST_WIDE_INT *, unsigned int /* Provide the storage for a wide_int_ref. This acts like a read-only wide_int, with the optimization that VAL is normally a pointer to another integer's storage, so that no array copy is needed. */ +template struct wide_int_ref_storage : public wi::storage_ref { private: @@ -799,17 +811,19 @@ struct wide_int_ref_storage : public wi: /* Create a ref
[wide-int] Add a copy helper routine
When the wi:: patch went in, Kenny rightly complained that there were too many copy loops. This patch gets rid of an unnecessary loop in wi::abs and consolidates the rest into a single function. Again, it helps the ::is_sign_extended patch. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-10-20 09:38:41.175501696 +0100 +++ gcc/wide-int.h 2013-10-20 11:32:47.425353384 +0100 @@ -347,6 +347,9 @@ typedef generic_wide_int unsigned int get_binary_precision (const T1 &, const T2 &); + template + void copy (T1 &, const T2 &); + #define UNARY_PREDICATE \ template bool #define UNARY_FUNCTION \ @@ -870,10 +873,7 @@ inline wide_int_storage::wide_int_storag STATIC_ASSERT (!wi::int_traits::host_dependent_precision); wide_int_ref xi (x); precision = xi.precision; - unsigned int l = xi.len; - for (unsigned int i = 0; i < l; ++i) -val[i] = xi.val[i]; - set_len (l); + wi::copy (*this, xi); } inline unsigned int @@ -969,6 +969,21 @@ wi::int_traits ::get_b return wide_int::create (wi::get_precision (x)); } +/* Copy the contents of Y to X, but keeping X's current precision. */ +template +void +wi::copy (T1 &x, const T2 &y) +{ + HOST_WIDE_INT *xval = x.write_val (); + const HOST_WIDE_INT *yval = y.get_val (); + unsigned int len = y.get_len (); + unsigned int i = 0; + do +xval[i] = yval[i]; + while (++i < len); + x.set_len (len); +} + /* An N-bit integer. Until we can use typedef templates, use this instead. */ #define FIXED_WIDE_INT(N) \ generic_wide_int < fixed_wide_int_storage > @@ -1012,10 +1027,7 @@ inline fixed_wide_int_storage ::fixed /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; - wide_int_ref xi (x, N); - len = xi.len; - for (unsigned int i = 0; i < len; ++i) -val[i] = xi.val[i]; + wi::copy (*this, wide_int_ref (x, N)); } template @@ -1716,16 +1728,7 @@ wi::neg (const T &x, bool *overflow) inline WI_UNARY_RESULT (T) wi::abs (const T &x) { - if (neg_p (x)) -return neg (x); - - WI_UNARY_RESULT_VAR (result, val, T, x); - wide_int_ref xi (x, get_precision (result)); - for (unsigned int i = 0; i < xi.len; ++i) -val[i] = xi.val[i]; - result.set_len (xi.len); - - return result; + return neg_p (x) ? neg (x) : x; } /* Return the result of sign-extending the low OFFSET bits of X. */ @@ -1760,9 +1763,7 @@ wi::zext (const T &x, unsigned int offse maintain canonization. 
*/ if (offset >= precision) { - for (unsigned int i = 0; i < xi.len; ++i) - val[i] = xi.val[i]; - result.set_len (xi.len); + wi::copy (result, xi); return result; } @@ -1809,22 +1810,12 @@ wi::set_bit (const T &x, unsigned int bi inline WI_BINARY_RESULT (T1, T2) wi::min (const T1 &x, const T2 &y, signop sgn) { - WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y); + WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y); unsigned int precision = get_precision (result); if (wi::le_p (x, y, sgn)) -{ - wide_int_ref xi (x, precision); - for (unsigned int i = 0; i < xi.len; ++i) - val[i] = xi.val[i]; - result.set_len (xi.len); -} +wi::copy (result, wide_int_ref (x, precision)); else -{ - wide_int_ref yi (y, precision); - for (unsigned int i = 0; i < yi.len; ++i) - val[i] = yi.val[i]; - result.set_len (yi.len); -} +wi::copy (result, wide_int_ref (y, precision)); return result; } @@ -1850,22 +1841,12 @@ wi::umin (const T1 &x, const T2 &y) inline WI_BINARY_RESULT (T1, T2) wi::max (const T1 &x, const T2 &y, signop sgn) { - WI_BINARY_RESULT_VAR (result, val, T1, x, T2, y); + WI_BINARY_RESULT_VAR (result, val ATTRIBUTE_UNUSED, T1, x, T2, y); unsigned int precision = get_precision (result); if (wi::ge_p (x, y, sgn)) -{ - wide_int_ref xi (x, precision); - for (unsigned int i = 0; i < xi.len; ++i) - val[i] = xi.val[i]; - result.set_len (xi.len); -} +wi::copy (result, wide_int_ref (x, precision)); else -{ - wide_int_ref yi (y, precision); - for (unsigned int i = 0; i < yi.len; ++i) - val[i] = yi.val[i]; - result.set_len (yi.len); -} +wi::copy (result, wide_int_ref (y, precision)); return result; }
[wide-int] Reduce the size of the scratch arrays
If yesterday's patch goes in, we'll only need the scratch array for HWI inputs. We can therefore shrink it to 2 HWIs. Tested on x86_64-linux-gnu. OK for wide-int? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-10-19 20:08:15.571987579 +0100 +++ gcc/wide-int.h 2013-10-19 20:09:12.299436972 +0100 @@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi: private: /* Scratch space that can be used when decomposing the original integer. It must live as long as this object. */ - HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS]; + HOST_WIDE_INT scratch[2]; public: template
[wide-int] Turn lts_p & co. back into template functions
This patch just converts some functions back to template functions, for the benefit of the upcoming ::is_sign_extended patch. There's no behavioural change. Tested on x86_64-linux-gnu. OK for wide-int? Thanks, Richard Index: gcc/gcc/wide-int.h === --- gcc.orig/gcc/wide-int.h +++ gcc/gcc/wide-int.h @@ -347,42 +347,48 @@ namespace wi template unsigned int get_binary_precision (const T1 &, const T2 &); - bool fits_shwi_p (const wide_int_ref &); - bool fits_uhwi_p (const wide_int_ref &); - bool neg_p (const wide_int_ref &, signop = SIGNED); - bool only_sign_bit_p (const wide_int_ref &, unsigned int); - bool only_sign_bit_p (const wide_int_ref &); - HOST_WIDE_INT sign_mask (const wide_int_ref &); - - template - bool eq_p (const T1 &, const T2 &); - - template - bool ne_p (const T1 &, const T2 &); - - bool lt_p (const wide_int_ref &, const wide_int_ref &, signop); - bool lts_p (const wide_int_ref &, const wide_int_ref &); - bool ltu_p (const wide_int_ref &, const wide_int_ref &); - bool le_p (const wide_int_ref &, const wide_int_ref &, signop); - bool les_p (const wide_int_ref &, const wide_int_ref &); - bool leu_p (const wide_int_ref &, const wide_int_ref &); - bool gt_p (const wide_int_ref &, const wide_int_ref &, signop); - bool gts_p (const wide_int_ref &, const wide_int_ref &); - bool gtu_p (const wide_int_ref &, const wide_int_ref &); - bool ge_p (const wide_int_ref &, const wide_int_ref &, signop); - bool ges_p (const wide_int_ref &, const wide_int_ref &); - bool geu_p (const wide_int_ref &, const wide_int_ref &); - int cmp (const wide_int_ref &, const wide_int_ref &, signop); - int cmps (const wide_int_ref &, const wide_int_ref &); - int cmpu (const wide_int_ref &, const wide_int_ref &); - +#define UNARY_PREDICATE \ + template bool #define UNARY_FUNCTION \ template WI_UNARY_RESULT (T) +#define BINARY_PREDICATE \ + template bool #define BINARY_FUNCTION \ template WI_BINARY_RESULT (T1, T2) #define SHIFT_FUNCTION \ template WI_UNARY_RESULT (T) + UNARY_PREDICATE fits_shwi_p (const T &); + UNARY_PREDICATE fits_uhwi_p (const T &); + UNARY_PREDICATE neg_p (const T &, signop = SIGNED); + + template + HOST_WIDE_INT sign_mask (const T &); + + BINARY_PREDICATE eq_p (const T1 &, const T2 &); + BINARY_PREDICATE ne_p (const T1 &, const T2 &); + BINARY_PREDICATE lt_p (const T1 &, const T2 &, signop); + BINARY_PREDICATE lts_p (const T1 &, const T2 &); + BINARY_PREDICATE ltu_p (const T1 &, const T2 &); + BINARY_PREDICATE le_p (const T1 &, const T2 &, signop); + BINARY_PREDICATE les_p (const T1 &, const T2 &); + BINARY_PREDICATE leu_p (const T1 &, const T2 &); + BINARY_PREDICATE gt_p (const T1 &, const T2 &, signop); + BINARY_PREDICATE gts_p (const T1 &, const T2 &); + BINARY_PREDICATE gtu_p (const T1 &, const T2 &); + BINARY_PREDICATE ge_p (const T1 &, const T2 &, signop); + BINARY_PREDICATE ges_p (const T1 &, const T2 &); + BINARY_PREDICATE geu_p (const T1 &, const T2 &); + + template + int cmp (const T1 &, const T2 &, signop); + + template + int cmps (const T1 &, const T2 &); + + template + int cmpu (const T1 &, const T2 &); + UNARY_FUNCTION bit_not (const T &); UNARY_FUNCTION neg (const T &); UNARY_FUNCTION neg (const T &, bool *); @@ -446,9 +452,13 @@ namespace wi SHIFT_FUNCTION rrotate (const T &, const wide_int_ref &, unsigned int = 0); #undef SHIFT_FUNCTION +#undef BINARY_PREDICATE #undef BINARY_FUNCTION +#undef UNARY_PREDICATE #undef UNARY_FUNCTION + bool only_sign_bit_p (const wide_int_ref &, unsigned int); + bool only_sign_bit_p (const wide_int_ref &); int clz (const wide_int_ref &); int clrsb 
(const wide_int_ref &); int ctz (const wide_int_ref &); @@ -1401,40 +1411,48 @@ wi::get_binary_precision (const T1 &x, c /* Return true if X fits in a HOST_WIDE_INT with no loss of precision. */ +template inline bool -wi::fits_shwi_p (const wide_int_ref &x) +wi::fits_shwi_p (const T &x) { - return x.len == 1; + wide_int_ref xi (x); + return xi.len == 1; } /* Return true if X fits in an unsigned HOST_WIDE_INT with no loss of precision. */ +template inline bool -wi::fits_uhwi_p (const wide_int_ref &x) +wi::fits_uhwi_p (const T &x) { - if (x.precision <= HOST_BITS_PER_WIDE_INT) + wide_int_ref xi (x); + if (xi.precision <= HOST_BITS_PER_WIDE_INT) return true; - if (x.len == 1) -return x.slow () >= 0; - return x.len == 2 && x.uhigh () == 0; + if (xi.len == 1) +return xi.slow () >= 0; + return xi.len == 2 && xi.uhigh () == 0; } /* Return true if X is negative based on the interpretation of SGN. For UNSIGNED, this is always false. */ +template inline bool -wi::neg_p (const wide_int_ref &x, signop sgn) +wi::neg_p (const T &x, signop sgn) { + wide_int_ref xi (x); if (sgn == UNSIGNED) return false; - return x.sign_mask () < 0; + return xi.sign_mask () < 0; } /* Return -1 if the
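As an aside, the only representation fact the new fits_uhwi_p relies on is that an unsigned HOST_WIDE_INT with its top bit set needs a second, zero, block in the sign-extended encoding. A tiny standalone sketch of that (not part of the patch; hwi/uhwi are just stand-ins for the HOST_WIDE_INT types and are assumed to be 64 bits):

#include <stdio.h>

typedef long long hwi;			/* stand-in for HOST_WIDE_INT */
typedef unsigned long long uhwi;	/* stand-in for unsigned HOST_WIDE_INT */

int
main (void)
{
  uhwi x = 0x8000000000000000ULL;	/* top bit set */
  hwi blocks[2];
  int len;

  /* Stored as a signed sequence of HWIs, the low block alone would read
     back as negative, so a zero high block records that the value is
     really positive -- hence the len == 2 with uhigh () == 0 case that
     fits_uhwi_p accepts.  */
  blocks[0] = (hwi) x;
  if (blocks[0] < 0)
    {
      blocks[1] = 0;
      len = 2;
    }
  else
    len = 1;

  printf ("len == %d\n", len);		/* prints "len == 2" */
  return 0;
}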
Always treat sext_hwi and zext_hwi as inline
At the moment there are two copies of sext_hwi and zext_hwi, one inline for !ENABLE_CHECKING and one out-of-line for ENABLE_CHECKING. However, there are several wide-int callers where it's obvious from context that the precision is <= HOST_BITS_PER_WIDE_INT, so if the functions are inline for ENABLE_CHECKING, the assert can often be compiled away. This improves the ENABLE_CHECKING code for some query functions on wide-int branch. Tested on x86_64-linux-gnu. OK for mainline? Thanks, Richard gcc/ * system.h: Move hwint.h include further down. * hwint.h (sext_hwi, zext_hwi): Define unconditionally. Add gcc_checking_asserts. * hwint.c (sext_hwi, zext_hwi): Delete ENABLE_CHECKING versions. Index: gcc/hwint.c === --- gcc/hwint.c 2013-08-21 19:28:49.560621645 +0100 +++ gcc/hwint.c 2013-10-19 20:05:43.399978400 +0100 @@ -204,35 +204,3 @@ least_common_multiple (HOST_WIDE_INT a, { return mul_hwi (abs_hwi (a) / gcd (a, b), abs_hwi (b)); } - -#ifdef ENABLE_CHECKING -/* Sign extend SRC starting from PREC. */ - -HOST_WIDE_INT -sext_hwi (HOST_WIDE_INT src, unsigned int prec) -{ - gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT); - - if (prec == HOST_BITS_PER_WIDE_INT) -return src; - else -{ - int shift = HOST_BITS_PER_WIDE_INT - prec; - return (src << shift) >> shift; -} -} - -/* Zero extend SRC starting from PREC. */ - -unsigned HOST_WIDE_INT -zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec) -{ - gcc_checking_assert (prec <= HOST_BITS_PER_WIDE_INT); - - if (prec == HOST_BITS_PER_WIDE_INT) -return src; - else -return src & (((HOST_WIDE_INT)1 << prec) - 1); -} - -#endif Index: gcc/hwint.h === --- gcc/hwint.h 2013-09-05 20:55:31.192518091 +0100 +++ gcc/hwint.h 2013-10-19 20:05:23.469855942 +0100 @@ -322,9 +322,6 @@ extern HOST_WIDE_INT least_common_multip /* Sign extend SRC starting from PREC. */ -#ifdef ENABLE_CHECKING -extern HOST_WIDE_INT sext_hwi (HOST_WIDE_INT, unsigned int); -#else static inline HOST_WIDE_INT sext_hwi (HOST_WIDE_INT src, unsigned int prec) { @@ -332,24 +329,23 @@ sext_hwi (HOST_WIDE_INT src, unsigned in return src; else { + gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT); int shift = HOST_BITS_PER_WIDE_INT - prec; return (src << shift) >> shift; } } -#endif /* Zero extend SRC starting from PREC. */ -#ifdef ENABLE_CHECKING -extern unsigned HOST_WIDE_INT zext_hwi (unsigned HOST_WIDE_INT, unsigned int); -#else static inline unsigned HOST_WIDE_INT zext_hwi (unsigned HOST_WIDE_INT src, unsigned int prec) { if (prec == HOST_BITS_PER_WIDE_INT) return src; else -return src & (((HOST_WIDE_INT)1 << prec) - 1); +{ + gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT); + return src & (((HOST_WIDE_INT) 1 << prec) - 1); +} } -#endif #endif /* ! GCC_HWINT_H */ Index: gcc/system.h === --- gcc/system.h2013-09-07 10:19:22.797432289 +0100 +++ gcc/system.h2013-10-19 20:06:14.417170260 +0100 @@ -272,9 +272,6 @@ #define ICE_EXIT_CODE 4 # include #endif -/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT. */ -#include "hwint.h" - /* A macro to determine whether a VALUE lies inclusively within a certain range without evaluating the VALUE more than once. This macro won't warn if the VALUE is unsigned and the LOWER bound is @@ -1066,4 +1063,7 @@ #define DEBUG_FUNCTION #define DEBUG_VARIABLE #endif +/* Get definitions of HOST_WIDE_INT and HOST_WIDEST_INT. */ +#include "hwint.h" + #endif /* ! GCC_SYSTEM_H */
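The asserted behaviour is easy to check in isolation. A standalone copy of the two helpers, assuming a 64-bit HOST_WIDE_INT (just a sketch, not the hwint.h code itself):

#include <assert.h>

#define BITS 64			/* stand-in for HOST_BITS_PER_WIDE_INT */
typedef long long hwi;
typedef unsigned long long uhwi;

/* Sign extend SRC starting from PREC, as in hwint.h.  */
static inline hwi
sext (hwi src, unsigned int prec)
{
  if (prec == BITS)
    return src;
  int shift = BITS - prec;
  return (src << shift) >> shift;
}

/* Zero extend SRC starting from PREC, as in hwint.h.  */
static inline uhwi
zext (uhwi src, unsigned int prec)
{
  if (prec == BITS)
    return src;
  return src & (((hwi) 1 << prec) - 1);
}

int
main (void)
{
  assert (sext (0xff, 8) == -1);	/* 0xff read as 8-bit signed is -1 */
  assert (sext (0x7f, 8) == 0x7f);	/* sign bit clear: unchanged */
  assert (zext ((uhwi) -1, 8) == 0xff);	/* keep only the low 8 bits */
  return 0;
}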
Re: [Patch] Change default executor to DFS in regex
On Sun, Oct 20, 2013 at 4:42 AM, Paolo Carlini wrote: > Ok with those changes. Committed. Thanks! -- Tim Shen a.patch Description: Binary data
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
On Thu, Oct 17, 2013 at 4:16 PM, Kirill Yukhin wrote: >> >> > I suspect gen_lowpart is a bad turn once reload is completed, since >> >> > it can create a new pseudo. gen_lowpart () may call >> >> > gen_reg_rtx (), which contains the corresponding gcc_assert (). >> >> >> >> False. gen_lowpart is perfectly safe post-reload. >> >> Indeed, taking the subreg of a hard register should arrive at >> >> >> >> x = gen_rtx_REG_offset (op, outermode, final_regno, >> >> final_offset); >> >> >> >> in simplify_subreg. >> >> >> >> Have you encountered some specific problem with gen_lowpart? >> > Maybe the code in the pattern is buggy? Or is it gen_lowpart? >> >> I think that the original approach with gen_rtx_REG is correct and follows >> established practice in sse.md (please grep for gen_rtx_REG in >> sse.md). If this approach is necessary due to a deficiency in >> gen_lowpart, then the fix to gen_lowpart should be proposed in a >> follow-up patch. > So, I've reverted the changes in the mult_vect patterns and > added "%" to the constraints. Please also add back expanders with operand fixups and insn constraints, as is the case with other commutative operators. They are needed to hoist operand loads out of loops (reload and later passes won't hoist memory loads out of loops when fixing up operands). The patch is OK with this change, but please wait for rth's final approval. Thanks, Uros.
Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov
> That's a good point. I added a check for this case - so if CONST_INT is > passed > we assume that mode is QI. But usually promoted value resides in a register, > so > we don't go over-conservative here. Hmm, so if we use broadcast constant, then we won't end up having CONST_INT here? It is OK then. > > > > + if (size_to_move < GET_MODE_SIZE (move_mode)) > > > +{ > > > + move_mode = mode_for_size (size_to_move * BITS_PER_UNIT, MODE_INT, > > > 0); > > > + promoted_val = gen_lowpart (move_mode, promoted_val); > > > +} > > > + piece_size = GET_MODE_SIZE (move_mode); > > > + code = optab_handler (mov_optab, move_mode); > > > + gcc_assert (code != CODE_FOR_nothing && promoted_val != NULL_RTX); > > > + > > > + dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0); > > > + > > > + /* Emit moves. We'll need SIZE_TO_MOVE/PIECE_SIZES moves. */ > > > + gcc_assert (size_to_move % piece_size == 0); > > > + adjust = GEN_INT (piece_size); > > > + for (i = 0; i < size_to_move; i += piece_size) > > > +{ > > > + if (piece_size <= GET_MODE_SIZE (word_mode)) > > > + { > > > + emit_insn (gen_strset (destptr, dst, promoted_val)); > > > + continue; > > > + } > > Don't we want to go for strmov here, too? > I don't quite get it. How could strmov help us here? strset/strmov expanders results in stos/movs instructions on some targets. The code is supposed to use them when profitable - i.e. for constant sized prologues and known offset, you can set up registers for rep; movsq and then use use movb/movsw/movsl to get the required alignment (or to copy the tail). It seems to me that whenever we can use strset we can also use strmov, so the code should be symmetric. It is currently enabled only for P4, but perhaps it makes sense elsewhere, too (and to expand compact move by pieces/store by pieces). I do not know. Those single stringop insns are always fishy. > > How do we ensure that value is sufficiently promoted? (for example for > > pentiumpro that requires bigger alignment than promotion)? > > Also don't we need to handle vec_value for AVX? > I don't see a problem here actually. I tried to keep logic for promoted_val > unchanged, and just add VEC_PROMOTED_VAL, which doesn't use alignment info at > all. VEC_PROMOTED_VAL is promoted to move_mode, which is the biggest mode we > could use for storing, so we always have enough-promoted value. OK. > > > @@ -23015,7 +23074,8 @@ ix86_expand_movmem (rtx dst, rtx src, rtx > > > count_exp, rtx align_exp, > > > align = INTVAL (expected_align_exp); > > >/* ALIGN is the minimum of destination and source alignment, but we > > > care here > > > just about destination alignment. */ > > > - else if (MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * > > > BITS_PER_UNIT) > > > + else if (!issetmem > > > +&& MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * BITS_PER_UNIT) > > > align = MEM_ALIGN (dst) / BITS_PER_UNIT; > > > > Hmm, this really should be pushed up into expansion and the alignment > > lookup should be improved... > Well, for now I just to keep some things, like this, unchanged. We could > continue cleaning this all up. Yep, lets handle other things incrementally. > > > > + /* For now vector-version of memset is generated only for memory > > > zeroing, as > > > + creating of promoted vector value is very cheap in this case. */ > > > + if (issetmem && alg == vector_loop && val_exp != const0_rtx) > > > +alg = unrolled_loop; > > > > I believe we had vector version of promote_reg once. Load of constant is > > easy, too. 
> > Broadcasting a value is a bit different story. > Most of the cases where we need vector memset are zeroing memory, which is why > I > focused on this case. But you are right, promoting a constant value would be > simple as well. > > > Thanks, the patch is OK with those changes. It really seems like an > > improvement in maintainability. Thanks for working on it! > Could you look at this version of the patch? Did I forget anything? > > The patch was bootstrapped, make-checked and tested on SPEC 2k/2k6 (stability > only). The same testing was performed with -minline-all-stringops. The patch is OK. Honza
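For reference, the source-level shapes this distinction is about (only a sketch; the code actually generated of course depends on the selected -march and stringop strategy):

#include <string.h>

void
clear_buffer (char *buf, size_t n)
{
  /* Zero value: the vector_loop strategy is allowed, since a vector of
     zeros is trivially cheap to materialize.  */
  memset (buf, 0, n);
}

void
fill_buffer (char *buf, size_t n)
{
  /* Non-zero value: for now the algorithm is demoted to unrolled_loop,
     because broadcasting the promoted value into a vector register is
     not implemented yet (the val_exp != const0_rtx check above).  */
  memset (buf, 0xab, n);
}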
Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov
Hi Jan, Thanks for the review, please find my answers below. > > +/* Output "rep; mov" or "rep; stos" instruction depending on ISSETMEM > > argument. > > + When ISSETMEM is true, arguments SRCMEM and SRCPTR are ignored. > > + When ISSETMEM is false, arguments VALUE and ORIG_VALUE are ignored. > > + Other arguments have same meaning as for previous function. */ > > You need to document ORIG_VALUE Fixed. > > +expand_movmem_or_setmem_via_rep (rtx destmem, rtx srcmem, > > lets go with the same name as in first function, expand_set_or_movmem_via_rep. > Either I would add issetmem into expand_set_or_movmem_via_loop > or I would go for the same scheme - if srcptr/srcmem is NULL, use value. Done. > > @@ -22233,82 +22236,65 @@ expand_movmem_via_rep_mov (rtx destmem, rtx > > srcmem, > >HOST_WIDE_INT rounded_count; > > > >/* If the size is known, it is shorter to use rep movs. */ > > - if (mode == QImode && CONST_INT_P (count) > > + if (!issetmem && mode == QImode && CONST_INT_P (count) > > Instead of !issetmem allow this also for value being 0. Add comment that for > memset we would need to promote value and also that the logic should be moved > to decide_alg (I believe I have it there in my proposed patch). Done. > > +/* This function emits moves to fill SIZE_TO_MOVE bytes starting from > > DESTMEM > > + with value PROMOTED_VAL. > > + SRC is passed by pointer to be updated on return. > > + Return value is updated DST. */ > > So it is same as store_by_pieces only taking promoted_val instead of doing > promotion itself? Yes, that's very similar to store_by_pieces (as emit_memmov is very similar to move_by_pieces). Later we could think of use of this functions in store/move_by_pieces, or vice versa. I added it now because at the moment we are not able to emit vector moves in move/store_by_pieces. And yes, as you notices, it also takes promoted_val, as it's intended to be used along with other emit_memmov/memset calls (as opposed to move_by_pieces which is assumed to perform the entire copying). > > +static rtx > > +emit_memset (rtx destmem, rtx destptr, rtx promoted_val, > > +HOST_WIDE_INT size_to_move) > > +{ > > + rtx dst = destmem, adjust; > > + enum insn_code code; > > + enum machine_mode move_mode; > > + int piece_size, i; > > + > > + /* Find the widest mode in which we could perform moves. > > + Start with the biggest power of 2 less than SIZE_TO_MOVE and half > > + it until move of such size is supported. */ > > + move_mode = GET_MODE (promoted_val); > > Won't this be VOIDmode for CONST_INT? That's a good point. I added a check for this case - so if CONST_INT is passed we assume that mode is QI. But usually promoted value resides in a register, so we don't go over-conservative here. > > + if (size_to_move < GET_MODE_SIZE (move_mode)) > > +{ > > + move_mode = mode_for_size (size_to_move * BITS_PER_UNIT, MODE_INT, > > 0); > > + promoted_val = gen_lowpart (move_mode, promoted_val); > > +} > > + piece_size = GET_MODE_SIZE (move_mode); > > + code = optab_handler (mov_optab, move_mode); > > + gcc_assert (code != CODE_FOR_nothing && promoted_val != NULL_RTX); > > + > > + dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0); > > + > > + /* Emit moves. We'll need SIZE_TO_MOVE/PIECE_SIZES moves. 
*/ > > + gcc_assert (size_to_move % piece_size == 0); > > + adjust = GEN_INT (piece_size); > > + for (i = 0; i < size_to_move; i += piece_size) > > +{ > > + if (piece_size <= GET_MODE_SIZE (word_mode)) > > + { > > + emit_insn (gen_strset (destptr, dst, promoted_val)); > > + continue; > > + } > Don't we want to go for strmov here, too? I don't quite get it. How could strmov help us here? > > +expand_movmem_or_setmem_prologue (rtx destmem, rtx srcmem, > > + rtx destptr, rtx srcptr, rtx value, > > + rtx vec_value, rtx count, int align, > > + int desired_alignment, bool issetmem) > > Again, I would go with consistent names. Fixed. > How do we ensure that value is sufficiently promoted? (for example for > pentiumpro that requires bigger alignment than promotion)? > Also don't we need to handle vec_value for AVX? I don't see a problem here actually. I tried to keep logic for promoted_val unchanged, and just add VEC_PROMOTED_VAL, which doesn't use alignment info at all. VEC_PROMOTED_VAL is promoted to move_mode, which is the biggest mode we could use for storing, so we always have enough-promoted value. > > +expand_constant_movmem_or_setmem_prologue (rtx dst, rtx *srcp, rtx destreg, > > Again I would make names consistent Done. > > +ix86_expand_movmem_or_setmem (rtx dst, rtx src, rtx count_exp, rtx val_exp, > > Name ;) Fixed:) > > @@ -23015,7 +23074,8 @@ ix86_expand_movmem (rtx dst, rtx src, rtx > > count_exp, rtx align_exp, > > align = INTVAL (expected_align_exp); > >/* ALIGN is the minimum of destin
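To make the mode-selection comment concrete, a throwaway sketch of the idea (move_supported_p below is only a stand-in for asking optab_handler (mov_optab, mode) whether a move in the candidate mode exists):

#include <stdio.h>

/* Pretend the widest supported move is 16 bytes (one SSE register).  */
static int
move_supported_p (int bytes)
{
  return bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8 || bytes == 16;
}

/* Pick the piece size for moving/setting SIZE_TO_MOVE bytes: start from
   the widest candidate and halve it until a move of that size is
   supported and does not overshoot the bytes that are left.  */
static int
choose_piece_size (int size_to_move, int widest)
{
  int piece = widest;
  while (piece > size_to_move || !move_supported_p (piece))
    piece /= 2;
  return piece;
}

int
main (void)
{
  /* 24 bytes with a 32-byte candidate: one 16-byte piece first,
     leaving an 8-byte tail for the next round.  */
  printf ("%d\n", choose_piece_size (24, 32));
  return 0;
}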
Re: libstdc++ Testsuite extension for sort, partial_sort, partial_sort_copy, nth_element
On 10/19/2013 11:10 PM, Christopher Jefferson wrote: I plan to write my tests if these are accepted (although writing this reminded me why gcc is the worst open source project for submitting code to that I have ever worked with). Why? If you really think something is wrong and have constructive proposals, please take the discussion outside this mailing list, to g...@gcc.gnu.org, I suppose. Paolo.
[Patch, fortran] PR58793 - Wrong value for _vtab for intrinsic types with CLASS(*): storage_size of class(*) gives wrong result
Dear All, This patch is fairly obvious and follows a suggestion from Tobias to use gfc_element_size. He even wrote the testcase! Bootstrapped and regtested on FC17/x86_64 - OK for trunk? Cheers Paul PS In writing this, I have just noted that I need to trap ts->type == BT_HOLLERITH. It would be a rather bizarre thing to try to do, but I just know that somebody, somewhere will eventually try it :-) Consider that it will be done in the committed version. 2013-10-20 Paul Thomas PR fortran 57893 * class.c: Include target-memory.h. (gfc_find_intrinsic_vtab): Build a minimal expression so that gfc_element_size can be used to obtain the storage size, rather than the kind value. 2013-10-20 Paul Thomas PR fortran 57893 * gfortran.dg/unlimited_polymorphic_12.f90: New test. pr58793.diff Description: Binary data
Re: nth_element fix [58800]
Hi, On 10/19/2013 11:02 PM, Christopher Jefferson wrote: A previous fix to a performance problem in std::sort unfortunately has led to a crashing bug in std::nth_element. The following minimal patch fixes the problem (there are a couple of different ways of fixing it; this is the shortest and safest). 2013-09-19 Chris Jefferson PR libstdc++/58800 * include/bits/stl_algo.h (__unguarded_partition_pivot): Change __last - 2 to __last - 1, so we correctly work on arrays of length 4. * testsuite/25_algorithms/nth_element/58800.cc: New. I already have this applied in my local svn branches and can apply it later today (I fixed the Copyright years to only 2013, of course, and reformatted the rest of the testcase a bit). Thanks! Paolo.
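For completeness, the call shape the fix is about, on a range of length 4 (just an illustration of the guarantee; the actual regression input lives in the new 58800.cc test):

#include <algorithm>
#include <vector>
#include <cassert>

int main()
{
  std::vector<int> v = { 4, 2, 3, 1 };

  // Partition around index 2: afterwards v[2] must hold the value that
  // would be there if the whole range were sorted, i.e. 3.
  std::nth_element(v.begin(), v.begin() + 2, v.end());
  assert(v[2] == 3);
  return 0;
}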
Re: [Patch] Change default executor to DFS in regex
On 10/19/2013 11:28 PM, Tim Shen wrote: I see. Here's the macro version. Thanks! Mostly Ok, but the macro is completely undocumented. Please change the comment which says: "... more than certain number of quantifiers..." to actually mention the macro, thus, say: "... more than _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT quantifiers...". I would also add, before the #ifndef, a comment, say: "// See below about its meaning. The default value of 1 appears to be a good compromise." Ok with those changes. Thanks, Paolo.
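Just to illustrate the knob (strictly a sketch, and the value 4 is arbitrary; the precise effect depends on the final form of the patch): a user who wants a different threshold would define the macro before including <regex>:

// Hypothetical override of the default limit.
#define _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT 4

#include <regex>
#include <iostream>

int main()
{
  // Only two quantifiers here, i.e. below the limit chosen above.
  std::regex re("(a+)(b*)c");
  std::cout << std::regex_match("aaabbc", re) << '\n';
  return 0;
}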
Re: [wide-int] int_traits
Kenneth Zadeck writes: > On 10/19/2013 10:18 AM, Richard Sandiford wrote: >> Kenneth Zadeck writes: >>> On 10/19/2013 05:01 AM, Richard Sandiford wrote: Mike Stump writes: > + // We optimize x < y, where y is 64 or fewer bits. > + // We have to be careful to not allow comparison to a large positive > + // unsigned value like 0x8000, those would be encoded > + // with a y.len == 2. > + if (y.precision <= HOST_BITS_PER_WIDE_INT > + && y.len == 1) I don't get this. If y.precision <= HOST_BITS_PER_WIDE_INT then y.len must be 1. I realise that tree constants can be stored with TREE_INT_CST_NUNITS > TYPE_PRECISION / HOST_BITS_PER_WIDE_INT (so that extensions beyond TYPE_PRECISION are free). But the wide-int code is shielded from that by the ::decompose routine. A wide int never has len > precision / HOST_BITS_PER_WIDE_INT. Thanks, Richard >>> I think that part of this is that neither mike or i really understand >>> how this stuff works anymore. >>> >>> in the old version where we had precision 0, we would wait to >>> canonicalize a constant or a simple integer until we saw what the >>> precision of the other operand was. That was what precison 0 meant. >>> it was kind of like what you are now proposing with this new trait, but >>> for the reason that we actually did not know what to do than some >>> concern about escapement. >>> >>> What i do not understand is what happens what do you get when you bring >>> in an integer variable that is an unsigned HOST_WIDE_INT with the top >>> bit set. In the precision 0 days, if the prec of the other side was 64 >>> or less, the incoming integer took 1 hwi and if the precision was >>> larger, it took two hwis. The canonicalization happened late enough so >>> that there was never a question. >> Ah, I think I know what you mean. The original implementation was: >> >>template >>static inline const HOST_WIDE_INT* >>to_shwi1 (HOST_WIDE_INT *s, unsigned int *l, unsigned int *p, const T& x) >>{ >> s[0] = x; >> if (signedp(x) >> || sizeof (T) < sizeof (HOST_WIDE_INT) >> || ! top_bit_set (x)) >>{ >> *l = 1; >>} >> else >>{ >> s[1] = 0; >> *l = 2; >>} >> *p = 0; >> return s; >>} >> >>void >>wide_int_ro::check_precision (unsigned int *p1, unsigned int *p2, >> bool check_equal ATTRIBUTE_UNUSED, >> bool check_zero ATTRIBUTE_UNUSED) >>{ >> gcc_checking_assert ((!check_zero) || *p1 != 0 || *p2 != 0); >> >> if (*p1 == 0) >>*p1 = *p2; >> >> if (*p2 == 0) >>*p2 = *p1; >> >> gcc_checking_assert ((!check_equal) || *p1 == *p2); >>} >> >>/* Return true if C1 < C2 using signed comparisons. */ >>template >>static inline bool >>lts_p (const T1 &c1, const T2 &c2) >>{ >> bool result; >> HOST_WIDE_INT ws1[WIDE_INT_MAX_ELTS]; >> HOST_WIDE_INT ws2[WIDE_INT_MAX_ELTS]; >> const HOST_WIDE_INT *s1, *s2; /* Returned data */ >> unsigned int cl1, cl2; /* array lengths */ >> unsigned int p1, p2; /* precisions */ >> >> s1 = to_shwi1 (ws1, &cl1, &p1, c1); >> s2 = to_shwi1 (ws2, &cl2, &p2, c2); >> check_precision (&p1, &p2, false, true); >> >> if (p1 <= HOST_BITS_PER_WIDE_INT >> && p2 <= HOST_BITS_PER_WIDE_INT) >>{ >> HOST_WIDE_INT x0 = sext_hwi (s1[0], p1); >> HOST_WIDE_INT x1 = sext_hwi (s2[0], p2); >> result = x0 < x1; >>} >> else >>result = lts_p_large (s1, cl1, p1, s2, cl2, p2); >> >> #ifdef DEBUG_WIDE_INT >> debug_vaa ("wide_int_ro:: %d = (%s lts_p %s\n", result, s1, cl1, p1, >> s2, cl2, p2); >> #endif >> return result; >>} > you need to be careful about asserting too much from the old code. 
the > time line was: > > 1) we developed the stuff on x86-64 > 2) you did your patch > 3) we ported everything to ppc and our private port. > > i really only became very sensitive to this issue during step 3 because > the x86 does not exhibit these bugs. > > >> So if you had a 128-bit wide_int and T == unsigned HOST_WIDE_INT, >> this lts_p would zero-extend the unsigned HOST_WIDE_INT to 128 bits and >> then do a signed comparison. >> >> The thing here is that the "check_equal" argument is false. >> So if instead you were comparing a 128-bit wide_int with a 64-bit tree >> constant, lts_p would treat that tree constant as a signed 64-bit number, >> even if it was TYPE_UNSIGNED. Similarly if you were comparing a 128-bit >> tree constant and a 64-bit tree constant. You also allowed a comparison >> of a 128-bit wide_int with a 64-bit rtx, again treating the 64-bit rtx >> as signed. > I do not think that this is what check_equal meant because the 0 > precision was a wild card. The 0 precision
[MIPS] Fix mips-ps-[57].c tests after cost model changes
I could just have added an extra command-line option, but I went for this instead. Tested on mips64-linux-gnu and applied. Richard gcc/testsuite/ * gcc.target/mips/mips-ps-5.c: Add alignment attributes. * gcc.target/mips/mips-ps-7.c: Likewise. Index: gcc/testsuite/gcc.target/mips/mips-ps-5.c === --- gcc/testsuite/gcc.target/mips/mips-ps-5.c 2012-08-27 17:31:22.0 +0100 +++ gcc/testsuite/gcc.target/mips/mips-ps-5.c 2013-10-19 18:12:17.462075229 +0100 @@ -2,7 +2,9 @@ /* { dg-options "-mpaired-single -mgp64 -ftree-vectorize" } */ /* { dg-skip-if "requires vectorization" { *-*-* } { "-O0" "-Os" } { "" } } */ -extern float a[], b[], c[]; +extern float a[] __attribute__ ((aligned (8))); +extern float b[] __attribute__ ((aligned (8))); +extern float c[] __attribute__ ((aligned (8))); NOMIPS16 void foo (void) Index: gcc/testsuite/gcc.target/mips/mips-ps-7.c === --- gcc/testsuite/gcc.target/mips/mips-ps-7.c 2012-08-27 17:31:22.0 +0100 +++ gcc/testsuite/gcc.target/mips/mips-ps-7.c 2013-10-19 18:12:31.762186608 +0100 @@ -3,7 +3,9 @@ /* { dg-options "-mgp32 -mpaired-single -ftree-vectorize" } */ /* { dg-skip-if "requires vectorization" { *-*-* } { "-O0" "-Os" } { "" } } */ -extern float a[], b[], c[]; +extern float a[] __attribute__ ((aligned (8))); +extern float b[] __attribute__ ((aligned (8))); +extern float c[] __attribute__ ((aligned (8))); NOMIPS16 void foo (void)
[MIPS] Add bswap patterns
Noticed the other day that we'd never added bswap patterns for WSBH & co. Tested on mips64-linux-gnu with --with-arch=mips64r2 and applied. Richard gcc/ * config/mips/mips.h (ISA_HAS_WSBH): Define. * config/mips/mips.md (UNSPEC_WSBH, UNSPEC_DSBH, UNSPEC_DSHD): New constants. (bswaphi2, bswapsi2, bswapdi2, wsbh, dsbh, dshd): New patterns. gcc/testsuite/ * gcc.target/mips/bswap-1.c, gcc.target/mips/bswap-2.c, gcc.target/mips/bswap-3.c, gcc.target/mips/bswap-4.c, gcc.target/mips/bswap-5.c, gcc.target/mips/bswap-6.c: New tests. Index: gcc/config/mips/mips.h === --- gcc/config/mips/mips.h 2013-10-16 20:38:57.977088830 +0100 +++ gcc/config/mips/mips.h 2013-10-19 11:32:47.636342587 +0100 @@ -972,6 +972,11 @@ #define ISA_HAS_ROR((ISA_MIPS32R2 \ || TARGET_SMARTMIPS) \ && !TARGET_MIPS16) +/* ISA has the WSBH (word swap bytes within halfwords) instruction. + 64-bit targets also provide DSBH and DSHD. */ +#define ISA_HAS_WSBH ((ISA_MIPS32R2 || ISA_MIPS64R2) \ +&& !TARGET_MIPS16) + /* ISA has data prefetch instructions. This controls use of 'pref'. */ #define ISA_HAS_PREFETCH ((ISA_MIPS4 \ || TARGET_LOONGSON_2EF\ Index: gcc/config/mips/mips.md === --- gcc/config/mips/mips.md 2013-10-16 20:38:57.977088830 +0100 +++ gcc/config/mips/mips.md 2013-10-19 11:36:21.984285902 +0100 @@ -74,6 +74,11 @@ (define_c_enum "unspec" [ UNSPEC_STORE_LEFT UNSPEC_STORE_RIGHT + ;; Integer operations that are too cumbersome to describe directly. + UNSPEC_WSBH + UNSPEC_DSBH + UNSPEC_DSHD + ;; Floating-point moves. UNSPEC_LOAD_LOW UNSPEC_LOAD_HIGH @@ -5358,6 +5363,56 @@ (define_insn "rotr3" } [(set_attr "type" "shift") (set_attr "mode" "")]) + +(define_insn "bswaphi2" + [(set (match_operand:HI 0 "register_operand" "=d") + (bswap:HI (match_operand:HI 1 "register_operand" "d")))] + "ISA_HAS_WSBH" + "wsbh\t%0,%1" + [(set_attr "type" "shift")]) + +(define_insn_and_split "bswapsi2" + [(set (match_operand:SI 0 "register_operand" "=d") + (bswap:SI (match_operand:SI 1 "register_operand" "d")))] + "ISA_HAS_WSBH && ISA_HAS_ROR" + "#" + "" + [(set (match_dup 0) (unspec:SI [(match_dup 1)] UNSPEC_WSBH)) + (set (match_dup 0) (rotatert:SI (match_dup 0) (const_int 16)))] + "" + [(set_attr "insn_count" "2")]) + +(define_insn_and_split "bswapdi2" + [(set (match_operand:DI 0 "register_operand" "=d") + (bswap:DI (match_operand:DI 1 "register_operand" "d")))] + "TARGET_64BIT && ISA_HAS_WSBH" + "#" + "" + [(set (match_dup 0) (unspec:DI [(match_dup 1)] UNSPEC_DSBH)) + (set (match_dup 0) (unspec:DI [(match_dup 0)] UNSPEC_DSHD))] + "" + [(set_attr "insn_count" "2")]) + +(define_insn "wsbh" + [(set (match_operand:SI 0 "register_operand" "=d") + (unspec:SI [(match_operand:SI 1 "register_operand" "d")] UNSPEC_WSBH))] + "ISA_HAS_WSBH" + "wsbh\t%0,%1" + [(set_attr "type" "shift")]) + +(define_insn "dsbh" + [(set (match_operand:DI 0 "register_operand" "=d") + (unspec:DI [(match_operand:DI 1 "register_operand" "d")] UNSPEC_DSBH))] + "TARGET_64BIT && ISA_HAS_WSBH" + "dsbh\t%0,%1" + [(set_attr "type" "shift")]) + +(define_insn "dshd" + [(set (match_operand:DI 0 "register_operand" "=d") + (unspec:DI [(match_operand:DI 1 "register_operand" "d")] UNSPEC_DSHD))] + "TARGET_64BIT && ISA_HAS_WSBH" + "dshd\t%0,%1" + [(set_attr "type" "shift")]) ;; ;; Index: gcc/testsuite/gcc.target/mips/bswap-1.c === --- /dev/null 2013-10-13 08:29:45.608935301 +0100 +++ gcc/testsuite/gcc.target/mips/bswap-1.c 2013-10-19 11:30:01.551867196 +0100 @@ -0,0 +1,10 @@ +/* { dg-options "isa_rev>=2" } */ +/* { dg-skip-if "bswap recognition needs expensive optimizations" { *-*-* } { "-O0" "-O1" 
} { "" } } */ + +NOMIPS16 unsigned short +foo (unsigned short x) +{ + return ((x << 8) & 0xff00) | ((x >> 8) & 0xff); +} + +/* { dg-final { scan-assembler "\twsbh\t" } } */ Index: gcc/testsuite/gcc.target/mips/bswap-2.c === --- /dev/null 2013-10-13 08:29:45.608935301 +0100 +++ gcc/testsuite/gcc.target/mips/bswap-2.c 2013-10-19 11:30:01.551867196 +0100 @@ -0,0 +1,9 @@ +/* { dg-options "isa_rev>=2" } */ + +NOMIPS16 unsigned short +foo (unsigned short x) +{ + return __builtin_bswap16 (x); +} + +/* { dg-final { scan-assembler "\twsbh\t" } } */ Index: gcc/testsuite/gcc.target/mips/bswap-3.c === --- /dev/null 2013-10-13 08:29:45.