Re: automatic dependencies
> In this particular case it looked easy to reimplement using $(if).
>
> Could you please try this patch with make 3.80?

It works fine, thanks.

-- 
Eric Botcazou
Getting the ARC port reviewed and accepted
Hi all,

You've probably seen that Joern Rennecke (amylaar) has been pinging
repeatedly for help reviewing the ARC port:

  http://gcc.gnu.org/ml/gcc-patches/2013-09/msg02072.html

Joern is approved as a maintainer, and the tests have been reviewed and
approved (thanks to Mike Stump). However, approximately a year after the
original submission, and after making the various changes suggested at
that time, the port itself still awaits review and acceptance.

We are in the curious position of a port that has a maintainer and
testsuite accepted, but no actual port.

What can we do to move this to completion for 4.9 stage 1? It is not the
smallest port (the ARC is a complex reconfigurable processor family),
but it has been in use for a long time, causes no regression errors in
other targets, and has been submitted by a long-standing contributor to
GCC.

Advice on how to move this forward would be much appreciated.

Thanks,

Jeremy Bennett

-- 
Tel: +44 (1590) 610184
Cell: +44 (7970) 676050
SkypeID: jeremybennett
Twitter: @jeremypbennett
Email: jeremy.benn...@embecosm.com
Web: www.embecosm.com
Re: where to insert a new Simple IPA [measuring] pass from a plugin ?
On Mon, 2013-09-30 at 15:49 +0200, Basile Starynkevitch wrote: > Hello, > > I want to insert, thru a plugin, a new IPA pass which won't change any > internal representation but will just count Gimples and functions at the IPA > level. > > (for what it is worth, the plugin is MELT http://gcc-melt.org/ > and the IPA pass is coded in MELT; but we can safely pretend > all this is C++ code - which in fact it is, > since MELT generates C++ code). > > > BTW, -fdump-passes show me notably: > > >tree-cfg: ON >*warn_function_return : ON >tree-ompexp : OFF >*build_cgraph_edges : ON >*free_lang_data : ON >ipa-visibility : ON >ipa-early_local_cleanups: ON > > I have some issues finding the right place for such a pass. I was thinking of > passing > as the reference_pass_name of some struct register_pass_info either the > "ipa-visibility" string or the "visibility" string, but somehow that does not > work. > > It is a pity that, AFAIK, a plugin cannot insert its pass in front of > the all_small_ipa_passes (the variable inside passes.c) - or of the > all_regular_ipa_passes; it would be > nice if we had some plugin API for such things. What do you think? I had a go at implementing this using the python plugin, and I was successful: it worked for me (with gcc 4.7.2 fwiw) as a SIMPLE_IPA_PASS, registering before "*free_lang_data", or as an IPA_PASS, registering before "whole-program". I'm attaching the script I wrote, though obviously it will need translating from Python to MELT. Hope this is helpful. 
Here's the output: it's tallying the kinds of gimple statements, by
function, within "demo.c":

$ LD_LIBRARY_PATH=gcc-c-api ./gcc-with-python show-stats.py demo.c -I/usr/include/python2.7 -c
make_a_list_of_random_ints_badly: [(, 6), (, 5), (, 2), (, 1), (, 1)]
buggy_converter: [(, 6), (, 1), (, 1), (, 1), (, 1)]
kwargs_example: [(, 10), (, 1), (, 1), (, 1), (, 1)]
too_many_varargs: [(, 7), (, 1), (, 1), (, 1), (, 1)]
not_enough_varargs: [(, 5), (, 1), (, 1), (, 1), (, 1)]
socket_htons: [(, 7), (, 3), (, 1), (, 1), (, 1)]

> BTW, I also have a suggestion. We have some (relatively new) code
> (perhaps from 4.8) which suggest names in error messages when an
> identifier is misspelled..
>
> Can't we use the same code (or at least same algorithm) to suggest
> a pass name when given a mispelled pass name from a plugin?
>
>
> In general, I find that we are a bit lacking documentation about where
> and how a plugin can insert its own passes.

Agreed. FWIW I find this map very helpful for this kind of thing:

  https://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html

Perhaps it can somehow be integrated into GCC's own documentation.

> Regards.
>
> PS.
> See also http://gcc.gnu.org/ml/gcc/2010-11/msg00638.html

Hope this is constructive
Dave

from collections import Counter
import gcc

# We'll implement this as a custom pass, to be called directly before
# 'whole-program'

# See https://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html
# for a map showing how GCC's passes (actually for GCC 4.6)

class CountingPass(gcc.SimpleIpaPass):
    def execute(self):
        for node in gcc.get_callgraph_nodes():
            # Tally gimple statements by type:
            stmt_kinds = Counter()
            fun = node.decl.function
            if fun:
                for bb in fun.cfg.basic_blocks:
                    if bb.gimple:
                        for stmt in bb.gimple:
                            stmt_kinds[type(stmt)] += 1
                print('%s: %s' % (fun.decl.name,
                                  stmt_kinds.most_common(5)))

ps = CountingPass(name='counting-pass')
ps.register_before('*free_lang_data')

# This also works registering before "whole-program", but one has to change the
# base class to a gcc.IpaPass, rather than a gcc.SimpleIpaPass
Re: Invalid store semantics
On Mon, Sep 30, 2013 at 7:46 AM, Umesh Kalappa wrote:
>
> With the optimisation (-O3) enabled ,the above rtl has been transformed to
>
> (insn 7 6 8 2 (set (reg:SI 24)
>         (unspec:SI [
>                 (mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] ) [2 lsucCnt2+0 S4 A16])
>             ] 1)) algt_001.c:41 59 {tx03_movw}
>      (nil))
>
> (insn 8 7 0 2 (set (mem:SI (reg:SI 24) [0 S4 A16])
>         (const_int 10 [0xa])) algt_001.c:41 42 {storesi}
>      (expr_list:REG_DEAD (reg:SI 24)
>         (nil)))
>
>
> Where insn-6 has been deleted and constant 10 is propagated to insn 8
> and finally ended emitting instruction like str 10 ,[mem] ,which is
> invalid syntax for store where constant is not allowed.

If your processor does not permit storing a constant to memory, then
the constraints on your storesi insn should reject a constant.

> I'm trying to handle the above problem ,by introducing scratch
> register in the store template and peephole/split it ,where force the
> constant to the scratch register. Before I do the same.

It may be necessary to do this in your movsi pattern, I don't know.
But your storesi pattern doesn't need to accept a constant, and the
constraints should reflect that.

Ian
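
The point about constraints can be illustrated with a toy model. This is
a Python sketch, not GCC code: Reg, Const, register_only,
register_or_constant and try_substitute are hypothetical names, and the
only assumption is that an optimisation keeps a substitution when the
store pattern's operand check still accepts the new operand, and drops
it otherwise.

```python
from collections import namedtuple

# Hypothetical stand-ins for RTL operands; not GCC data structures.
Reg = namedtuple("Reg", "number")
Const = namedtuple("Const", "value")


def register_only(operand):
    """Operand check for a store whose source must be a register."""
    return isinstance(operand, Reg)


def register_or_constant(operand):
    """Overly permissive check: also lets constants through."""
    return isinstance(operand, (Reg, Const))


def try_substitute(source, replacement, source_check):
    """Replace the store's source operand only if the check accepts it."""
    return replacement if source_check(replacement) else source


original_source = Reg(23)   # models (reg:SI 23), which holds the constant 10
propagated = Const(10)      # what constant propagation would like to use

# Permissive pattern: the constant reaches the store, as in insn 8 above.
assert try_substitute(original_source, propagated,
                      register_or_constant) == Const(10)

# Strict pattern: the substitution is refused and the register is kept.
assert try_substitute(original_source, propagated,
                      register_only) == Reg(23)
```

In this model the fix sits entirely in the operand check of the store,
which matches the advice above; whether a scratch register is also
needed elsewhere is a separate question the sketch does not address.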
Re: automatic dependencies
Tom> 2013-09-30  Tom Tromey
Tom>
Tom>     * Makefile.in (-DTOOLDIR_BASE_PREFIX): Use $(if), not $(and).

I didn't look at this until later and saw that Emacs guessed wrong.
Here's the corrected ChangeLog entry.

2013-09-30  Tom Tromey

    * Makefile.in (DRIVER_DEFINES): Use $(if), not $(and).

Tom
Invalid store semantics
Dear All,

I'm looking into the problem below in our private backend. During RTL
expansion the following RTL is emitted:

(insn 6 5 7 (set (reg:SI 23)
        (const_int 10 [0xa])) algt_001.c:41 -1
     (nil))

(insn 7 6 8 (set (reg:SI 24)
        (unspec:SI [
                (mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] ) [2 lsucCnt2+0 S4 A16])
            ] 1)) algt_001.c:41 -1
     (nil))

(insn 8 7 0 (set (mem:SI (plus:SI (reg:SI 24)
                (const_int 0 [0])) [0 S4 A16])
        (reg:SI 23)) algt_001.c:41 -1
     (nil))

With optimisation (-O3) enabled, the above RTL is transformed to:

(insn 7 6 8 2 (set (reg:SI 24)
        (unspec:SI [
                (mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] ) [2 lsucCnt2+0 S4 A16])
            ] 1)) algt_001.c:41 59 {tx03_movw}
     (nil))

(insn 8 7 0 2 (set (mem:SI (reg:SI 24) [0 S4 A16])
        (const_int 10 [0xa])) algt_001.c:41 42 {storesi}
     (expr_list:REG_DEAD (reg:SI 24)
        (nil)))

Here insn 6 has been deleted and the constant 10 has been propagated
into insn 8, so we end up emitting an instruction like "str 10, [mem]",
which is invalid syntax for a store because a constant is not allowed
as the source.

I'm trying to handle this by introducing a scratch register in the
store template and using a peephole/split to force the constant into
the scratch register.

Before I do that, I would like to know whether the proposed solution is
doable, or whether there is a better solution out there.

Looking for some suggestions here.

Thanks
~Umesh
Re: automatic dependencies
Eric> Are there any additional prerequisites on the GNU make version?
Eric> On a machine with GNU make 3.80 installed, the bootstrap
Eric> consistently fails with:

Sorry about this.

Eric>   $(and $(SHLIB),$(filter yes,yes),-DENABLE_SHARED_LIBGCC) \

I looked in the GNU make NEWS file and found that $(and ..) was added in
3.81.

In this particular case it looked easy to reimplement using $(if).

Could you please try this patch with make 3.80?

thanks,
Tom

2013-09-30  Tom Tromey

    * Makefile.in (-DTOOLDIR_BASE_PREFIX): Use $(if), not $(and).

Index: Makefile.in
===
--- Makefile.in (revision 202912)
+++ Makefile.in (working copy)
@@ -1924,7 +1924,7 @@
   -DTOOLDIR_BASE_PREFIX=\"$(libsubdir_to_prefix)$(prefix_to_exec_prefix)\" \
   @TARGET_SYSTEM_ROOT_DEFINE@ \
   $(VALGRIND_DRIVER_DEFINES) \
-  $(and $(SHLIB),$(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC) \
+  $(if $(SHLIB),$(if $(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC)) \
   -DCONFIGURE_SPECS="\"@CONFIGURE_SPECS@\""
 
 CFLAGS-gcc.o += $(DRIVER_DEFINES)
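
To see why the nested $(if) can stand in for $(and) in this patch, the
two spellings can be modelled outside of make. The sketch below is
Python, not make: make_filter, make_if and make_and are hypothetical
helpers that only approximate the GNU make functions (they work on
already-expanded strings and match whole words instead of % patterns),
and "libgcc_s.so" is just a placeholder for a non-empty SHLIB value.

```python
from itertools import product


def make_filter(pattern, text):
    """Approximate $(filter pattern,text): keep the words equal to pattern.

    Assumption: only a literal pattern is handled, no '%' wildcard.
    """
    return " ".join(word for word in text.split() if word == pattern)


def make_if(condition, then_part, else_part=""):
    """Approximate $(if cond,then[,else]): a non-empty condition is true."""
    return then_part if condition.strip() else else_part


def make_and(*args):
    """Approximate $(and a,b,...): empty if any argument is empty,
    otherwise the value of the last argument."""
    result = ""
    for arg in args:
        result = arg.strip()
        if not result:
            return ""
    return result


def with_and(shlib, enable_shared):
    """The original spelling, which needs GNU make 3.81 or later."""
    return make_and(shlib, make_filter("yes", enable_shared),
                    "-DENABLE_SHARED_LIBGCC")


def with_nested_if(shlib, enable_shared):
    """The patched spelling, using only $(if)."""
    return make_if(shlib,
                   make_if(make_filter("yes", enable_shared),
                           "-DENABLE_SHARED_LIBGCC"))


# Compare both spellings over every combination of the two inputs.
for shlib, enable_shared in product(["", "libgcc_s.so"], ["", "no", "yes"]):
    assert with_and(shlib, enable_shared) == with_nested_if(shlib,
                                                            enable_shared)
```

Under these assumptions both forms yield the define only when SHLIB is
non-empty and the enable_shared substitution is "yes", and yield the
empty string in every other case.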
Re: [RFC] Vectorization of indexed elements
On Mon, Sep 30, 2013 at 02:19:32PM +0100, Richard Biener wrote: > On Mon, 30 Sep 2013, Vidya Praveen wrote: > > > On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote: > > > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote: > > > [...] > > > > > > I can't really insist on the single lane load.. something like: > > > > > > > > > > > > vc:V4SI[0] = c > > > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0) > > > > > > va:V4SI = vb:V4SI vt:V4SI > > > > > > > > > > > > Or is there any other way to do this? > > > > > > > > > > Can you elaborate on "I can't really insist on the single lane load"? > > > > > What's the single lane load in your example? > > > > > > > > Loading just one lane of the vector like this: > > > > > > > > vc:V4SI[0] = c // from the above scalar example > > > > > > > > or > > > > > > > > vc:V4SI[0] = c[2] > > > > > > > > is what I meant by single lane load. In this example: > > > > > > > > t = c[2] > > > > ... > > > > vb:v4si = b[0:3] > > > > vc:v4si = { t, t, t, t } > > > > va:v4si = vb:v4si vc:v4si > > > > > > > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I > > > > cannot > > > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which > > > > could be > > > > seen as vec_select:SI (vect_t 0) ). > > > > > > > > > I'd expect the instruction > > > > > pattern as quoted to just work (and I hope we expand an uniform > > > > > constructor { a, a, a, a } properly using vec_duplicate). > > > > > > > > As much as I went through the code, this is only done using vect_init. > > > > It is > > > > not expanded as vec_duplicate from, for example, store_constructor() of > > > > expr.c > > > > > > Do you see any issues if we expand such constructor as vec_duplicate > > > directly > > > instead of going through vect_init way? > > > > Sorry, that was a bad question. > > > > But here's what I would like to propose as a first step. 
Please tell me if > > this > > is acceptable or if it makes sense: > > > > - Introduce standard pattern names > > > > "vmulim4" - vector muliply with second operand as indexed operand > > > > Example: > > > > (define_insn "vmuliv4si4" > >[set (match_operand:V4SI 0 "register_operand") > > (mul:V4SI (match_operand:V4SI 1 "register_operand") > > (vec_duplicate:V4SI > > (vec_select:SI > > (match_operand:V4SI 2 "register_operand") > > (match_operand:V4SI 3 "immediate_operand)] > > ... > > ) > > We could factor this with providing a standard pattern name for > > (define_insn "vdupi" > [set (match_operand: 0 "register_operand") >(vec_duplicate: > (vec_select: > (match_operand: 1 "register_operand") > (match_operand:SI 2 "immediate_operand] This is good. I did think about this but then I thought of avoiding the need for combiner patterns :-) But do you find the lane specific mov pattern I proposed, acceptable? > (you use V4SI for the immediate? Sorry typo again!! It should've been SI. > Ideally vdupi has another custom > mode for the vector index). > > Note that this factored pattern is already available as vec_perm_const! > It is simply (vec_perm_const:V4SI ). > > Which means that on the GIMPLE level we should try to combine > > el_4 = BIT_FIELD_REF ; > v_5 = { el_4, el_4, ... }; I don't think we reach this state at all for the scenarios in discussion. what we generally have is: el_4 = MEM_REF < array + index*size > v_5 = { el_4, ... } Or am I missing something? > > into > > v_5 = VEC_PERM_EXPR ; > > which it should already do with simplify_permutation. > > But I'm not sure what you are after at then end ;) > > Richard. > Regards VP
where to insert a new Simple IPA [measuring] pass from a plugin ?
Hello,

I want to insert, through a plugin, a new IPA pass which won't change
any internal representation but will just count Gimples and functions
at the IPA level.

(For what it is worth, the plugin is MELT http://gcc-melt.org/ and the
IPA pass is coded in MELT; but we can safely pretend all this is C++
code, which in fact it is, since MELT generates C++ code.)

BTW, -fdump-passes shows me notably:

   tree-cfg                : ON
   *warn_function_return   : ON
   tree-ompexp             : OFF
   *build_cgraph_edges     : ON
   *free_lang_data         : ON
   ipa-visibility          : ON
   ipa-early_local_cleanups: ON

I have some issues finding the right place for such a pass. I was
thinking of passing, as the reference_pass_name of some struct
register_pass_info, either the "ipa-visibility" string or the
"visibility" string, but somehow that does not work.

It is a pity that, AFAIK, a plugin cannot insert its pass in front of
all_small_ipa_passes (the variable inside passes.c) or of
all_regular_ipa_passes; it would be nice if we had some plugin API for
such things. What do you think?

BTW, I also have a suggestion. We have some (relatively new) code
(perhaps from 4.8) which suggests names in error messages when an
identifier is misspelled.

Can't we use the same code (or at least the same algorithm) to suggest
a pass name when given a misspelled pass name from a plugin?

In general, I find that we are a bit lacking in documentation about
where and how a plugin can insert its own passes.

Regards.

PS. See also http://gcc.gnu.org/ml/gcc/2010-11/msg00638.html

-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
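
The suggestion about proposing a pass name for a misspelled one can be
sketched in a few lines. This is Python using the standard difflib
module as a stand-in similarity measure; it is not the algorithm GCC
uses for identifier suggestions, suggest_pass_names is a hypothetical
helper, and the list of names is simply copied from the -fdump-passes
excerpt above.

```python
import difflib

# Pass names as they appear in the -fdump-passes excerpt above.
KNOWN_PASSES = (
    "tree-cfg",
    "*warn_function_return",
    "tree-ompexp",
    "*build_cgraph_edges",
    "*free_lang_data",
    "ipa-visibility",
    "ipa-early_local_cleanups",
)


def suggest_pass_names(requested, known=KNOWN_PASSES, limit=3):
    """Return up to `limit` known names that resemble `requested`,
    best match first; an empty list means nothing was close enough."""
    return difflib.get_close_matches(requested, list(known),
                                     n=limit, cutoff=0.6)


# A plugin asking for a misspelled reference pass name:
print(suggest_pass_names("ipa-visiblity"))
```

A registration error could then append such candidates to its message.
Note that this only helps with spelling; it does not settle which of
the displayed names is actually accepted as a reference pass name.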
Re: [RFC] Vectorization of indexed elements
On Mon, 30 Sep 2013, Vidya Praveen wrote: > On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote: > > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote: > > [...] > > > > > I can't really insist on the single lane load.. something like: > > > > > > > > > > vc:V4SI[0] = c > > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0) > > > > > va:V4SI = vb:V4SI vt:V4SI > > > > > > > > > > Or is there any other way to do this? > > > > > > > > Can you elaborate on "I can't really insist on the single lane load"? > > > > What's the single lane load in your example? > > > > > > Loading just one lane of the vector like this: > > > > > > vc:V4SI[0] = c // from the above scalar example > > > > > > or > > > > > > vc:V4SI[0] = c[2] > > > > > > is what I meant by single lane load. In this example: > > > > > > t = c[2] > > > ... > > > vb:v4si = b[0:3] > > > vc:v4si = { t, t, t, t } > > > va:v4si = vb:v4si vc:v4si > > > > > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot > > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could > > > be > > > seen as vec_select:SI (vect_t 0) ). > > > > > > > I'd expect the instruction > > > > pattern as quoted to just work (and I hope we expand an uniform > > > > constructor { a, a, a, a } properly using vec_duplicate). > > > > > > As much as I went through the code, this is only done using vect_init. It > > > is > > > not expanded as vec_duplicate from, for example, store_constructor() of > > > expr.c > > > > Do you see any issues if we expand such constructor as vec_duplicate > > directly > > instead of going through vect_init way? > > Sorry, that was a bad question. > > But here's what I would like to propose as a first step. 
> Please tell me if this is acceptable or if it makes sense:
>
> - Introduce standard pattern names
>
> "vmulim4" - vector muliply with second operand as indexed operand
>
> Example:
>
> (define_insn "vmuliv4si4"
>    [set (match_operand:V4SI 0 "register_operand")
>         (mul:V4SI (match_operand:V4SI 1 "register_operand")
>                   (vec_duplicate:V4SI
>                     (vec_select:SI
>                       (match_operand:V4SI 2 "register_operand")
>                       (match_operand:V4SI 3 "immediate_operand)]
>  ...
> )

We could factor this with providing a standard pattern name for

(define_insn "vdupi"
  [set (match_operand: 0 "register_operand")
       (vec_duplicate:
          (vec_select:
             (match_operand: 1 "register_operand")
             (match_operand:SI 2 "immediate_operand]

(you use V4SI for the immediate?  Ideally vdupi has another custom
mode for the vector index).

Note that this factored pattern is already available as vec_perm_const!
It is simply (vec_perm_const:V4SI ).

Which means that on the GIMPLE level we should try to combine

  el_4 = BIT_FIELD_REF ;
  v_5 = { el_4, el_4, ... };

into

  v_5 = VEC_PERM_EXPR ;

which it should already do with simplify_permutation.

But I'm not sure what you are after at the end ;)

Richard.
Re: [RFC] Vectorization of indexed elements
On Wed, Sep 25, 2013 at 10:22:05AM +0100, Richard Biener wrote: > On Tue, 24 Sep 2013, Vidya Praveen wrote: > > > On Tue, Sep 10, 2013 at 09:25:32AM +0100, Richard Biener wrote: > > > On Mon, 9 Sep 2013, Marc Glisse wrote: > > > > > > > On Mon, 9 Sep 2013, Vidya Praveen wrote: > > > > > > > > > Hello, > > > > > > > > > > This post details some thoughts on an enhancement to the vectorizer > > > > > that > > > > > could take advantage of the SIMD instructions that allows indexed > > > > > element > > > > > as an operand thus reducing the need for duplication and possibly > > > > > improve > > > > > reuse of previously loaded data. > > > > > > > > > > Appreciate your opinion on this. > > > > > > > > > > --- > > > > > > > > > > A phrase like this: > > > > > > > > > > for(i=0;i<4;i++) > > > > > a[i] = b[i] c[2]; > > > > > > > > > > is usually vectorized as: > > > > > > > > > > va:V4SI = a[0:3] > > > > > vb:V4SI = b[0:3] > > > > > t = c[2] > > > > > vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at > > > > > vec_init > > > > > ... > > > > > va:V4SI = vb:V4SI vc:V4SI > > > > > > > > > > But this could be simplified further if a target has instructions that > > > > > support > > > > > indexed element as a parameter. For example an instruction like this: > > > > > > > > > > mul v0.4s, v1.4s, v2.4s[2] > > > > > > > > > > can perform multiplication of each element of v2.4s with the third > > > > > element > > > > > of > > > > > v2.4s (specified as v2.4s[2]) and store the results in the > > > > > corresponding > > > > > elements of v0.4s. 
> > > > > > > > > > For this to happen, vectorizer needs to understand this idiom and > > > > > treat the > > > > > operand c[2] specially (and by taking in to consideration if the > > > > > machine > > > > > supports indexed element as an operand for through a target hook > > > > > or > > > > > macro) > > > > > and consider this as vectorizable statement without having to > > > > > duplicate the > > > > > elements explicitly. > > > > > > > > > > There are fews ways this could be represented at gimple: > > > > > > > > > > ... > > > > > va:V4SI = vb:V4SI VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI > > > > > 2)) > > > > > ... > > > > > > > > > > or by allowing a vectorizer treat an indexed element as a valid > > > > > operand in a > > > > > vectorizable statement: > > > > > > > > Might as well allow any scalar then... > > > > > > I agree. The VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2)) form > > > would necessarily be two extra separate statements and thus subject > > > to CSE obfuscating it enough for RTL expansion to no longer notice it. > > > > I also thought about having a specialized expression like > > > > VEC_INDEXED__EXPR < arg0, arg1, arg2, index> > > > > to mean: > > > > arg0 = arg1 arg2[index] > > > > and handle it directly in the expander, like (for eg.) how VEC_LSHIFT_EXPR > > is handled in expr.c. But I dropped this idea since we may need to introduce > > many such nodes. > > > > > > > > That said, allowing mixed scalar/vector ops isn't very nice and > > > your scheme can be simplified by just using > > > > > > vc:V4SI = VEC_DUPLICATE_EXPR <...> > > > va:V4SI = vb:V4SI vc:V4SI > > > > > > where the expander only has to see that vc:V4SI is defined by > > > a duplicate. > > > > I did try out something like this quickly before I posted this RFC, though > > I called it VEC_DUP to mean a equivalent of vec_duplicate(vec_select()) > > > > for: > > > > for(i=0;i<8;i++) > > a[i] = b[2] * c[i]; > > > > I could generate: > > > > ... 
> > : > > _88 = prolog_loop_adjusted_niters.6_60 * 4; > > vectp_c.13_87 = c_10(D) + _88; > > vect_ldidx_.16_92 = MEM[(int *)b_8(D) + 8B]; > > vect_idxed_.17_93 = (vect_ldidx_.16_92) <<< ??? >>> (0); > > _96 = prolog_loop_adjusted_niters.6_60 * 4; > > vectp_a.19_95 = a_6(D) + _96; > > vect__12.14_115 = MEM[(int *)vectp_c.13_87]; > > vect_patt_40.15_116 = vect__12.14_115 * vect_idxed_.17_93; > > MEM[(int *)vectp_a.19_95] = vect_patt_40.15_116; > > vectp_c.12_118 = vectp_c.13_87 + 16; > > vectp_a.18_119 = vectp_a.19_95 + 16; > > ivtmp_120 = 1; > > if (ivtmp_120 < bnd.8_62) > > goto ; > > else > > goto ; > > > > : > > # vectp_c.12_89 = PHI > > # vectp_a.18_97 = PHI > > # ivtmp_14 = PHI > > vect__12.14_91 = MEM[(int *)vectp_c.12_89]; > > vect_patt_40.15_94 = vect__12.14_91 * vect_idxed_.17_93; > > MEM[(int *)vectp_a.18_97] = vect_patt_40.15_94; > > ... > > > > It's a crude implementation so VEC_DUP is printed as: > > > > (vect_ldidx_.16_92) <<< ??? >>> (0); > > > > > > > > > ... > > > > > va:V4SI = vb:V4SI VEC_SELECT_EXPR (vc:V4SI 2) > > > > > ... > > > > > > > > > > For the sake of explanation, the above two representations assumes > > > > > that > > > > > c[0:3
Re: [RFC] Vectorization of indexed elements
On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote: > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote: > [...] > > > > I can't really insist on the single lane load.. something like: > > > > > > > > vc:V4SI[0] = c > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0) > > > > va:V4SI = vb:V4SI vt:V4SI > > > > > > > > Or is there any other way to do this? > > > > > > Can you elaborate on "I can't really insist on the single lane load"? > > > What's the single lane load in your example? > > > > Loading just one lane of the vector like this: > > > > vc:V4SI[0] = c // from the above scalar example > > > > or > > > > vc:V4SI[0] = c[2] > > > > is what I meant by single lane load. In this example: > > > > t = c[2] > > ... > > vb:v4si = b[0:3] > > vc:v4si = { t, t, t, t } > > va:v4si = vb:v4si vc:v4si > > > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could be > > seen as vec_select:SI (vect_t 0) ). > > > > > I'd expect the instruction > > > pattern as quoted to just work (and I hope we expand an uniform > > > constructor { a, a, a, a } properly using vec_duplicate). > > > > As much as I went through the code, this is only done using vect_init. It is > > not expanded as vec_duplicate from, for example, store_constructor() of > > expr.c > > Do you see any issues if we expand such constructor as vec_duplicate directly > instead of going through vect_init way? Sorry, that was a bad question. But here's what I would like to propose as a first step. 
Please tell me if this is acceptable or if it makes sense:

- Introduce standard pattern names

  "vmulim4" - vector multiply with second operand as indexed operand

  Example:

  (define_insn "vmuliv4si4"
     [(set (match_operand:V4SI 0 "register_operand")
           (mult:V4SI (match_operand:V4SI 1 "register_operand")
                      (vec_duplicate:V4SI
                        (vec_select:SI
                          (match_operand:V4SI 2 "register_operand")
                          (match_operand:V4SI 3 "immediate_operand")))))]
   ...
  )

  "vlmovmn3" - move where one of the operands is a specific lane of a
  vector and the other is a scalar.

  Example:

  (define_insn "vlmovv4sisi3"
     [(set (vec_select:SI (match_operand:V4SI 0 "register_operand")
                          (match_operand:SI 1 "immediate_operand"))
           (match_operand:SI 2 "memory_operand"))]
   ...
  )

- Identify the following idiom and expand through the above standard
  patterns:

  t = c[m]
  vc[0:n] = { t, t, t, t}
  a[0:n] = b[0:n] * vc[0:n]

  as

  (insn (set (vec_select:SI (reg:V4SI 0) 0) (mem:SI ... )))
  (insn (set (reg:V4SI 1)
             (mult:V4SI (reg:V4SI 2)
                        (vec_duplicate:V4SI
                          (vec_select:SI (reg:V4SI 0) 0)))))

If this path is acceptable, then I can extend this to support

  "vmaddim4" - multiply and add (with indexed element as multiplier)
  "vmsubim4" - multiply and subtract (with indexed element as multiplier)

Please let me know your thoughts.

Cheers
VP
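
For readers following the idiom, here is a small model of what the
proposal relies on. It is plain Python over lists, not GIMPLE or RTL;
the function names are hypothetical, and the only claim is that
broadcasting one lane and then multiplying gives the same values as
multiplying by that lane directly, which is what a by-element multiply
instruction would compute.

```python
def scalar_loop(b, c, lane):
    """Reference: a[i] = b[i] * c[lane] for every element."""
    return [b[i] * c[lane] for i in range(len(b))]


def duplicate_then_multiply(vb, vc, lane):
    """Broadcast one lane into a full vector, then multiply element-wise."""
    # Models vec_duplicate applied to (vec_select vc lane).
    vt = [vc[lane]] * len(vb)
    return [x * y for x, y in zip(vb, vt)]


def multiply_by_lane(vb, vc, lane):
    """Models an instruction that takes the lane index as an operand."""
    return [x * vc[lane] for x in vb]


b = [1, 2, 3, 4]
c = [10, 20, 30, 40]

# All three formulations agree for lane 2.
assert scalar_loop(b, c, 2) == [30, 60, 90, 120]
assert duplicate_then_multiply(b, c, 2) == [30, 60, 90, 120]
assert multiply_by_lane(b, c, 2) == [30, 60, 90, 120]
```

The sketch says nothing about how the expander would recognise the
pattern; it only shows that skipping the explicit duplicate does not
change the result.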
Re: [gomp4, openacc-1_0-branch] Re: OpenACC branch
On Mon, Sep 30, 2013 at 12:05:55AM +0200, Thomas Schwinge wrote:
> Is my understanding correct that the GCC policy regarding extensions such
> as support for OpenACC or OpenMP 4 is: first develop and polish this on a
> branch (such as openacc-1_0-branch or gomp-4_0-branch), and once
> *everything* of the respective standard has been implemented, the
> development branch is then merged into mainline (closing it at the same
> time), instead of committing individual sub-features (such as support for
> only »#pragma acc parallel« but not yet covering the whole respective
> standard) directly to trunk?  The issue with the latter, I assume, is
> that such half-finished implementations in trunk might delay/disturb GCC
> releases.

My actual plan with gomp-4_0-branch is to merge the branch to the trunk
in a week or two.  The missing parts of OpenMP 4.0 support right now are:

1) Fortran support - didn't have spare cycles for it yet, but I think
   initially we can just support C/C++ OpenMP 4.0 and Fortran OpenMP 3.1,
   and as time permits add the missing Fortran support
2) OMP_PLACES/affinity - library side only, plan to work on that this week
3) target support ICV handling - library side only, plan to work on that
   this week
4) elemental functions - this is currently parsed, but ignored; I'd prefer
   this being developed on the gomp-4_0-branch after the branch is merged
   with trunk, then committed
5) offloading - once 3) is supported, we should hopefully be OpenMP 4.0
   compliant with regards to target* constructs, just always doing host
   fallback; further development of actual offloading should continue on
   gomp-4_0-branch

    Jakub