Re: automatic dependencies

2013-09-30 Thread Eric Botcazou
> In this particular case it looked easy to reimplement using $(if).
> 
> Could you please try this patch with make 3.80?

It works fine, thanks.

-- 
Eric Botcazou


Getting the ARC port reviewed and accepted

2013-09-30 Thread Jeremy Bennett

Hi all,

You've probably seen that Joern Rennecke (amylaar) has been pinging 
repeatedly for help reviewing the ARC port:


  http://gcc.gnu.org/ml/gcc-patches/2013-09/msg02072.html

Joern is approved as a maintainer, and the tests have been reviewed and 
approved (thanks to Mike Stump). However, approximately a year after the 
original submission, and after making the various changes suggested at that 
time, the port itself still awaits review and acceptance.


We are in the curious position of a port that has a maintainer and 
testsuite accepted, but no actual port.


What can we do to move this to completion for 4.9 stage 1? It is not the 
smallest port (the ARC is a complex reconfigurable processor family), 
but it has been in use for a long time, causes no regression errors in 
other targets, and has been submitted by a long-standing contributor to GCC.


Advice on how to move this forward would be much appreciated.

Thanks,


Jeremy Bennett

--
Tel:  +44 (1590) 610184
Cell: +44 (7970) 676050
SkypeID: jeremybennett
Twitter: @jeremypbennett
Email:   jeremy.benn...@embecosm.com
Web: www.embecosm.com


Re: where to insert a new Simple IPA [measuring] pass from a plugin ?

2013-09-30 Thread David Malcolm
On Mon, 2013-09-30 at 15:49 +0200, Basile Starynkevitch wrote:
> Hello,
> 
> I want to insert, thru a plugin, a new IPA pass which won't change any 
> internal representation but will just count Gimples and functions at the IPA 
> level.
> 
> (for what it is worth, the plugin is MELT http://gcc-melt.org/
> and the IPA pass is coded in MELT; but  we can safely pretend 
> all this is C++ code - which in fact it is, 
> since MELT generates C++ code).
> 
> 
> BTW, -fdump-passes show me notably:
> 
> 
>tree-cfg:  ON
>*warn_function_return   :  ON
>tree-ompexp :  OFF
>*build_cgraph_edges :  ON
>*free_lang_data :  ON
>ipa-visibility  :  ON
>ipa-early_local_cleanups:  ON
> 
> I have some issues finding the right place for such a pass. I was thinking of 
> passing 
> as the reference_pass_name of some struct register_pass_info either the 
> "ipa-visibility" string or the "visibility" string, but somehow that does not 
> work.
> 
> It is a pity that, AFAIK, a plugin cannot insert its pass in front of 
> the all_small_ipa_passes (the variable inside passes.c) - or of the 
> all_regular_ipa_passes; it would be 
> nice if we had some plugin API for such things. What do you think?

I had a go at implementing this using the python plugin, and I was
successful:  it worked for me (with gcc 4.7.2 fwiw) as a
SIMPLE_IPA_PASS, registering before "*free_lang_data", or as an
IPA_PASS, registering before "whole-program".

I'm attaching the script I wrote, though obviously it will need
translating from Python to MELT.  Hope this is helpful.

Here's the output: it's tallying the kinds of gimple statements seen,
by function, within "demo.c":

$ LD_LIBRARY_PATH=gcc-c-api ./gcc-with-python show-stats.py demo.c 
-I/usr/include/python2.7 -c
make_a_list_of_random_ints_badly: [(, 6), (, 5), (, 2), (, 1), (, 1)]
buggy_converter: [(, 6), (, 1), 
(, 1), (, 1), (, 1)]
kwargs_example: [(, 10), (, 1), 
(, 1), (, 1), (, 1)]
too_many_varargs: [(, 7), (, 
1), (, 1), (, 1), (, 1)]
not_enough_varargs: [(, 5), (, 
1), (, 1), (, 1), (, 1)]
socket_htons: [(, 7), (, 3), 
(, 1), (, 1), (, 1)]



> BTW, I also have a suggestion. We have some (relatively new) code 
> (perhaps from 4.8) which suggests names in error messages when an 
> identifier is misspelled.
> 
> Can't we use the same code (or at least the same algorithm) to suggest 
> a pass name when given a misspelled pass name from a plugin?
> 
> 
> In general, I find that we are a bit lacking documentation about where 
> and how a plugin can insert its own passes.

Agreed.   FWIW I find this map very helpful for this kind of thing:
https://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html
Perhaps it can somehow be integrated into GCC's own documentation.


> Regards.
> 
> PS. See also http://gcc.gnu.org/ml/gcc/2010-11/msg00638.html

Hope this is constructive
Dave
from collections import Counter
import gcc

# We'll implement this as a custom pass, to be called directly before
# '*free_lang_data'
# See https://gcc-python-plugin.readthedocs.org/en/latest/tables-of-passes.html
# for a map of GCC's passes (actually for GCC 4.6)

class CountingPass(gcc.SimpleIpaPass):
    def execute(self):
        for node in gcc.get_callgraph_nodes():
            # Tally gimple statements by type:
            stmt_kinds = Counter()
            fun = node.decl.function
            if fun:
                for bb in fun.cfg.basic_blocks:
                    if bb.gimple:
                        for stmt in bb.gimple:
                            stmt_kinds[type(stmt)] += 1
                print('%s: %s' % (fun.decl.name, stmt_kinds.most_common(5)))

ps = CountingPass(name='counting-pass')
ps.register_before('*free_lang_data')

# This also works registering before "whole-program", but one has to change the
# base class to a gcc.IpaPass, rather than a gcc.SimpleIpaPass


Re: Invalid store semantics

2013-09-30 Thread Ian Lance Taylor
On Mon, Sep 30, 2013 at 7:46 AM, Umesh Kalappa  wrote:
>
> With the optimisation (-O3) enabled ,the above rtl has been transformed to
>
> (insn 7 6 8 2 (set (reg:SI 24)
> (unspec:SI [
> (mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] ) [2
> lsucCnt2+0 S4 A16])
> ] 1)) algt_001.c:41 59 {tx03_movw}
> (nil))
>
> (insn 8 7 0 2 (set (mem:SI (reg:SI 24) [0 S4 A16])
> (const_int 10 [0xa])) algt_001.c:41 42 {storesi}
> (expr_list:REG_DEAD (reg:SI 24)
> (nil)))
>
>
> Here insn 6 has been deleted and the constant 10 propagated into insn 8,
> so we end up emitting an instruction like str 10 ,[mem] ,which is
> invalid syntax for a store, where a constant operand is not allowed.

If your processor does not permit storing a constant to memory, then
the constraints on your storesi insn should reject a constant.

> I'm trying to handle the above problem ,by introducing scratch
> register in the store template and peephole/split it ,where force the
> constant to the scratch register. Before I do the same.

It may be necessary to do this in your movsi pattern, I don't know.
But your storesi pattern doesn't need to accept a constant, and the
constraints should reflect that.

Ian


Re: automatic dependencies

2013-09-30 Thread Tom Tromey
Tom> 2013-09-30  Tom Tromey  
Tom>* Makefile.in (-DTOOLDIR_BASE_PREFIX): Use $(if), not $(and).

I didn't look at this until later and saw that Emacs guessed wrong.
Here's the corrected ChangeLog entry.

2013-09-30  Tom Tromey  

* Makefile.in (DRIVER_DEFINES): Use $(if), not $(and).

Tom


Invalid store semantics

2013-09-30 Thread Umesh Kalappa
Dear All,

I'm looking into the problem below in our private backend.

During RTL expansion the following RTL is emitted:

(insn 6 5 7 (set (reg:SI 23)
(const_int 10 [0xa])) algt_001.c:41 -1
(nil))

(insn 7 6 8 (set (reg:SI 24)
(unspec:SI [
(mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] [2 lsucCnt2+0 S4 A16])
] 1)) algt_001.c:41 -1
(nil))

(insn 8 7 0 (set (mem:SI (plus:SI (reg:SI 24)
(const_int 0 [0])) [0 S4 A16])
(reg:SI 23)) algt_001.c:41 -1
(nil))

With optimisation (-O3) enabled, the above RTL is transformed to:

(insn 7 6 8 2 (set (reg:SI 24)
(unspec:SI [
(mem/c/i:SI (symbol_ref:SI ("lsucCnt2.1746") [flags 0x2] ) [2
lsucCnt2+0 S4 A16])
] 1)) algt_001.c:41 59 {tx03_movw}
(nil))

(insn 8 7 0 2 (set (mem:SI (reg:SI 24) [0 S4 A16])
(const_int 10 [0xa])) algt_001.c:41 42 {storesi}
(expr_list:REG_DEAD (reg:SI 24)
(nil)))


Here insn 6 has been deleted and the constant 10 propagated into insn 8,
so we end up emitting an instruction like str 10 ,[mem] ,which is
invalid syntax for a store, where a constant operand is not allowed.

I'm trying to handle the above problem by introducing a scratch
register in the store template and adding a peephole/split that forces the
constant into the scratch register.

Before I do that, I would like to know whether the proposed solution is
doable, or whether a better solution already exists.

Looking for some suggestions here

Thanks
~Umesh


Re: automatic dependencies

2013-09-30 Thread Tom Tromey
Eric> Are there any additional prerequisites on the GNU make version?
Eric> On a machine with GNU make 3.80 installed, the bootstrap
Eric> consistently fails with:

Sorry about this.

Eric>   $(and $(SHLIB),$(filter yes,yes),-DENABLE_SHARED_LIBGCC) \

I looked in the GNU make NEWS file and found that $(and ..) was added in
3.81.

In this particular case it looked easy to reimplement using $(if).

Could you please try this patch with make 3.80?

thanks,
Tom

2013-09-30  Tom Tromey  

* Makefile.in (-DTOOLDIR_BASE_PREFIX): Use $(if), not $(and).

Index: Makefile.in
===
--- Makefile.in (revision 202912)
+++ Makefile.in (working copy)
@@ -1924,7 +1924,7 @@
   -DTOOLDIR_BASE_PREFIX=\"$(libsubdir_to_prefix)$(prefix_to_exec_prefix)\" \
   @TARGET_SYSTEM_ROOT_DEFINE@ \
   $(VALGRIND_DRIVER_DEFINES) \
-  $(and $(SHLIB),$(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC) \
+  $(if $(SHLIB),$(if $(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC)) \
   -DCONFIGURE_SPECS="\"@CONFIGURE_SPECS@\""
 
 CFLAGS-gcc.o += $(DRIVER_DEFINES)
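
For readers comparing the two forms, here is a minimal Python sketch (not part of the patch) of why the nested $(if) behaves like $(and) for this one line. The helpers make_if, make_and and make_filter are hypothetical, simplified models of the GNU make functions (make_filter only handles a literal word, no % patterns), and "libgcc_s.so" is just a placeholder for a non-empty $(SHLIB).

```python
FLAG = "-DENABLE_SHARED_LIBGCC"


def make_if(cond, then, otherwise=""):
    # Model of $(if cond,then[,else]): a condition that is empty after
    # stripping whitespace counts as false.
    return then if cond.strip() else otherwise


def make_and(*args):
    # Model of $(and a,b,...): empty as soon as one argument is empty,
    # otherwise the value of the last argument.
    result = ""
    for arg in args:
        if not arg.strip():
            return ""
        result = arg
    return result


def make_filter(word, text):
    # Model of $(filter word,text), restricted to a literal word.
    return " ".join(w for w in text.split() if w == word)


def old_form(shlib, enable_shared):
    # $(and $(SHLIB),$(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC)
    return make_and(shlib, make_filter("yes", enable_shared), FLAG)


def new_form(shlib, enable_shared):
    # $(if $(SHLIB),$(if $(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC))
    return make_if(shlib, make_if(make_filter("yes", enable_shared), FLAG))


# Both forms agree for every combination tried here.
for shlib in ("", "libgcc_s.so"):
    for enable_shared in ("yes", "no", ""):
        assert old_form(shlib, enable_shared) == new_form(shlib, enable_shared)
```

In this model the flag is only produced when both conditions are non-empty, which is all the original $(and) was used for.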


Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Vidya Praveen
On Mon, Sep 30, 2013 at 02:19:32PM +0100, Richard Biener wrote:
> On Mon, 30 Sep 2013, Vidya Praveen wrote:
> 
> > On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> > > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> > > [...]
> > > > > > I can't really insist on the single lane load.. something like:
> > > > > > 
> > > > > > vc:V4SI[0] = c
> > > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > > > > 
> > > > > > Or is there any other way to do this?
> > > > > 
> > > > > Can you elaborate on "I can't really insist on the single lane load"?
> > > > > What's the single lane load in your example? 
> > > > 
> > > > Loading just one lane of the vector like this:
> > > > 
> > > > vc:V4SI[0] = c // from the above scalar example
> > > > 
> > > > or 
> > > > 
> > > > vc:V4SI[0] = c[2] 
> > > > 
> > > > is what I meant by single lane load. In this example:
> > > > 
> > > > t = c[2] 
> > > > ...
> > > > vb:v4si = b[0:3] 
> > > > vc:v4si = { t, t, t, t }
> > > > > va:v4si = vb:v4si <op> vc:v4si 
> > > > 
> > > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I 
> > > > cannot
> > > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which 
> > > > could be 
> > > > seen as vec_select:SI (vect_t 0) ). 
> > > > 
> > > > > I'd expect the instruction
> > > > > pattern as quoted to just work (and I hope we expand an uniform
> > > > > constructor { a, a, a, a } properly using vec_duplicate).
> > > > 
> > > > As much as I went through the code, this is only done using vect_init. 
> > > > It is
> > > > not expanded as vec_duplicate from, for example, store_constructor() of 
> > > > expr.c
> > > 
> > > Do you see any issues if we expand such constructor as vec_duplicate 
> > > directly 
> > > instead of going through vect_init way? 
> > 
> > Sorry, that was a bad question.
> > 
> > But here's what I would like to propose as a first step. Please tell me if 
> > this
> > is acceptable or if it makes sense:
> > 
> > - Introduce standard pattern names 
> > 
> > "vmulim4" - vector multiply with second operand as indexed operand
> > 
> > Example:
> > 
> > (define_insn "vmuliv4si4"
> >[set (match_operand:V4SI 0 "register_operand")
> > (mul:V4SI (match_operand:V4SI 1 "register_operand")
> >   (vec_duplicate:V4SI
> > (vec_select:SI
> >   (match_operand:V4SI 2 "register_operand")
> >   (match_operand:V4SI 3 "immediate_operand)]
> >  ...
> > )
> 
> We could factor this with providing a standard pattern name for
> 
> (define_insn "vdupi"
>   [set (match_operand: 0 "register_operand")
>(vec_duplicate:
>   (vec_select:
>  (match_operand: 1 "register_operand")
>  (match_operand:SI 2 "immediate_operand]

This is good. I did think about this but then I thought of avoiding the need
for combiner patterns :-) 

But do you find the lane-specific mov pattern I proposed acceptable? 

> (you use V4SI for the immediate?  

Sorry typo again!! It should've been SI.

> Ideally vdupi has another custom
> mode for the vector index).
> 
> Note that this factored pattern is already available as vec_perm_const!
> It is simply (vec_perm_const:V4SI   ).
> 
> Which means that on the GIMPLE level we should try to combine
> 
> el_4 = BIT_FIELD_REF ;
> v_5 = { el_4, el_4, ... };

I don't think we reach this state at all for the scenarios under discussion.
What we generally have is:

 el_4 = MEM_REF < array + index*size >
 v_5 = { el_4, ... }

Or am I missing something?

> 
> into
> 
> v_5 = VEC_PERM_EXPR ;
> 
> which it should already do with simplify_permutation.
> 
> But I'm not sure what you are after at the end ;)
> 
> Richard.
>
 
Regards
VP



where to insert a new Simple IPA [measuring] pass from a plugin ?

2013-09-30 Thread Basile Starynkevitch
Hello,

I want to insert, thru a plugin, a new IPA pass which won't change any 
internal representation but will just count Gimples and functions at the IPA 
level.

(for what it is worth, the plugin is MELT http://gcc-melt.org/
and the IPA pass is coded in MELT; but  we can safely pretend 
all this is C++ code - which in fact it is, 
since MELT generates C++ code).


BTW, -fdump-passes show me notably:


   tree-cfg:  ON
   *warn_function_return   :  ON
   tree-ompexp :  OFF
   *build_cgraph_edges :  ON
   *free_lang_data :  ON
   ipa-visibility  :  ON
   ipa-early_local_cleanups:  ON

I have some issues finding the right place for such a pass. I was thinking of 
passing as the reference_pass_name of some struct register_pass_info either 
the "ipa-visibility" string or the "visibility" string, but somehow that does 
not work.


It is a pity that, AFAIK, a plugin cannot insert its pass in front of 
the all_small_ipa_passes (the variable inside passes.c) - or of the 
all_regular_ipa_passes; it would be 
nice if we had some plugin API for such things. What do you think?

BTW, I also have a suggestion. We have some (relatively new) code 
(perhaps from 4.8) which suggests names in error messages when an 
identifier is misspelled.

Can't we use the same code (or at least the same algorithm) to suggest 
a pass name when given a misspelled pass name from a plugin?
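
To make the suggestion concrete, here is a small Python sketch of the idea. It is only an illustration: it uses difflib from the Python standard library as a stand-in for whatever matching GCC itself uses, the function name suggest_pass_name is hypothetical, and the candidate list is just the -fdump-passes excerpt above (which spelling the pass registration code actually expects is a separate question).

```python
import difflib

# Pass names copied from the -fdump-passes excerpt in this message.
KNOWN_PASSES = [
    "tree-cfg",
    "*warn_function_return",
    "tree-ompexp",
    "*build_cgraph_edges",
    "*free_lang_data",
    "ipa-visibility",
    "ipa-early_local_cleanups",
]


def suggest_pass_name(name, known=KNOWN_PASSES, cutoff=0.6):
    """Return the closest known pass name, or None if nothing is close."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def describe_unknown_pass(name):
    """Build an error message, adding a hint when a close name exists."""
    hint = suggest_pass_name(name)
    message = "unknown reference pass '%s'" % name
    if hint is not None:
        message += "; did you mean '%s'?" % hint
    return message


print(describe_unknown_pass("ipa-visibilty"))
```

A plugin author who mistypes a reference pass name would then get a hint instead of a bare failure.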


In general, I find that we are a bit lacking documentation about where 
and how a plugin can insert its own passes.

Regards.

PS. See also http://gcc.gnu.org/ml/gcc/2010-11/msg00638.html
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Richard Biener
On Mon, 30 Sep 2013, Vidya Praveen wrote:

> On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> > [...]
> > > > > I can't really insist on the single lane load.. something like:
> > > > > 
> > > > > vc:V4SI[0] = c
> > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > > > 
> > > > > Or is there any other way to do this?
> > > > 
> > > > Can you elaborate on "I can't really insist on the single lane load"?
> > > > What's the single lane load in your example? 
> > > 
> > > Loading just one lane of the vector like this:
> > > 
> > > vc:V4SI[0] = c // from the above scalar example
> > > 
> > > or 
> > > 
> > > vc:V4SI[0] = c[2] 
> > > 
> > > is what I meant by single lane load. In this example:
> > > 
> > > t = c[2] 
> > > ...
> > > vb:v4si = b[0:3] 
> > > vc:v4si = { t, t, t, t }
> > > > va:v4si = vb:v4si <op> vc:v4si 
> > > 
> > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could 
> > > be 
> > > seen as vec_select:SI (vect_t 0) ). 
> > > 
> > > > I'd expect the instruction
> > > > pattern as quoted to just work (and I hope we expand an uniform
> > > > constructor { a, a, a, a } properly using vec_duplicate).
> > > 
> > > As much as I went through the code, this is only done using vect_init. It 
> > > is
> > > not expanded as vec_duplicate from, for example, store_constructor() of 
> > > expr.c
> > 
> > Do you see any issues if we expand such constructor as vec_duplicate 
> > directly 
> > instead of going through vect_init way? 
> 
> Sorry, that was a bad question.
> 
> But here's what I would like to propose as a first step. Please tell me if 
> this
> is acceptable or if it makes sense:
> 
> - Introduce standard pattern names 
> 
> "vmulim4" - vector multiply with second operand as indexed operand
> 
> Example:
> 
> (define_insn "vmuliv4si4"
>[set (match_operand:V4SI 0 "register_operand")
> (mul:V4SI (match_operand:V4SI 1 "register_operand")
>   (vec_duplicate:V4SI
> (vec_select:SI
>   (match_operand:V4SI 2 "register_operand")
>   (match_operand:V4SI 3 "immediate_operand)]
>  ...
> )

We could factor this with providing a standard pattern name for

(define_insn "vdupi"
  [set (match_operand: 0 "register_operand")
   (vec_duplicate:
  (vec_select:
 (match_operand: 1 "register_operand")
 (match_operand:SI 2 "immediate_operand]

(you use V4SI for the immediate?  Ideally vdupi has another custom
mode for the vector index).

Note that this factored pattern is already available as vec_perm_const!
It is simply (vec_perm_const:V4SI   ).

Which means that on the GIMPLE level we should try to combine

el_4 = BIT_FIELD_REF ;
v_5 = { el_4, el_4, ... };

into

v_5 = VEC_PERM_EXPR ;

which it should already do with simplify_permutation.
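
As a rough illustration of the equivalence (a sketch in Python, not GCC code): a lane broadcast can be written as a permutation whose selector repeats one index. The helper names are hypothetical, vectors are modelled as plain lists, and the selector is assumed to index the concatenation of the two inputs.

```python
def vec_select(v, lane):
    # Pick one element of a vector.
    return v[lane]


def vec_duplicate(x, n):
    # Broadcast a scalar to an n-element vector.
    return [x] * n


def vec_perm(v0, v1, sel):
    # Each selector entry indexes the concatenation of v0 and v1,
    # modulo its length.
    both = v0 + v1
    return [both[i % len(both)] for i in sel]


def broadcast_lane(v, lane):
    # The "vdupi" idea expressed as a permutation with a constant selector.
    return vec_perm(v, v, [lane] * len(v))


v = [10, 20, 30, 40]
for lane in range(len(v)):
    assert broadcast_lane(v, lane) == vec_duplicate(vec_select(v, lane), len(v))
```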

But I'm not sure what you are after at the end ;)

Richard.


Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Vidya Praveen
On Wed, Sep 25, 2013 at 10:22:05AM +0100, Richard Biener wrote:
> On Tue, 24 Sep 2013, Vidya Praveen wrote:
> 
> > On Tue, Sep 10, 2013 at 09:25:32AM +0100, Richard Biener wrote:
> > > On Mon, 9 Sep 2013, Marc Glisse wrote:
> > > 
> > > > On Mon, 9 Sep 2013, Vidya Praveen wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > > This post details some thoughts on an enhancement to the vectorizer
> > > > > > that could take advantage of SIMD instructions that allow an indexed
> > > > > > element as an operand, thus reducing the need for duplication and
> > > > > > possibly improving reuse of previously loaded data.
> > > > > 
> > > > > Appreciate your opinion on this.
> > > > > 
> > > > > ---
> > > > > 
> > > > > A phrase like this:
> > > > > 
> > > > > for(i=0;i<4;i++)
> > > > > >   a[i] = b[i] <op> c[2];
> > > > > 
> > > > > is usually vectorized as:
> > > > > 
> > > > >  va:V4SI = a[0:3]
> > > > >  vb:V4SI = b[0:3]
> > > > >  t = c[2]
> > > > > >  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> > > > > >  ...
> > > > > >  va:V4SI = vb:V4SI <op> vc:V4SI
> > > > > 
> > > > > > But this could be simplified further if a target has instructions
> > > > > > that support an indexed element as a parameter. For example, an
> > > > > > instruction like this:
> > > > > 
> > > > >  mul v0.4s, v1.4s, v2.4s[2]
> > > > > 
> > > > > > can perform multiplication of each element of v1.4s with the third
> > > > > > element of v2.4s (specified as v2.4s[2]) and store the results in the
> > > > > > corresponding elements of v0.4s.
> > > > > 
> > > > > > For this to happen, the vectorizer needs to understand this idiom and
> > > > > > treat the operand c[2] specially (taking into consideration, through a
> > > > > > target hook or macro, whether the machine supports an indexed element
> > > > > > as an operand for <op>) and consider this a vectorizable statement
> > > > > > without having to duplicate the elements explicitly.
> > > > > 
> > > > > > There are a few ways this could be represented in gimple:
> > > > > 
> > > > >  ...
> > > > > >  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> > > > >  ...
> > > > > 
> > > > > > or by allowing the vectorizer to treat an indexed element as a valid
> > > > > > operand in a vectorizable statement:
> > > > 
> > > > Might as well allow any scalar then...
> > > 
> > > I agree.  The VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2)) form
> > > would necessarily be two extra separate statements and thus subject
> > > to CSE obfuscating it enough for RTL expansion to no longer notice it.
> > 
> > I also thought about having a specialized expression like
> > 
> > > VEC_INDEXED_<op>_EXPR < arg0, arg1, arg2, index> 
> > > 
> > > to mean:
> > > 
> > > arg0 = arg1 <op> arg2[index]
> > 
> > and handle it directly in the expander, like (for eg.) how VEC_LSHIFT_EXPR
> > is handled in expr.c. But I dropped this idea since we may need to introduce
> > many such nodes.
> > 
> > > 
> > > That said, allowing mixed scalar/vector ops isn't very nice and
> > > your scheme can be simplified by just using
> > > 
> > >   vc:V4SI = VEC_DUPLICATE_EXPR <...>
> > > >   va:V4SI = vb:V4SI <op> vc:V4SI
> > > 
> > > where the expander only has to see that vc:V4SI is defined by
> > > a duplicate.
> > 
> > > I did try out something like this quickly before I posted this RFC, though
> > > I called it VEC_DUP to mean an equivalent of vec_duplicate(vec_select())
> > 
> > for: 
> > 
> >   for(i=0;i<8;i++)
> > a[i] = b[2] * c[i];
> > 
> > I could generate:
> > 
> >   ...
> >   :
> >   _88 = prolog_loop_adjusted_niters.6_60 * 4;
> >   vectp_c.13_87 = c_10(D) + _88;
> >   vect_ldidx_.16_92 = MEM[(int *)b_8(D) + 8B]; 
> >   vect_idxed_.17_93 = (vect_ldidx_.16_92) <<< ??? >>> (0); 
> >   _96 = prolog_loop_adjusted_niters.6_60 * 4;
> >   vectp_a.19_95 = a_6(D) + _96;
> >   vect__12.14_115 = MEM[(int *)vectp_c.13_87];
> >   vect_patt_40.15_116 = vect__12.14_115 * vect_idxed_.17_93;   
> >   MEM[(int *)vectp_a.19_95] = vect_patt_40.15_116; 
> >   vectp_c.12_118 = vectp_c.13_87 + 16;
> >   vectp_a.18_119 = vectp_a.19_95 + 16;
> >   ivtmp_120 = 1;
> >   if (ivtmp_120 < bnd.8_62)
> > goto ;
> >   else
> > goto ;
> > 
> >   :
> >   # vectp_c.12_89 = PHI 
> >   # vectp_a.18_97 = PHI 
> >   # ivtmp_14 = PHI 
> >   vect__12.14_91 = MEM[(int *)vectp_c.12_89];  
> >   vect_patt_40.15_94 = vect__12.14_91 * vect_idxed_.17_93; 
> >   MEM[(int *)vectp_a.18_97] = vect_patt_40.15_94;
> >   ...
> > 
> > It's a crude implementation so VEC_DUP is printed as:
> > 
> >   (vect_ldidx_.16_92) <<< ??? >>> (0);
> > 
> > 
> > > > >  ...
> > > > > >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> > > > >  ...
> > > > >
> > > > > For the sake of explanation, the above two representations assumes 
> > > > > that
> > > > > c[0:3

Re: [RFC] Vectorization of indexed elements

2013-09-30 Thread Vidya Praveen
On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> [...]
> > > > I can't really insist on the single lane load.. something like:
> > > > 
> > > > vc:V4SI[0] = c
> > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > > 
> > > > Or is there any other way to do this?
> > > 
> > > Can you elaborate on "I can't really insist on the single lane load"?
> > > What's the single lane load in your example? 
> > 
> > Loading just one lane of the vector like this:
> > 
> > vc:V4SI[0] = c // from the above scalar example
> > 
> > or 
> > 
> > vc:V4SI[0] = c[2] 
> > 
> > is what I meant by single lane load. In this example:
> > 
> > t = c[2] 
> > ...
> > vb:v4si = b[0:3] 
> > vc:v4si = { t, t, t, t }
> > va:v4si = vb:v4si <op> vc:v4si 
> > 
> > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot
> > insist 't' to be vector and t = c[2] to be vect_t[0] = c[2] (which could be 
> > seen as vec_select:SI (vect_t 0) ). 
> > 
> > > I'd expect the instruction
> > > pattern as quoted to just work (and I hope we expand an uniform
> > > constructor { a, a, a, a } properly using vec_duplicate).
> > 
> > As much as I went through the code, this is only done using vect_init. It is
> > not expanded as vec_duplicate from, for example, store_constructor() of 
> > expr.c
> 
> Do you see any issues if we expand such constructor as vec_duplicate directly 
> instead of going through vect_init way? 

Sorry, that was a bad question.

But here's what I would like to propose as a first step. Please tell me if this
is acceptable or if it makes sense:

- Introduce standard pattern names 

"vmulim4" - vector multiply with second operand as indexed operand

Example:

(define_insn "vmuliv4si4"
  [(set (match_operand:V4SI 0 "register_operand")
        (mul:V4SI (match_operand:V4SI 1 "register_operand")
                  (vec_duplicate:V4SI
                    (vec_select:SI
                      (match_operand:V4SI 2 "register_operand")
                      (match_operand:V4SI 3 "immediate_operand")))))]
  ...
)

"vlmovmn3" - move where one of the operands is specific lane of a vector and 
 other is a scalar. 

Example:

(define_insn "vlmovv4sisi3"
  [set (vec_select:SI (match_operand:V4SI 0 "register_operand")
  (match_operand:SI 1 "immediate_operand"))
   (match_operand:SI 2 "memory_operand")]
  ...
)

- Identify the following idiom and expand through the above standard patterns:

  t = c[m] 
  vc[0:n] = { t, t, t, t}
  a[0:n] = b[0:n] * vc[0:n] 

as 

 (insn (set (vec_select:SI (reg:V4SI 0) 0) (mem:SI ... )))
 (insn (set (reg:V4SI 1)
            (mult:V4SI (reg:V4SI 2)
                       (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 0) 0)))))
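
To show what this expansion is meant to compute, here is a minimal Python sketch (an illustration only, with vectors modelled as lists). The helper names lane_load, vmuli and scalar_loop are hypothetical stand-ins for the proposed lane move, the proposed indexed multiply, and the original scalar loop.

```python
def lane_load(vreg, lane, value):
    # Lane-specific move: write one lane, leave the others untouched.
    out = list(vreg)
    out[lane] = value
    return out


def vmuli(vb, vc, lane):
    # Indexed multiply: every element of vb times one selected lane of vc,
    # i.e. vb * vec_duplicate(vec_select(vc, lane)).
    scalar = vc[lane]
    return [x * scalar for x in vb]


def scalar_loop(b, c, m):
    # Reference: a[i] = b[i] * c[m]
    return [b[i] * c[m] for i in range(len(b))]


b = [1, 2, 3, 4]
c = [5, 6, 7, 8]
m = 2

# t = c[m] is loaded straight into lane 0 of a vector register ...
vc = lane_load([0, 0, 0, 0], 0, c[m])
# ... and the multiply reads that lane directly, with no explicit broadcast.
assert vmuli(b, vc, 0) == scalar_loop(b, c, m)
```

The point of the sketch is that only one lane of the register needs to hold valid data; the other lanes are never read.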

If this path is acceptable, then I can extend this to support 

"vmaddim4" - multiply and add (with indexed element as multiplier)
"vmsubim4" - multiply and subtract (with indexed element as multiplier)

Please let me know your thoughts.

Cheers
VP




Re: [gomp4, openacc-1_0-branch] Re: OpenACC branch

2013-09-30 Thread Jakub Jelinek
On Mon, Sep 30, 2013 at 12:05:55AM +0200, Thomas Schwinge wrote:
> Is my understanding correct that the GCC policy regarding extensions such
> as support for OpenACC or OpenMP 4 is: first develop and polish this on a
> branch (such as openacc-1_0-branch or gomp-4_0-branch), and once
> *everything* of the respective standard has been implemented, the
> development branch is then merged into mainline (closing it at the same
> time), instead of committing individual sub-features (such as support for
> only »#pragma acc parallel« but not yet covering the whole respective
> standard) directly to trunk?  The issue with the latter, I assume, is
> that such half-finished implementations in trunk might delay/disturb GCC
> releases.

My actual plan with gomp-4_0-branch is to merge the branch to the trunk
in a week or two.  The missing parts of OpenMP 4.0 support right now are:
1) Fortran support - didn't have spare cycles for it yet, but I think
   initially we can just support C/C++ OpenMP 4.0 and Fortran OpenMP 3.1,
   and as time permits add the missing Fortran support
2) OMP_PLACES/affinity - library side only, plan to work on that this week
3) target support ICV handling - library side only, plan to work on that
   this week
4) elemental functions - this is currently parsed, but ignored, I'd prefer
   this being developed on the gomp-4_0-branch after the branch is merged
   with trunk, then committed
5) offloading - once 3) is supported, we should be hopefully OpenMP 4.0
   compliant with regards to target* constructs, just always do host
   fallback, further development of actual offloading should continue
   on gomp-4_0-branch

Jakub