Re: Proposal for adding splay_tree_find (to find elements without updating the nodes).

2015-03-15 Thread Trevor Saunders
On Mon, Mar 16, 2015 at 03:33:18AM +, Aditya K wrote:
> 
> 
> 
> > Date: Sun, 15 Mar 2015 02:32:23 -0400
> > From: tbsau...@tbsaunde.org
> > To: gcc@gcc.gnu.org
> > Subject: Re: Proposal for adding splay_tree_find (to find elements without 
> > updating the nodes).
> >
> > hi,
> >
> > I'm only commenting on algorithmic stuff at this point, you should make
> > sure this doesn't regress anything in make check. This stuff only
> > effects code using omp stuff so compiling random c++ is unlikely to test
> > this code at all.
> >
> > Also please follow the style in
> > https://gcc.gnu.org/codingconventions.html
> > and usually try to make new code similar to what's around it.
> >
> > @@ -384,7 +386,7 @@ new_omp_context (enum omp_region_type region_type)
> >
> > c = XCNEW (struct gimplify_omp_ctx);
> > c->outer_context = gimplify_omp_ctxp;
> > - c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0);
> > + //c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0);
> >
> > I don't think this is what you want, xcnew is a calloc wrapper and
> > doesn't call the ctor for gimplify_omp_ctx. For now placement new is
> > probably the simplest way to get what you want.
> >
> Thanks for pointing this out. I'll do it the way c->privatized_types has been 
> allocated.
> e.g., by making c->variables a pointer to std::map and   c->variables = new 
> gimplify_tree_t;

that works too, though I'd be a more descriptive name than
gimplify_tree_t, maybe omp_variables_map?

> 
> 
> > -static void
> > -delete_omp_context (struct gimplify_omp_ctx *c)
> > -{
> > - splay_tree_delete (c->variables);
> > - delete c->privatized_types;
> > - XDELETE (c);
> > -}
> >
> > hm, why?
> >
> My bad, I'll restore this.
> 
> > -gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
> > +gimplify_adjust_omp_clauses_1 (std::pair n, void *data)
> >
> > You can now change the type of data from void * to const
> > gimplify_adjust_omp_clauses_data *
> 
> Done!
> 
> 
> Thanks for the feedback, they were really helpful. I have updated the patch. 
> Please review this.
> Also, although I run `make check` while compiling gcc (with bootstrap 
> enabled), I'm not sure if 'omp' related tests were exercised.
> I'm still unfamiliar with several components of gcc. Any pointers on how to 
> ensure all tests were run, would be useful.


https://gcc.gnu.org/install/test.html should help, though unfortunately
you'll probably find the easiest way to check for regressions is to do
one run of straight trunk, then another with your patch.  Saddly a bunch
of people have own scripts to deal with administrivia, but there isn't a
standardized way that's simple.

Trev

> 
> 
> -Aditya
> 
> 
> 
> 
> >
> > thanks!
> >
> > Trev
> >
> > On Fri, Mar 13, 2015 at 07:32:03PM +, Aditya K wrote:
> >> You're right. I'll change this to:
> >>
> >> /* A stable comparison functor to sort trees.  */
> >> struct tree_compare_decl_uid {
> >>   bool  operator ()(const tree &xa, const tree &xb) const
> >>   {
> >> return DECL_UID (xa) < DECL_UID (xb);
> >>   }
> >> };
> >>
> >> New patch attached.
> >>
> >>
> >> Thanks,
> >> -Aditya
> >>
> >>
> >> 
> >>> Date: Fri, 13 Mar 2015 19:02:11 +
> >>> Subject: Re: Proposal for adding splay_tree_find (to find elements 
> >>> without updating the nodes).
> >>> From: jwakely@gmail.com
> >>> To: hiradi...@msn.com
> >>> CC: richard.guent...@gmail.com; stevenb@gmail.com; gcc@gcc.gnu.org
> >>>
> >>> Are you sure your compare_variables functor is correct?
> >>>
> >>> Subtracting the two values seems very strange for a strict weak ordering.
> >>>
> >>> (Also "compare_variables" is a pretty poor name!)
> >>
> >
> >
> 
> 
> 




RE: Proposal for adding splay_tree_find (to find elements without updating the nodes).

2015-03-15 Thread Aditya K



> Date: Sun, 15 Mar 2015 02:32:23 -0400
> From: tbsau...@tbsaunde.org
> To: gcc@gcc.gnu.org
> Subject: Re: Proposal for adding splay_tree_find (to find elements without 
> updating the nodes).
>
> hi,
>
> I'm only commenting on algorithmic stuff at this point, you should make
> sure this doesn't regress anything in make check. This stuff only
> effects code using omp stuff so compiling random c++ is unlikely to test
> this code at all.
>
> Also please follow the style in
> https://gcc.gnu.org/codingconventions.html
> and usually try to make new code similar to what's around it.
>
> @@ -384,7 +386,7 @@ new_omp_context (enum omp_region_type region_type)
>
> c = XCNEW (struct gimplify_omp_ctx);
> c->outer_context = gimplify_omp_ctxp;
> - c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0);
> + //c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0);
>
> I don't think this is what you want, xcnew is a calloc wrapper and
> doesn't call the ctor for gimplify_omp_ctx. For now placement new is
> probably the simplest way to get what you want.
>
Thanks for pointing this out. I'll do it the way c->privatized_types has been 
allocated.
e.g., by making c->variables a pointer to std::map and   c->variables = new 
gimplify_tree_t;


> -static void
> -delete_omp_context (struct gimplify_omp_ctx *c)
> -{
> - splay_tree_delete (c->variables);
> - delete c->privatized_types;
> - XDELETE (c);
> -}
>
> hm, why?
>
My bad, I'll restore this.

> -gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
> +gimplify_adjust_omp_clauses_1 (std::pair n, void *data)
>
> You can now change the type of data from void * to const
> gimplify_adjust_omp_clauses_data *

Done!


Thanks for the feedback, they were really helpful. I have updated the patch. 
Please review this.
Also, although I run `make check` while compiling gcc (with bootstrap enabled), 
I'm not sure if 'omp' related tests were exercised.
I'm still unfamiliar with several components of gcc. Any pointers on how to 
ensure all tests were run, would be useful.


-Aditya




>
> thanks!
>
> Trev
>
> On Fri, Mar 13, 2015 at 07:32:03PM +, Aditya K wrote:
>> You're right. I'll change this to:
>>
>> /* A stable comparison functor to sort trees.  */
>> struct tree_compare_decl_uid {
>>   bool  operator ()(const tree &xa, const tree &xb) const
>>   {
>> return DECL_UID (xa) < DECL_UID (xb);
>>   }
>> };
>>
>> New patch attached.
>>
>>
>> Thanks,
>> -Aditya
>>
>>
>> 
>>> Date: Fri, 13 Mar 2015 19:02:11 +
>>> Subject: Re: Proposal for adding splay_tree_find (to find elements without 
>>> updating the nodes).
>>> From: jwakely@gmail.com
>>> To: hiradi...@msn.com
>>> CC: richard.guent...@gmail.com; stevenb@gmail.com; gcc@gcc.gnu.org
>>>
>>> Are you sure your compare_variables functor is correct?
>>>
>>> Subtracting the two values seems very strange for a strict weak ordering.
>>>
>>> (Also "compare_variables" is a pretty poor name!)
>>
>
>


  

splay.patch
Description: Binary data


Re: How to implement '@' GDB-like operator for libcc1

2015-03-15 Thread Manuel López-Ibáñez
>
> But GDB features a useful custom expression operator '@':
> https://sourceware.org/gdb/onlinedocs/gdb/Arrays.html
>
> I have problems implementing '@' into GCC, could you suggest at which place
> should I call build_array_type_nelts()?  Or is it the right way at all?
>
> Testing it on a sample code - it should return 2:
> int a[]={1,2,3};int main(void){ return (*a@3)[1]; }

Sorry if I'm missing something, but this example does not make sense
to me. You can directly use a[1], no need for the @-operator here. In
my experience (and again, sorry if I'm missing other interesting
usercases), the utility of @ is to be able to do things like:

print a@3
print (*&(something complicated expression))@(some integer expression)

Thus, the question is what info GDB needs from GCC to be able to print
the contents of the array. And it probably just needs the type of
whatever is to the left of @ and the value of whatever is to the right
of @, no?

Also note that given this code and breaking in main()

int a[]={1,2,3};int main(void){ return (a[1]); }

then:

(gdb) p a
$1 = {1, 2, 3}
(gdb) p a@3
$2 = {{1, 2, 3}, {0, 0, 0}, {0, 0, 0}}

This is because:

(gdb) ptype a
type = int [3]
(gdb)  ptype a@3
type = int [3][3]

Thus, what happens currently when you do?
(gdb) compile int a[]={1,2,3}
(gdb) compile print a

Cheers,

Manuel.


Proposal: implement API for registering custom built-in functions from dynamic plugins

2015-03-15 Thread Mikhail Maltsev
Hi, all.

Currently GCC allows dynamic plugins to register custom attributes and
pragmas. I would like to propose to add API for registering custom
built-in functions.

AFAIK, current plugin API can be used to emulate this feature, for
example by creating a function declaration in PLUGIN_START_UNIT and then
replacing calls to such function in a custom lowering pass. But this
solution seems inelegant: it would be better, if user-defined built-in
functions could use the same "machinery" as normal ones, i.e. use common
classification and flags (RTL-level vs GIMPLE-level; external library
equivalents vs compiler-generated entities), and have unified API for
creating declarations, GIMPLE (or even RTL) expansion, inlining
(callback for is_inexpensive_builtin), etc.

Possible applications of this plugin API include:
* Generating optimized pattern matching functions (e.g. precomputing
tables for Boyer-Moore matcher).
* Generating functions for parsing (compile-time regular expressions).
* Marshalling functions with compile-time type checking (something like
binary printf/scanf with custom format specifiers).
* String formatting and conversion. By the way, there is a guy, who has
written a perl wrapper for GCC, which preprocesses calls to snprintf to
make it work faster: https://github.com/h2o/qrintf... though it's
probably worth enhancing snprintf in GCC, rather than reimplementing it
in plugin - but that's a different story.
* (to some extent): lazy evaluation.
* Custom instrumentation of source code (user or framework-specific
fortification or performance profiling).

Any thoughts on this? Is such API useful for GCC?

-- 
Regards,
Mikhail Maltsev


gcc-5-20150315 is now available

2015-03-15 Thread gccadmin
Snapshot gcc-5-20150315 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150315/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 221440

You'll find:

 gcc-5-20150315.tar.bz2   Complete GCC

  MD5=e97e13d88d72c62df053058745671ee5
  SHA1=1727ac6e7c1d53f0f3e4ad91d0bdf29b1bd504af

Diffs from 5-20150308 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


How to implement '@' GDB-like operator for libcc1

2015-03-15 Thread Jan Kratochvil
Hi,

based on the GCC libcc1 plugin GDB will implement GDB command 'compile print'
equivalent to default 'print' but evaluating the expression by GCC:
https://github.com/tromey/gdb/tree/pmuldoon/compile/print

It is currently done by compiling a .c->.o file and checking its DWARF for the
type and size of the resulting object to print.

But GDB features a useful custom expression operator '@':
https://sourceware.org/gdb/onlinedocs/gdb/Arrays.html

I have problems implementing '@' into GCC, could you suggest at which place
should I call build_array_type_nelts()?  Or is it the right way at all?

Testing it on a sample code - it should return 2:
int a[]={1,2,3};int main(void){ return (*a@3)[1]; }

Proper overloading of '@' to keep it working for ObjC and/or enabling '@' only
for libcc1 I find currently off-topic.

With both c_fully_fold_internal() and gimplify_modify_expr_rhs() hooks #if 0-ed
(as they are in the attached patch) GCC crashes on infinite loop:
<-gimplify_expr<-gimplify_modify_expr<-gimplify_expr<-gimplify_stmt<-
<-gimplify_and_add<-internal_get_tmp_var<-get_formal_tmp_var<-



Thanks,
Jan
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 8c23e09..69fa054 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1289,6 +1289,25 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
*maybe_const_operands,
   ret = fold (ret);
   goto out;
 
+#if 0 // it would crash at array_ref_low_bound()'s first line
+case ATSIGN_EXPR:
+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
+  orig_op1 = op1 = TREE_OPERAND (expr, 1);
+#if 0 // not needed
+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
+  maybe_const_itself);
+  STRIP_TYPE_NOPS (op0);
+  op1 = c_fully_fold_internal (op1, in_init, maybe_const_operands,
+  maybe_const_itself);
+  STRIP_TYPE_NOPS (op1);
+#endif
+  ret = build_array_type_nelts (TREE_TYPE (op0), tree_to_uhwi (op1));
+#if 0 // not needed
+  ret = fold (ret);
+#endif
+  goto out;
+#endif
+
 case COMPOUND_EXPR:
 case MODIFY_EXPR:
 case PREDECREMENT_EXPR:
@@ -4078,6 +4097,8 @@ binary_op_error (location_t location, enum tree_code code,
   opname = "||"; break;
 case BIT_XOR_EXPR:
   opname = "^"; break;
+case ATSIGN_EXPR:
+  opname = "@"; break;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index bb55be8..ad8b82f 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -468,6 +468,8 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned 
char *cpp_flags,
   break;
 
 case CPP_ATSIGN:
+  *value = NULL_TREE;
+  break;
   /* An @ may give the next token special significance in Objective-C.  */
   if (c_dialect_objc ())
{
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ceb9e1a..5d5a0a8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1173,6 +1173,7 @@ enum c_parser_prec {
   PREC_EQ,
   PREC_REL,
   PREC_SHIFT,
+  PREC_ATSIGN,
   PREC_ADD,
   PREC_MULT,
   NUM_PRECS
@@ -6283,6 +6284,10 @@ c_parser_binary_expression (c_parser *parser, struct 
c_expr *after,
  oprec = PREC_ADD;
  ocode = MINUS_EXPR;
  break;
+   case CPP_ATSIGN:
+ oprec = PREC_ATSIGN;
+ ocode = ATSIGN_EXPR;
+ break;
case CPP_LSHIFT:
  oprec = PREC_SHIFT;
  ocode = LSHIFT_EXPR;
@@ -7082,6 +7087,10 @@ c_parser_postfix_expression (c_parser *parser)
   expr.original_type = NULL;
   switch (c_parser_peek_token (parser)->type)
 {
+case CPP_ATSIGN:
+  error_at (loc, "gdbjit hit");
+  expr.value = error_mark_node;
+  break;
 case CPP_NUMBER:
   expr.value = c_parser_peek_token (parser)->value;
   loc = c_parser_peek_token (parser)->location;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index a3a9c77..0c2f1f9 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -10949,6 +10949,14 @@ build_binary_op (location_t location, enum tree_code 
code,
maybe_warn_bool_compare (location, code, orig_op0, orig_op1);
   break;
 
+case ATSIGN_EXPR:
+  if (TREE_CODE (orig_op1) == INTEGER_CST)
+   {
+ result_type = build_array_type_nelts (type0, tree_to_uhwi (orig_op1));
+ converted = 1;
+   }
+  break;
+
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d822913..b4027dc 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -4418,6 +4418,12 @@ gimplify_modify_expr_rhs (tree *expr_p, tree *from_p, 
tree *to_p,
return GS_OK;
  }
  }
+ break;
+   
+#if 0 // it would get hit
+   case ATSIGN_EXPR:
+ gcc_unreachable ();
+#endif
 
default:
  break;
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index d7c049f..cf00457 100644
--- a

Re: Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.

2015-03-15 Thread Mikhail Maltsev
15.03.2015 17:44, Ajit Kumar Agarwal writes:
> 
> Hello All:
> 
> The below example is taken from the article by Albert Cohen et.al.
> 
> "Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector 
> parallelism"

It is not clear from this article, whether such optimization can be
implemented in practice (i.e. automatically and within reasonable time).

Authors implemented some proof-of-concept, but from their description it
looks like they had to tweak their algorithm in each particular case
(and they hope, that later this could be automated using FDO). Also
nothing definite is said about compilation time.

> I would like to propose the above heuristics for unroll and jam and renaming 
> which enables the loop fusion and the IF-merging 
> to achieve the optimize code.

AFAIK, most of loop transformations in GCC are applicable to rather
simple loops, which, at least have an induction variable. So, loops like

while (q)
{
   ...
}

for some arbitrary condition q, e.g.

while (*a++ = *b++)
;

are unlikely to be optimized well. The proposed transformation

> For ( I = 0 ; I < 10; i++)
>   a1 = phi(a1,a2);
>   If ( p1 && p2)
>  a1 = ...;
>  a2 = ...;
>  Else
>If (p1)
>   a1= ;
>   Else
>  If (p2)
>a2 = ...;
> 
>While (q1 && q2)
>{
>... = a1 + ...;
>... = a2 + .;
>}
>   While (q1)
>  = a1 + ...;
>   While ( q2)
> . = a2 + ..;

involves some sort of unroll-and-jam, nontrival code motion, loop
fusion. And it obviously requires analysis of loop-carried data
dependence, that can deal with nested loops having complex conditions
inside them, which GCC does not yet provide.

GCC includes Graphite framework - it allows to perform rather complex
transformations of loops (perhaps similar to what you described), but it
only works for loops which have canonical induction variables and
involve only affine memory access (i.e. addresses must be represented as
linear combinations of induction variables and constants) - no if's, no
while's with non-trivial exit conditions.

So, I think this idea is, unfortunately, not very realistic in terms of
required efforts.

P.S. I'm new in GCC community, but I hope that in general terms my
description is close to reality. Please correct me, if something is wrong.

-- 
Regards,
Mikhail Maltsev


RE: Short Circuit compiler transformations!!

2015-03-15 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Sunday, March 15, 2015 9:30 PM
To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: Short Circuit compiler transformations!!

On March 15, 2015 11:14:59 AM GMT+01:00, Ajit Kumar Agarwal 
 wrote:
>
>
>-Original Message-
>From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of 
>Ajit Kumar Agarwal
>Sent: Sunday, March 15, 2015 3:35 PM
>To: Richard Biener; Jeff Law; gcc@gcc.gnu.org
>Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju 
>Mekala
>Subject: RE: Short Circuit compiler transformations!!
>
>
>
>-Original Message-
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Sunday, March 15, 2015 3:05 PM
>To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org
>Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju 
>Mekala
>Subject: Re: Short Circuit compiler transformations!!
>
>On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal 
> wrote:
>>Hello All:
>>
>>Short circuit compiler transformation for conditional branches. The 
>>conditional branches based on the conditional Expressions one of the 
>>path is always executed thus short circuiting the path. Certains
>values
>>of the conditional Expressions makes the conditional expressions
>always
>>true or false.
>>This part of the conditions in extracted out In the IF-THEN 
>>cond_express with the check of the value and in the else case the 
>>original expressions is checked.
>>
>>This makes for a given values of the variables of the conditional 
>>expressions makes the conditional expression as True or false always 
>>and the one path is always executed.
>>
>>For the example given below.
>>
>>For( x1 = lb1; x1 < = ub1 ; x1++)
>>{
>>   If ( C1 && C2)
>> S_then
>> }
>>
>>
>>Fig (1).
>>
>>For the input code given in Fig (1) the condition C1 && C2 is always 
>>false if C1 or C2 is zero(false) thus short circuiting The path and 
>>only s_else will be executed if C1 or C2 is false.
>>
>>Thus the input code is transformed as follows.
>>
>>For( x1 = lb1; x1 < = ub1 ; x1++)
>>{
>>   If(!C1)
>>Continue
>>  else
>>S_then
>> }
>
>>>This looks wrong as you miss to check C2.
>
>>>Riichard: Sorry for the typo error. I could see the short circuit
>transformation is implemented in gcc. I could see only Short circuiting 
>with respect to &&/OR. I am thinking in terms of >>complex expressions 
>where any of the variable condition In the expressions Set to to true 
>or false the whole expressions and such conditions can be extracted out 
>into different versions >>of IF with different parameter check( Another 
>way of short circuiting).
>
>>>I would like to know the scope of this optimization in gcc. If not
>implemented with respect to complex expressions with respect To 
>different iteration spaces. I would like to propose the >>same.

>>Just as a general note/question on all you suggestions.  Are you willing to 
>>implement all your proposals or do you have resources to spend to that effort?

>>It's not that we are not aware of loop optimizations GCC does not implement.
>>It's a question of priorities and resources to implement something.

Yes you are right. We are proposing different optimizations based on our 
Analysis and past experience and we would  be interested in and like
to implement based on the suggestions and inputs from the community. 

We may not implement all, but based on the suggestions and inputs from the 
community we would like to take on the priority basis.

Thanks & Regards
Ajit

Richard.

>Richard: Other aspect of the transformation for short circuiting the 
>IF-THEN-ELSE with the loops where the loop is irreducible.
>In this case implementing the short circuit transformations and the CFG 
>transformations to convert  from irreducibility to reducible Loops 
>along with the transformations of short circuiting compiler 
>transformations.
>
>Thanks & Regards
>Ajit
>
>>>Thoughts Please?
>
>Thanks & Regards
>Ajit
>
>Richard.
>
>>This is much simpler code and there could be complex expressions
>inside
>>the FOR Loop that can be extracted out with Different versions of the 
>>conditional IF-THEN-ELSE inside the loops based on short circuiting 
>>executing one of the path always Executed can be extracted in IF-THEN 
>>and other cases of condition will be inside the else conditions.
>>
>>Again this has to be optimized based on the profile Data of hot 
>>conditional branches and the above optimizations is performed Based on
>
>>the hotness of the conditional branches.
>>
>>The short Circuiting optimization also makes the irreducible loops
>with
>>conditional branches as reducible and there is no need for Conversion 
>>of irreducible loops to reducible loop with any of the existing 
>>approach like node splitting for the short circuiting candidate.
>>
>>This optimizations is implemented in LLVM and I

Re: Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.

2015-03-15 Thread Richard Biener
On March 15, 2015 3:44:39 PM GMT+01:00, Ajit Kumar Agarwal 
 wrote:
>
>Hello All:
>
>Below examples are the transformation for the given loop in Fig(1).
>Fig(2) unroll and jam and the Fig(3) does the
>Code motion to bring two IF adjacent to each other and two while loops
>adjacent to each other.
>
>The Fig(4 ) does the IF-merging and the Loop fusion on the transformed
>Loop in Fig (3) which bring two IF adjacent
>and two while Loops adjacent to each other.
>
>This shows how the Loop with conditional branches can be unrolled which
>enables the IF-merging and Loop fusion
>after doing code motion to unrolled loops to bring IF and Loops
>adjacent to each other.
>
>This is the powerful optimization which gets enabled with renaming ,
>unrolling.
>
>Such heuristics needs to be added for Loop unrolling and then later the
>code motions can enable the IF-merging
>and the Loop fusion for the unrolled loop.
>
>The below example is taken from the article by Albert Cohen et.al.
>
>"Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector
>parallelism"
>
>For ( I = 0; I < 10; i++)
>{
>   If(p)
> a = ...;
>  While(q)
> .. = a + ...;
>}
>
>Fig (1)
>
>For ( I = 0; I < 10; i+=2)
>{
>a1 = phi(a,a2);
>If ( p1)
>   a1 = ;
>While (q1)
>... = a1 + ...;
>If (p2)
>  a2  = ...;
>   a2 = phi(a2, a1);
>   While (q2)
>  = a2 + ;
>
>Fig (2)
>
>For ( I = 0; I < 10; i+=2)
>{
>a1 = phi(a,a2);
>If ( p1)
>   a1 = ;
>   If (p2)
>  a2  = ...;
>   a2 = phi(a2, a1);
>  While (q1)
>... = a1 + ...;
>   While (q2)
>  = a2 + ;
>
>Fig (3)
>
>For ( I = 0 ; I < 10; i++)
>   a1 = phi(a1,a2);
>   If ( p1 && p2)
>  a1 = ...;
>  a2 = ...;
>  Else
>If (p1)
>   a1= ;
>   Else
>  If (p2)
>a2 = ...;
>
>While (q1 && q2)
>{
>... = a1 + ...;
>... = a2 + .;
>}
>   While (q1)
>  = a1 + ...;
>   While ( q2)
> . = a2 + ..;
>
>Fig (4).
>
>I would like to propose the above heuristics for unroll and jam and
>renaming which enables the loop fusion and the IF-merging 
>to achieve the optimize code.
>
>This shows how the coarse grain unrolling heuristics enable fine grain
>Loop transformation.
>
>Thoughts Please?

GCC implements neither outer loop unrolling nor loop fusion.

Richard.

>Thanks & Regards
>Ajit




RE: Short Circuit compiler transformations!!

2015-03-15 Thread Richard Biener
On March 15, 2015 11:14:59 AM GMT+01:00, Ajit Kumar Agarwal 
 wrote:
>
>
>-Original Message-
>From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
>Ajit Kumar Agarwal
>Sent: Sunday, March 15, 2015 3:35 PM
>To: Richard Biener; Jeff Law; gcc@gcc.gnu.org
>Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju
>Mekala
>Subject: RE: Short Circuit compiler transformations!!
>
>
>
>-Original Message-
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Sunday, March 15, 2015 3:05 PM
>To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org
>Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju
>Mekala
>Subject: Re: Short Circuit compiler transformations!!
>
>On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal
> wrote:
>>Hello All:
>>
>>Short circuit compiler transformation for conditional branches. The 
>>conditional branches based on the conditional Expressions one of the 
>>path is always executed thus short circuiting the path. Certains
>values 
>>of the conditional Expressions makes the conditional expressions
>always 
>>true or false.
>>This part of the conditions in extracted out In the IF-THEN 
>>cond_express with the check of the value and in the else case the 
>>original expressions is checked.
>>
>>This makes for a given values of the variables of the conditional 
>>expressions makes the conditional expression as True or false always 
>>and the one path is always executed.
>>
>>For the example given below.
>>
>>For( x1 = lb1; x1 < = ub1 ; x1++)
>>{
>>   If ( C1 && C2)
>> S_then
>> }
>>
>>
>>Fig (1).
>>
>>For the input code given in Fig (1) the condition C1 && C2 is always 
>>false if C1 or C2 is zero(false) thus short circuiting The path and 
>>only s_else will be executed if C1 or C2 is false.
>>
>>Thus the input code is transformed as follows.
>>
>>For( x1 = lb1; x1 < = ub1 ; x1++)
>>{
>>   If(!C1)
>>Continue
>>  else
>>S_then
>> }
>
>>>This looks wrong as you miss to check C2.
>
>>>Riichard: Sorry for the typo error. I could see the short circuit
>transformation is implemented in gcc. I could see only Short circuiting
>with respect to &&/OR. I am thinking in terms of >>complex expressions
>where any of the variable condition In the expressions Set to to true
>or false the whole expressions and such conditions can be extracted out
>into different versions >>of IF with different parameter check( Another
>way of short circuiting).
>
>>>I would like to know the scope of this optimization in gcc. If not
>implemented with respect to complex expressions with respect To
>different iteration spaces. I would like to propose the >>same.

Just as a general note/question on all you suggestions.  Are you willing to 
implement all your proposals or do you have resources to spend to that effort?

It's not that we are not aware of loop optimizations GCC does not implement.
It's a question of priorities and resources to implement something.

Richard.

>Richard: Other aspect of the transformation for short circuiting the
>IF-THEN-ELSE with the loops where the loop is irreducible. 
>In this case implementing the short circuit transformations and the CFG
>transformations to convert  from irreducibility to reducible
>Loops along with the transformations of short circuiting compiler
>transformations. 
>
>Thanks & Regards
>Ajit
>
>>>Thoughts Please?
>
>Thanks & Regards
>Ajit
>
>Richard.
>
>>This is much simpler code and there could be complex expressions
>inside 
>>the FOR Loop that can be extracted out with Different versions of the 
>>conditional IF-THEN-ELSE inside the loops based on short circuiting 
>>executing one of the path always Executed can be extracted in IF-THEN 
>>and other cases of condition will be inside the else conditions.
>>
>>Again this has to be optimized based on the profile Data of hot 
>>conditional branches and the above optimizations is performed Based on
>
>>the hotness of the conditional branches.
>>
>>The short Circuiting optimization also makes the irreducible loops
>with 
>>conditional branches as reducible and there is no need for Conversion 
>>of irreducible loops to reducible loop with any of the existing 
>>approach like node splitting for the short circuiting candidate.
>>
>>This optimizations is implemented in LLVM and I would like to know if 
>>this is implemented in GCC. If not implemented I would like To propose
>
>>this short circuit compiler transformation.
>>
>>Thoughts Please ?
>>
>>Thanks & Regards
>>Ajit




Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.

2015-03-15 Thread Ajit Kumar Agarwal

Hello All:

Below examples are the transformation for the given loop in Fig(1). Fig(2) 
unroll and jam and the Fig(3) does the
Code motion to bring two IF adjacent to each other and two while loops adjacent 
to each other.

The Fig(4 ) does the IF-merging and the Loop fusion on the transformed Loop in 
Fig (3) which bring two IF adjacent
and two while Loops adjacent to each other.

This shows how the Loop with conditional branches can be unrolled which enables 
the IF-merging and Loop fusion
after doing code motion to unrolled loops to bring IF and Loops adjacent to 
each other.

This is the powerful optimization which gets enabled with renaming , unrolling.

Such heuristics needs to be added for Loop unrolling and then later the code 
motions can enable the IF-merging
and the Loop fusion for the unrolled loop.

The below example is taken from the article by Albert Cohen et.al.

"Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector 
parallelism"

For ( I = 0; I < 10; i++)
{
   If(p)
 a = ...;
  While(q)
 .. = a + ...;
}

Fig (1)

For ( I = 0; I < 10; i+=2)
{
a1 = phi(a,a2);
If ( p1)
   a1 = ;
While (q1)
... = a1 + ...;
If (p2)
  a2  = ...;
   a2 = phi(a2, a1);
   While (q2)
  = a2 + ;

Fig (2)

For ( I = 0; I < 10; i+=2)
{
a1 = phi(a,a2);
If ( p1)
   a1 = ;
   If (p2)
  a2  = ...;
   a2 = phi(a2, a1);
  While (q1)
... = a1 + ...;
   While (q2)
  = a2 + ;

Fig (3)

For ( I = 0 ; I < 10; i++)
   a1 = phi(a1,a2);
   If ( p1 && p2)
  a1 = ...;
  a2 = ...;
  Else
If (p1)
   a1= ;
   Else
  If (p2)
a2 = ...;

While (q1 && q2)
{
... = a1 + ...;
... = a2 + .;
}
   While (q1)
  = a1 + ...;
   While ( q2)
 . = a2 + ..;

Fig (4).

I would like to propose the above heuristics for unroll and jam and renaming 
which enables the loop fusion and the IF-merging 
to achieve the optimize code.

This shows how the coarse grain unrolling heuristics enable fine grain Loop 
transformation.

Thoughts Please?

Thanks & Regards
Ajit


RE: Function outlining and partial Inlining

2015-03-15 Thread Ajit Kumar Agarwal


-Original Message-
From: Jan Hubicka [mailto:hubi...@ucw.cz] 
Sent: Thursday, February 12, 2015 10:34 PM
To: Ajit Kumar Agarwal
Cc: hubi...@ucw.cz; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Function outlining and partial Inlining

> Hello All:
> 
> The large functions are the important part of high performance 
> application. They contribute to performance bottleneck with many 
> respect. Some of the large hot functions are frequently executed but many 
> regions inside the functions are cold regions. The large Function blocks the 
> function inlining  to happen before of the code size constraints.
> 
> Such cold regions inside the hot large functions can be extracted out 
> and form the function outlining. Thus breaking the large functions Into 
> smaller function segments which causes the functions to be inlined at the 
> caller site or helps in partial inlining.
> 
> LLVM Compiler has the functionality and the optimizations for function 
> outlining based on regions like basic blocks, superblocks and 
> Hyperblocks which gets extracted out into smaller function segments and thus 
> enabling the partial inlining and function inlining to happen At the caller 
> site.
> 
> This optimization is the good case of profile guided optimizations and based 
> on the profile feedback data by the Compiler.
> Without profile information the above function outlining optimizations will 
> not be useful.
> 
> We are doing lot of optimization regarding polymorphism and also the 
> indirect icall promotion based on the profile feedback on the Callgraph 
> profile.
> 
> Are we doing the function outlining optimization in GCC with respect 
> to function inline and partial inline based on profile feedback Data. 
> If not this optimization can be implemented. If already implemented in GCC  
> Can I  know any pointer for such code in GCC and the Scope of this function 
> outlining optimization.

>>The outlining pass is called ipa-split.  The heuristic used is however quite 
>>simplistic and it looks for very specific case where you have small header of 
>>a function containing conditional and >>splits after that.  It does use 
>>profile.

>>Any work on improving the heuristics or providing interesting testcases to 
>>consider would be welcome.

>>I think LLVM pass is doing pretty much the same analysis minus the profile 
>>feedback considerations.  After splitting, LLVm will inline the header into 
>>all callers while GCC leaves this on the >>decision of inliner heuristics 
>>that may just merge the function back into one block.

>>The actual outlining logic is contained in tree-inline.c and also used by 
>>OpenMP.

Honza: LLVM has the logic of extraction of Loops using -loop-extract flags  
where the Loop code region is extracted into a function. The LLVM  has 
the heuristics that the header of the Loop is the entry block and there is a 
single entry to the Loop which does by Loop Simplification
pass in LLVM. Also the heuristics in LLVM has the heuristics that there should 
not be a branch from the entry block to the header of the Loop.

You have mentioned the GCC has the heuristics of having the conditional in the 
header after the splits point. Does GCC has the heuristics for
Conditional to split in the header or the heuristics can be augmented with the 
Loops as done in the LLVM. 

Thanks & Regards
Ajit

Honza
> 
> If not implemented , Can I propose to have the optimization like function 
> outlining in GCC.
> 
> Thoughts Please?
> 
> Thanks & Regards
> Ajit


RE: Short Circuit compiler transformations!!

2015-03-15 Thread Ajit Kumar Agarwal


-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Ajit 
Kumar Agarwal
Sent: Sunday, March 15, 2015 3:35 PM
To: Richard Biener; Jeff Law; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: Short Circuit compiler transformations!!



-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Sunday, March 15, 2015 3:05 PM
To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Short Circuit compiler transformations!!

On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal 
 wrote:
>Hello All:
>
>Short circuit compiler transformation for conditional branches. The 
>conditional branches based on the conditional Expressions one of the 
>path is always executed thus short circuiting the path. Certains values 
>of the conditional Expressions makes the conditional expressions always 
>true or false.
>This part of the conditions in extracted out In the IF-THEN 
>cond_express with the check of the value and in the else case the 
>original expressions is checked.
>
>This makes for a given values of the variables of the conditional 
>expressions makes the conditional expression as True or false always 
>and the one path is always executed.
>
>For the example given below.
>
>For( x1 = lb1; x1 < = ub1 ; x1++)
>{
>   If ( C1 && C2)
> S_then
> }
>
>
>Fig (1).
>
>For the input code given in Fig (1) the condition C1 && C2 is always 
>false if C1 or C2 is zero(false) thus short circuiting The path and 
>only s_else will be executed if C1 or C2 is false.
>
>Thus the input code is transformed as follows.
>
>For( x1 = lb1; x1 < = ub1 ; x1++)
>{
>   If(!C1)
>Continue
>  else
>S_then
> }

>>This looks wrong as you miss to check C2.

>>Riichard: Sorry for the typo error. I could see the short circuit 
>>transformation is implemented in gcc. I could see only Short circuiting with 
>>respect to &&/OR. I am thinking in terms of >>complex expressions where any 
>>of the variable condition In the expressions Set to to true or false the 
>>whole expressions and such conditions can be extracted out into different 
>>versions >>of IF with different parameter check( Another way of short 
>>circuiting).

>>I would like to know the scope of this optimization in gcc. If not 
>>implemented with respect to complex expressions with respect To different 
>>iteration spaces. I would like to propose the >>same.

Richard: Other aspect of the transformation for short circuiting the 
IF-THEN-ELSE with the loops where the loop is irreducible. 
In this case implementing the short circuit transformations and the CFG 
transformations to convert  from irreducibility to reducible
Loops along with the transformations of short circuiting compiler 
transformations. 

Thanks & Regards
Ajit

>>Thoughts Please?

Thanks & Regards
Ajit

Richard.

>This is much simpler code and there could be complex expressions inside 
>the FOR Loop that can be extracted out with Different versions of the 
>conditional IF-THEN-ELSE inside the loops based on short circuiting 
>executing one of the path always Executed can be extracted in IF-THEN 
>and other cases of condition will be inside the else conditions.
>
>Again this has to be optimized based on the profile Data of hot 
>conditional branches and the above optimizations is performed Based on 
>the hotness of the conditional branches.
>
>The short Circuiting optimization also makes the irreducible loops with 
>conditional branches as reducible and there is no need for Conversion 
>of irreducible loops to reducible loop with any of the existing 
>approach like node splitting for the short circuiting candidate.
>
>This optimizations is implemented in LLVM and I would like to know if 
>this is implemented in GCC. If not implemented I would like To propose 
>this short circuit compiler transformation.
>
>Thoughts Please ?
>
>Thanks & Regards
>Ajit




RE: Short Circuit compiler transformations!!

2015-03-15 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Sunday, March 15, 2015 3:05 PM
To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Short Circuit compiler transformations!!

On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal 
 wrote:
>Hello All:
>
>Short circuit compiler transformation for conditional branches. The 
>conditional branches based on the conditional Expressions one of the 
>path is always executed thus short circuiting the path. Certains values 
>of the conditional Expressions makes the conditional expressions always 
>true or false.
>This part of the conditions in extracted out In the IF-THEN 
>cond_express with the check of the value and in the else case the 
>original expressions is checked.
>
>This makes for a given values of the variables of the conditional 
>expressions makes the conditional expression as True or false always 
>and the one path is always executed.
>
>For the example given below.
>
>For( x1 = lb1; x1 < = ub1 ; x1++)
>{
>   If ( C1 && C2)
> S_then
> }
>
>
>Fig (1).
>
>For the input code given in Fig (1) the condition C1 && C2 is always 
>false if C1 or C2 is zero(false) thus short circuiting The path and 
>only s_else will be executed if C1 or C2 is false.
>
>Thus the input code is transformed as follows.
>
>For( x1 = lb1; x1 < = ub1 ; x1++)
>{
>   If(!C1)
>Continue
>  else
>S_then
> }

>>This looks wrong as you miss to check C2.

Riichard: Sorry for the typo error. I could see the short circuit 
transformation is implemented in gcc. I could see only
Short circuiting with respect to &&/OR. I am thinking in terms of complex 
expressions where any of the variable condition
In the expressions Set to to true or false the whole expressions and such 
conditions can be extracted out into different versions
of IF with different parameter check( Another way of short circuiting).

I would like to know the scope of this optimization in gcc. If not implemented 
with respect to complex expressions with respect
To different iteration spaces. I would like to propose the same.

Thoughts Please?

Thanks & Regards
Ajit

Richard.

>This is much simpler code and there could be complex expressions inside 
>the FOR Loop that can be extracted out with Different versions of the 
>conditional IF-THEN-ELSE inside the loops based on short circuiting 
>executing one of the path always Executed can be extracted in IF-THEN 
>and other cases of condition will be inside the else conditions.
>
>Again this has to be optimized based on the profile Data of hot 
>conditional branches and the above optimizations is performed Based on 
>the hotness of the conditional branches.
>
>The short Circuiting optimization also makes the irreducible loops with 
>conditional branches as reducible and there is no need for Conversion 
>of irreducible loops to reducible loop with any of the existing 
>approach like node splitting for the short circuiting candidate.
>
>This optimizations is implemented in LLVM and I would like to know if 
>this is implemented in GCC. If not implemented I would like To propose 
>this short circuit compiler transformation.
>
>Thoughts Please ?
>
>Thanks & Regards
>Ajit




Re: Short Circuit compiler transformations!!

2015-03-15 Thread Richard Biener
On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal 
 wrote:
>Hello All:
>
>Short circuit compiler transformation for conditional branches. The
>conditional branches based on the conditional
>Expressions one of the path is always executed thus short circuiting
>the path. Certains values of the conditional
>Expressions makes the conditional expressions always true or false.
>This part of the conditions in extracted out
>In the IF-THEN cond_express with the check of the value and in the else
>case the original expressions is checked.
>
>This makes for a given values of the variables of the conditional
>expressions makes the conditional expression as
>True or false always and the one path is always executed.
>
>For the example given below.
>
>For( x1 = lb1; x1 < = ub1 ; x1++)
>{
>   If ( C1 && C2)
> S_then
> }
>
>
>Fig (1).
>
>For the input code given in Fig (1) the condition C1 && C2 is always
>false if C1 or C2 is zero(false) thus short circuiting
>The path and only s_else will be executed if C1 or C2 is false.
>
>Thus the input code is transformed as follows.
>
>For( x1 = lb1; x1 < = ub1 ; x1++)
>{
>   If(!C1)
>Continue
>  else
>S_then
> }

This looks wrong as you miss to check C2.

Richard.

>This is much simpler code and there could be complex expressions inside
>the FOR Loop that can be extracted out with 
>Different versions of the conditional IF-THEN-ELSE inside the loops
>based on short circuiting executing one of the path always
>Executed can be extracted in IF-THEN and other cases of condition will
>be inside the else conditions.
>
>Again this has to be optimized based on the profile Data of hot
>conditional branches and the above optimizations is performed
>Based on the hotness of the conditional branches.
>
>The short Circuiting optimization also makes the irreducible loops with
>conditional branches as reducible and there is no need for
>Conversion of irreducible loops to reducible loop with any of the
>existing approach like node splitting for the short circuiting
>candidate.
>
>This optimizations is implemented in LLVM and I would like to know if
>this is implemented in GCC. If not implemented I would like
>To propose this short circuit compiler transformation.
>
>Thoughts Please ?
>
>Thanks & Regards
>Ajit




Short Circuit compiler transformations!!

2015-03-15 Thread Ajit Kumar Agarwal
Hello All:

Short circuit compiler transformation for conditional branches. The conditional 
branches based on the conditional
Expressions one of the path is always executed thus short circuiting the path. 
Certains values of the conditional
Expressions makes the conditional expressions always true or false. This part 
of the conditions in extracted out
In the IF-THEN cond_express with the check of the value and in the else case 
the original expressions is checked.

This makes for a given values of the variables of the conditional expressions 
makes the conditional expression as
True or false always and the one path is always executed.

For the example given below.

For( x1 = lb1; x1 < = ub1 ; x1++)
{
   If ( C1 && C2)
 S_then
 }


Fig (1).

For the input code given in Fig (1) the condition C1 && C2 is always false if 
C1 or C2 is zero(false) thus short circuiting
The path and only s_else will be executed if C1 or C2 is false.

Thus the input code is transformed as follows.

For( x1 = lb1; x1 < = ub1 ; x1++)
{
   If(!C1)
Continue
  else
S_then
 }

This is much simpler code and there could be complex expressions inside the FOR 
Loop that can be extracted out with 
Different versions of the conditional IF-THEN-ELSE inside the loops based on 
short circuiting executing one of the path always
Executed can be extracted in IF-THEN and other cases of condition will be 
inside the else conditions.

Again this has to be optimized based on the profile Data of hot conditional 
branches and the above optimizations is performed
Based on the hotness of the conditional branches.

The short Circuiting optimization also makes the irreducible loops with 
conditional branches as reducible and there is no need for
Conversion of irreducible loops to reducible loop with any of the existing 
approach like node splitting for the short circuiting candidate.

This optimizations is implemented in LLVM and I would like to know if this is 
implemented in GCC. If not implemented I would like
To propose this short circuit compiler transformation.

Thoughts Please ?

Thanks & Regards
Ajit