Re: Proposal for adding splay_tree_find (to find elements without updating the nodes).
On Mon, Mar 16, 2015 at 03:33:18AM +, Aditya K wrote: > > > > > Date: Sun, 15 Mar 2015 02:32:23 -0400 > > From: tbsau...@tbsaunde.org > > To: gcc@gcc.gnu.org > > Subject: Re: Proposal for adding splay_tree_find (to find elements without > > updating the nodes). > > > > hi, > > > > I'm only commenting on algorithmic stuff at this point, you should make > > sure this doesn't regress anything in make check. This stuff only > > effects code using omp stuff so compiling random c++ is unlikely to test > > this code at all. > > > > Also please follow the style in > > https://gcc.gnu.org/codingconventions.html > > and usually try to make new code similar to what's around it. > > > > @@ -384,7 +386,7 @@ new_omp_context (enum omp_region_type region_type) > > > > c = XCNEW (struct gimplify_omp_ctx); > > c->outer_context = gimplify_omp_ctxp; > > - c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0); > > + //c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0); > > > > I don't think this is what you want, xcnew is a calloc wrapper and > > doesn't call the ctor for gimplify_omp_ctx. For now placement new is > > probably the simplest way to get what you want. > > > Thanks for pointing this out. I'll do it the way c->privatized_types has been > allocated. > e.g., by making c->variables a pointer to std::map and c->variables = new > gimplify_tree_t; that works too, though I'd be a more descriptive name than gimplify_tree_t, maybe omp_variables_map? > > > > -static void > > -delete_omp_context (struct gimplify_omp_ctx *c) > > -{ > > - splay_tree_delete (c->variables); > > - delete c->privatized_types; > > - XDELETE (c); > > -} > > > > hm, why? > > > My bad, I'll restore this. > > > -gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) > > +gimplify_adjust_omp_clauses_1 (std::pair n, void *data) > > > > You can now change the type of data from void * to const > > gimplify_adjust_omp_clauses_data * > > Done! > > > Thanks for the feedback, they were really helpful. I have updated the patch. > Please review this. > Also, although I run `make check` while compiling gcc (with bootstrap > enabled), I'm not sure if 'omp' related tests were exercised. > I'm still unfamiliar with several components of gcc. Any pointers on how to > ensure all tests were run, would be useful. https://gcc.gnu.org/install/test.html should help, though unfortunately you'll probably find the easiest way to check for regressions is to do one run of straight trunk, then another with your patch. Saddly a bunch of people have own scripts to deal with administrivia, but there isn't a standardized way that's simple. Trev > > > -Aditya > > > > > > > > thanks! > > > > Trev > > > > On Fri, Mar 13, 2015 at 07:32:03PM +, Aditya K wrote: > >> You're right. I'll change this to: > >> > >> /* A stable comparison functor to sort trees. */ > >> struct tree_compare_decl_uid { > >> bool operator ()(const tree &xa, const tree &xb) const > >> { > >> return DECL_UID (xa) < DECL_UID (xb); > >> } > >> }; > >> > >> New patch attached. > >> > >> > >> Thanks, > >> -Aditya > >> > >> > >> > >>> Date: Fri, 13 Mar 2015 19:02:11 + > >>> Subject: Re: Proposal for adding splay_tree_find (to find elements > >>> without updating the nodes). > >>> From: jwakely@gmail.com > >>> To: hiradi...@msn.com > >>> CC: richard.guent...@gmail.com; stevenb@gmail.com; gcc@gcc.gnu.org > >>> > >>> Are you sure your compare_variables functor is correct? > >>> > >>> Subtracting the two values seems very strange for a strict weak ordering. > >>> > >>> (Also "compare_variables" is a pretty poor name!) > >> > > > > > > >
RE: Proposal for adding splay_tree_find (to find elements without updating the nodes).
> Date: Sun, 15 Mar 2015 02:32:23 -0400 > From: tbsau...@tbsaunde.org > To: gcc@gcc.gnu.org > Subject: Re: Proposal for adding splay_tree_find (to find elements without > updating the nodes). > > hi, > > I'm only commenting on algorithmic stuff at this point, you should make > sure this doesn't regress anything in make check. This stuff only > effects code using omp stuff so compiling random c++ is unlikely to test > this code at all. > > Also please follow the style in > https://gcc.gnu.org/codingconventions.html > and usually try to make new code similar to what's around it. > > @@ -384,7 +386,7 @@ new_omp_context (enum omp_region_type region_type) > > c = XCNEW (struct gimplify_omp_ctx); > c->outer_context = gimplify_omp_ctxp; > - c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0); > + //c->variables = splay_tree_new (splay_tree_compare_decl_uid, 0, 0); > > I don't think this is what you want, xcnew is a calloc wrapper and > doesn't call the ctor for gimplify_omp_ctx. For now placement new is > probably the simplest way to get what you want. > Thanks for pointing this out. I'll do it the way c->privatized_types has been allocated. e.g., by making c->variables a pointer to std::map and c->variables = new gimplify_tree_t; > -static void > -delete_omp_context (struct gimplify_omp_ctx *c) > -{ > - splay_tree_delete (c->variables); > - delete c->privatized_types; > - XDELETE (c); > -} > > hm, why? > My bad, I'll restore this. > -gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) > +gimplify_adjust_omp_clauses_1 (std::pair n, void *data) > > You can now change the type of data from void * to const > gimplify_adjust_omp_clauses_data * Done! Thanks for the feedback, they were really helpful. I have updated the patch. Please review this. Also, although I run `make check` while compiling gcc (with bootstrap enabled), I'm not sure if 'omp' related tests were exercised. I'm still unfamiliar with several components of gcc. Any pointers on how to ensure all tests were run, would be useful. -Aditya > > thanks! > > Trev > > On Fri, Mar 13, 2015 at 07:32:03PM +, Aditya K wrote: >> You're right. I'll change this to: >> >> /* A stable comparison functor to sort trees. */ >> struct tree_compare_decl_uid { >> bool operator ()(const tree &xa, const tree &xb) const >> { >> return DECL_UID (xa) < DECL_UID (xb); >> } >> }; >> >> New patch attached. >> >> >> Thanks, >> -Aditya >> >> >> >>> Date: Fri, 13 Mar 2015 19:02:11 + >>> Subject: Re: Proposal for adding splay_tree_find (to find elements without >>> updating the nodes). >>> From: jwakely@gmail.com >>> To: hiradi...@msn.com >>> CC: richard.guent...@gmail.com; stevenb@gmail.com; gcc@gcc.gnu.org >>> >>> Are you sure your compare_variables functor is correct? >>> >>> Subtracting the two values seems very strange for a strict weak ordering. >>> >>> (Also "compare_variables" is a pretty poor name!) >> > > splay.patch Description: Binary data
Re: How to implement '@' GDB-like operator for libcc1
> > But GDB features a useful custom expression operator '@': > https://sourceware.org/gdb/onlinedocs/gdb/Arrays.html > > I have problems implementing '@' into GCC, could you suggest at which place > should I call build_array_type_nelts()? Or is it the right way at all? > > Testing it on a sample code - it should return 2: > int a[]={1,2,3};int main(void){ return (*a@3)[1]; } Sorry if I'm missing something, but this example does not make sense to me. You can directly use a[1], no need for the @-operator here. In my experience (and again, sorry if I'm missing other interesting usercases), the utility of @ is to be able to do things like: print a@3 print (*&(something complicated expression))@(some integer expression) Thus, the question is what info GDB needs from GCC to be able to print the contents of the array. And it probably just needs the type of whatever is to the left of @ and the value of whatever is to the right of @, no? Also note that given this code and breaking in main() int a[]={1,2,3};int main(void){ return (a[1]); } then: (gdb) p a $1 = {1, 2, 3} (gdb) p a@3 $2 = {{1, 2, 3}, {0, 0, 0}, {0, 0, 0}} This is because: (gdb) ptype a type = int [3] (gdb) ptype a@3 type = int [3][3] Thus, what happens currently when you do? (gdb) compile int a[]={1,2,3} (gdb) compile print a Cheers, Manuel.
Proposal: implement API for registering custom built-in functions from dynamic plugins
Hi, all. Currently GCC allows dynamic plugins to register custom attributes and pragmas. I would like to propose to add API for registering custom built-in functions. AFAIK, current plugin API can be used to emulate this feature, for example by creating a function declaration in PLUGIN_START_UNIT and then replacing calls to such function in a custom lowering pass. But this solution seems inelegant: it would be better, if user-defined built-in functions could use the same "machinery" as normal ones, i.e. use common classification and flags (RTL-level vs GIMPLE-level; external library equivalents vs compiler-generated entities), and have unified API for creating declarations, GIMPLE (or even RTL) expansion, inlining (callback for is_inexpensive_builtin), etc. Possible applications of this plugin API include: * Generating optimized pattern matching functions (e.g. precomputing tables for Boyer-Moore matcher). * Generating functions for parsing (compile-time regular expressions). * Marshalling functions with compile-time type checking (something like binary printf/scanf with custom format specifiers). * String formatting and conversion. By the way, there is a guy, who has written a perl wrapper for GCC, which preprocesses calls to snprintf to make it work faster: https://github.com/h2o/qrintf... though it's probably worth enhancing snprintf in GCC, rather than reimplementing it in plugin - but that's a different story. * (to some extent): lazy evaluation. * Custom instrumentation of source code (user or framework-specific fortification or performance profiling). Any thoughts on this? Is such API useful for GCC? -- Regards, Mikhail Maltsev
gcc-5-20150315 is now available
Snapshot gcc-5-20150315 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150315/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 5 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 221440 You'll find: gcc-5-20150315.tar.bz2 Complete GCC MD5=e97e13d88d72c62df053058745671ee5 SHA1=1727ac6e7c1d53f0f3e4ad91d0bdf29b1bd504af Diffs from 5-20150308 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
How to implement '@' GDB-like operator for libcc1
Hi, based on the GCC libcc1 plugin GDB will implement GDB command 'compile print' equivalent to default 'print' but evaluating the expression by GCC: https://github.com/tromey/gdb/tree/pmuldoon/compile/print It is currently done by compiling a .c->.o file and checking its DWARF for the type and size of the resulting object to print. But GDB features a useful custom expression operator '@': https://sourceware.org/gdb/onlinedocs/gdb/Arrays.html I have problems implementing '@' into GCC, could you suggest at which place should I call build_array_type_nelts()? Or is it the right way at all? Testing it on a sample code - it should return 2: int a[]={1,2,3};int main(void){ return (*a@3)[1]; } Proper overloading of '@' to keep it working for ObjC and/or enabling '@' only for libcc1 I find currently off-topic. With both c_fully_fold_internal() and gimplify_modify_expr_rhs() hooks #if 0-ed (as they are in the attached patch) GCC crashes on infinite loop: <-gimplify_expr<-gimplify_modify_expr<-gimplify_expr<-gimplify_stmt<- <-gimplify_and_add<-internal_get_tmp_var<-get_formal_tmp_var<- Thanks, Jan diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 8c23e09..69fa054 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -1289,6 +1289,25 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands, ret = fold (ret); goto out; +#if 0 // it would crash at array_ref_low_bound()'s first line +case ATSIGN_EXPR: + orig_op0 = op0 = TREE_OPERAND (expr, 0); + orig_op1 = op1 = TREE_OPERAND (expr, 1); +#if 0 // not needed + op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands, + maybe_const_itself); + STRIP_TYPE_NOPS (op0); + op1 = c_fully_fold_internal (op1, in_init, maybe_const_operands, + maybe_const_itself); + STRIP_TYPE_NOPS (op1); +#endif + ret = build_array_type_nelts (TREE_TYPE (op0), tree_to_uhwi (op1)); +#if 0 // not needed + ret = fold (ret); +#endif + goto out; +#endif + case COMPOUND_EXPR: case MODIFY_EXPR: case PREDECREMENT_EXPR: @@ -4078,6 +4097,8 @@ binary_op_error (location_t location, enum tree_code code, opname = "||"; break; case BIT_XOR_EXPR: opname = "^"; break; +case ATSIGN_EXPR: + opname = "@"; break; default: gcc_unreachable (); } diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index bb55be8..ad8b82f 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -468,6 +468,8 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags, break; case CPP_ATSIGN: + *value = NULL_TREE; + break; /* An @ may give the next token special significance in Objective-C. */ if (c_dialect_objc ()) { diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index ceb9e1a..5d5a0a8 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1173,6 +1173,7 @@ enum c_parser_prec { PREC_EQ, PREC_REL, PREC_SHIFT, + PREC_ATSIGN, PREC_ADD, PREC_MULT, NUM_PRECS @@ -6283,6 +6284,10 @@ c_parser_binary_expression (c_parser *parser, struct c_expr *after, oprec = PREC_ADD; ocode = MINUS_EXPR; break; + case CPP_ATSIGN: + oprec = PREC_ATSIGN; + ocode = ATSIGN_EXPR; + break; case CPP_LSHIFT: oprec = PREC_SHIFT; ocode = LSHIFT_EXPR; @@ -7082,6 +7087,10 @@ c_parser_postfix_expression (c_parser *parser) expr.original_type = NULL; switch (c_parser_peek_token (parser)->type) { +case CPP_ATSIGN: + error_at (loc, "gdbjit hit"); + expr.value = error_mark_node; + break; case CPP_NUMBER: expr.value = c_parser_peek_token (parser)->value; loc = c_parser_peek_token (parser)->location; diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index a3a9c77..0c2f1f9 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -10949,6 +10949,14 @@ build_binary_op (location_t location, enum tree_code code, maybe_warn_bool_compare (location, code, orig_op0, orig_op1); break; +case ATSIGN_EXPR: + if (TREE_CODE (orig_op1) == INTEGER_CST) + { + result_type = build_array_type_nelts (type0, tree_to_uhwi (orig_op1)); + converted = 1; + } + break; + default: gcc_unreachable (); } diff --git a/gcc/gimplify.c b/gcc/gimplify.c index d822913..b4027dc 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -4418,6 +4418,12 @@ gimplify_modify_expr_rhs (tree *expr_p, tree *from_p, tree *to_p, return GS_OK; } } + break; + +#if 0 // it would get hit + case ATSIGN_EXPR: + gcc_unreachable (); +#endif default: break; diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index d7c049f..cf00457 100644 --- a
Re: Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.
15.03.2015 17:44, Ajit Kumar Agarwal writes: > > Hello All: > > The below example is taken from the article by Albert Cohen et.al. > > "Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector > parallelism" It is not clear from this article, whether such optimization can be implemented in practice (i.e. automatically and within reasonable time). Authors implemented some proof-of-concept, but from their description it looks like they had to tweak their algorithm in each particular case (and they hope, that later this could be automated using FDO). Also nothing definite is said about compilation time. > I would like to propose the above heuristics for unroll and jam and renaming > which enables the loop fusion and the IF-merging > to achieve the optimize code. AFAIK, most of loop transformations in GCC are applicable to rather simple loops, which, at least have an induction variable. So, loops like while (q) { ... } for some arbitrary condition q, e.g. while (*a++ = *b++) ; are unlikely to be optimized well. The proposed transformation > For ( I = 0 ; I < 10; i++) > a1 = phi(a1,a2); > If ( p1 && p2) > a1 = ...; > a2 = ...; > Else >If (p1) > a1= ; > Else > If (p2) >a2 = ...; > >While (q1 && q2) >{ >... = a1 + ...; >... = a2 + .; >} > While (q1) > = a1 + ...; > While ( q2) > . = a2 + ..; involves some sort of unroll-and-jam, nontrival code motion, loop fusion. And it obviously requires analysis of loop-carried data dependence, that can deal with nested loops having complex conditions inside them, which GCC does not yet provide. GCC includes Graphite framework - it allows to perform rather complex transformations of loops (perhaps similar to what you described), but it only works for loops which have canonical induction variables and involve only affine memory access (i.e. addresses must be represented as linear combinations of induction variables and constants) - no if's, no while's with non-trivial exit conditions. So, I think this idea is, unfortunately, not very realistic in terms of required efforts. P.S. I'm new in GCC community, but I hope that in general terms my description is close to reality. Please correct me, if something is wrong. -- Regards, Mikhail Maltsev
RE: Short Circuit compiler transformations!!
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Sunday, March 15, 2015 9:30 PM To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: RE: Short Circuit compiler transformations!! On March 15, 2015 11:14:59 AM GMT+01:00, Ajit Kumar Agarwal wrote: > > >-Original Message- >From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of >Ajit Kumar Agarwal >Sent: Sunday, March 15, 2015 3:35 PM >To: Richard Biener; Jeff Law; gcc@gcc.gnu.org >Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju >Mekala >Subject: RE: Short Circuit compiler transformations!! > > > >-Original Message- >From: Richard Biener [mailto:richard.guent...@gmail.com] >Sent: Sunday, March 15, 2015 3:05 PM >To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org >Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju >Mekala >Subject: Re: Short Circuit compiler transformations!! > >On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal > wrote: >>Hello All: >> >>Short circuit compiler transformation for conditional branches. The >>conditional branches based on the conditional Expressions one of the >>path is always executed thus short circuiting the path. Certains >values >>of the conditional Expressions makes the conditional expressions >always >>true or false. >>This part of the conditions in extracted out In the IF-THEN >>cond_express with the check of the value and in the else case the >>original expressions is checked. >> >>This makes for a given values of the variables of the conditional >>expressions makes the conditional expression as True or false always >>and the one path is always executed. >> >>For the example given below. >> >>For( x1 = lb1; x1 < = ub1 ; x1++) >>{ >> If ( C1 && C2) >> S_then >> } >> >> >>Fig (1). >> >>For the input code given in Fig (1) the condition C1 && C2 is always >>false if C1 or C2 is zero(false) thus short circuiting The path and >>only s_else will be executed if C1 or C2 is false. >> >>Thus the input code is transformed as follows. >> >>For( x1 = lb1; x1 < = ub1 ; x1++) >>{ >> If(!C1) >>Continue >> else >>S_then >> } > >>>This looks wrong as you miss to check C2. > >>>Riichard: Sorry for the typo error. I could see the short circuit >transformation is implemented in gcc. I could see only Short circuiting >with respect to &&/OR. I am thinking in terms of >>complex expressions >where any of the variable condition In the expressions Set to to true >or false the whole expressions and such conditions can be extracted out >into different versions >>of IF with different parameter check( Another >way of short circuiting). > >>>I would like to know the scope of this optimization in gcc. If not >implemented with respect to complex expressions with respect To >different iteration spaces. I would like to propose the >>same. >>Just as a general note/question on all you suggestions. Are you willing to >>implement all your proposals or do you have resources to spend to that effort? >>It's not that we are not aware of loop optimizations GCC does not implement. >>It's a question of priorities and resources to implement something. Yes you are right. We are proposing different optimizations based on our Analysis and past experience and we would be interested in and like to implement based on the suggestions and inputs from the community. We may not implement all, but based on the suggestions and inputs from the community we would like to take on the priority basis. Thanks & Regards Ajit Richard. >Richard: Other aspect of the transformation for short circuiting the >IF-THEN-ELSE with the loops where the loop is irreducible. >In this case implementing the short circuit transformations and the CFG >transformations to convert from irreducibility to reducible Loops >along with the transformations of short circuiting compiler >transformations. > >Thanks & Regards >Ajit > >>>Thoughts Please? > >Thanks & Regards >Ajit > >Richard. > >>This is much simpler code and there could be complex expressions >inside >>the FOR Loop that can be extracted out with Different versions of the >>conditional IF-THEN-ELSE inside the loops based on short circuiting >>executing one of the path always Executed can be extracted in IF-THEN >>and other cases of condition will be inside the else conditions. >> >>Again this has to be optimized based on the profile Data of hot >>conditional branches and the above optimizations is performed Based on > >>the hotness of the conditional branches. >> >>The short Circuiting optimization also makes the irreducible loops >with >>conditional branches as reducible and there is no need for Conversion >>of irreducible loops to reducible loop with any of the existing >>approach like node splitting for the short circuiting candidate. >> >>This optimizations is implemented in LLVM and I
Re: Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.
On March 15, 2015 3:44:39 PM GMT+01:00, Ajit Kumar Agarwal wrote: > >Hello All: > >Below examples are the transformation for the given loop in Fig(1). >Fig(2) unroll and jam and the Fig(3) does the >Code motion to bring two IF adjacent to each other and two while loops >adjacent to each other. > >The Fig(4 ) does the IF-merging and the Loop fusion on the transformed >Loop in Fig (3) which bring two IF adjacent >and two while Loops adjacent to each other. > >This shows how the Loop with conditional branches can be unrolled which >enables the IF-merging and Loop fusion >after doing code motion to unrolled loops to bring IF and Loops >adjacent to each other. > >This is the powerful optimization which gets enabled with renaming , >unrolling. > >Such heuristics needs to be added for Loop unrolling and then later the >code motions can enable the IF-merging >and the Loop fusion for the unrolled loop. > >The below example is taken from the article by Albert Cohen et.al. > >"Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector >parallelism" > >For ( I = 0; I < 10; i++) >{ > If(p) > a = ...; > While(q) > .. = a + ...; >} > >Fig (1) > >For ( I = 0; I < 10; i+=2) >{ >a1 = phi(a,a2); >If ( p1) > a1 = ; >While (q1) >... = a1 + ...; >If (p2) > a2 = ...; > a2 = phi(a2, a1); > While (q2) > = a2 + ; > >Fig (2) > >For ( I = 0; I < 10; i+=2) >{ >a1 = phi(a,a2); >If ( p1) > a1 = ; > If (p2) > a2 = ...; > a2 = phi(a2, a1); > While (q1) >... = a1 + ...; > While (q2) > = a2 + ; > >Fig (3) > >For ( I = 0 ; I < 10; i++) > a1 = phi(a1,a2); > If ( p1 && p2) > a1 = ...; > a2 = ...; > Else >If (p1) > a1= ; > Else > If (p2) >a2 = ...; > >While (q1 && q2) >{ >... = a1 + ...; >... = a2 + .; >} > While (q1) > = a1 + ...; > While ( q2) > . = a2 + ..; > >Fig (4). > >I would like to propose the above heuristics for unroll and jam and >renaming which enables the loop fusion and the IF-merging >to achieve the optimize code. > >This shows how the coarse grain unrolling heuristics enable fine grain >Loop transformation. > >Thoughts Please? GCC implements neither outer loop unrolling nor loop fusion. Richard. >Thanks & Regards >Ajit
RE: Short Circuit compiler transformations!!
On March 15, 2015 11:14:59 AM GMT+01:00, Ajit Kumar Agarwal wrote: > > >-Original Message- >From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of >Ajit Kumar Agarwal >Sent: Sunday, March 15, 2015 3:35 PM >To: Richard Biener; Jeff Law; gcc@gcc.gnu.org >Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju >Mekala >Subject: RE: Short Circuit compiler transformations!! > > > >-Original Message- >From: Richard Biener [mailto:richard.guent...@gmail.com] >Sent: Sunday, March 15, 2015 3:05 PM >To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org >Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju >Mekala >Subject: Re: Short Circuit compiler transformations!! > >On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal > wrote: >>Hello All: >> >>Short circuit compiler transformation for conditional branches. The >>conditional branches based on the conditional Expressions one of the >>path is always executed thus short circuiting the path. Certains >values >>of the conditional Expressions makes the conditional expressions >always >>true or false. >>This part of the conditions in extracted out In the IF-THEN >>cond_express with the check of the value and in the else case the >>original expressions is checked. >> >>This makes for a given values of the variables of the conditional >>expressions makes the conditional expression as True or false always >>and the one path is always executed. >> >>For the example given below. >> >>For( x1 = lb1; x1 < = ub1 ; x1++) >>{ >> If ( C1 && C2) >> S_then >> } >> >> >>Fig (1). >> >>For the input code given in Fig (1) the condition C1 && C2 is always >>false if C1 or C2 is zero(false) thus short circuiting The path and >>only s_else will be executed if C1 or C2 is false. >> >>Thus the input code is transformed as follows. >> >>For( x1 = lb1; x1 < = ub1 ; x1++) >>{ >> If(!C1) >>Continue >> else >>S_then >> } > >>>This looks wrong as you miss to check C2. > >>>Riichard: Sorry for the typo error. I could see the short circuit >transformation is implemented in gcc. I could see only Short circuiting >with respect to &&/OR. I am thinking in terms of >>complex expressions >where any of the variable condition In the expressions Set to to true >or false the whole expressions and such conditions can be extracted out >into different versions >>of IF with different parameter check( Another >way of short circuiting). > >>>I would like to know the scope of this optimization in gcc. If not >implemented with respect to complex expressions with respect To >different iteration spaces. I would like to propose the >>same. Just as a general note/question on all you suggestions. Are you willing to implement all your proposals or do you have resources to spend to that effort? It's not that we are not aware of loop optimizations GCC does not implement. It's a question of priorities and resources to implement something. Richard. >Richard: Other aspect of the transformation for short circuiting the >IF-THEN-ELSE with the loops where the loop is irreducible. >In this case implementing the short circuit transformations and the CFG >transformations to convert from irreducibility to reducible >Loops along with the transformations of short circuiting compiler >transformations. > >Thanks & Regards >Ajit > >>>Thoughts Please? > >Thanks & Regards >Ajit > >Richard. > >>This is much simpler code and there could be complex expressions >inside >>the FOR Loop that can be extracted out with Different versions of the >>conditional IF-THEN-ELSE inside the loops based on short circuiting >>executing one of the path always Executed can be extracted in IF-THEN >>and other cases of condition will be inside the else conditions. >> >>Again this has to be optimized based on the profile Data of hot >>conditional branches and the above optimizations is performed Based on > >>the hotness of the conditional branches. >> >>The short Circuiting optimization also makes the irreducible loops >with >>conditional branches as reducible and there is no need for Conversion >>of irreducible loops to reducible loop with any of the existing >>approach like node splitting for the short circuiting candidate. >> >>This optimizations is implemented in LLVM and I would like to know if >>this is implemented in GCC. If not implemented I would like To propose > >>this short circuit compiler transformation. >> >>Thoughts Please ? >> >>Thanks & Regards >>Ajit
Proposal for the coarse grain unrolling heuristics and renaming for the enablement of better fine grain Loop transformation.
Hello All: Below examples are the transformation for the given loop in Fig(1). Fig(2) unroll and jam and the Fig(3) does the Code motion to bring two IF adjacent to each other and two while loops adjacent to each other. The Fig(4 ) does the IF-merging and the Loop fusion on the transformed Loop in Fig (3) which bring two IF adjacent and two while Loops adjacent to each other. This shows how the Loop with conditional branches can be unrolled which enables the IF-merging and Loop fusion after doing code motion to unrolled loops to bring IF and Loops adjacent to each other. This is the powerful optimization which gets enabled with renaming , unrolling. Such heuristics needs to be added for Loop unrolling and then later the code motions can enable the IF-merging and the Loop fusion for the unrolled loop. The below example is taken from the article by Albert Cohen et.al. "Deep Jam : Conversion of Coarse grain Parallelism to Fine grain vector parallelism" For ( I = 0; I < 10; i++) { If(p) a = ...; While(q) .. = a + ...; } Fig (1) For ( I = 0; I < 10; i+=2) { a1 = phi(a,a2); If ( p1) a1 = ; While (q1) ... = a1 + ...; If (p2) a2 = ...; a2 = phi(a2, a1); While (q2) = a2 + ; Fig (2) For ( I = 0; I < 10; i+=2) { a1 = phi(a,a2); If ( p1) a1 = ; If (p2) a2 = ...; a2 = phi(a2, a1); While (q1) ... = a1 + ...; While (q2) = a2 + ; Fig (3) For ( I = 0 ; I < 10; i++) a1 = phi(a1,a2); If ( p1 && p2) a1 = ...; a2 = ...; Else If (p1) a1= ; Else If (p2) a2 = ...; While (q1 && q2) { ... = a1 + ...; ... = a2 + .; } While (q1) = a1 + ...; While ( q2) . = a2 + ..; Fig (4). I would like to propose the above heuristics for unroll and jam and renaming which enables the loop fusion and the IF-merging to achieve the optimize code. This shows how the coarse grain unrolling heuristics enable fine grain Loop transformation. Thoughts Please? Thanks & Regards Ajit
RE: Function outlining and partial Inlining
-Original Message- From: Jan Hubicka [mailto:hubi...@ucw.cz] Sent: Thursday, February 12, 2015 10:34 PM To: Ajit Kumar Agarwal Cc: hubi...@ucw.cz; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: Function outlining and partial Inlining > Hello All: > > The large functions are the important part of high performance > application. They contribute to performance bottleneck with many > respect. Some of the large hot functions are frequently executed but many > regions inside the functions are cold regions. The large Function blocks the > function inlining to happen before of the code size constraints. > > Such cold regions inside the hot large functions can be extracted out > and form the function outlining. Thus breaking the large functions Into > smaller function segments which causes the functions to be inlined at the > caller site or helps in partial inlining. > > LLVM Compiler has the functionality and the optimizations for function > outlining based on regions like basic blocks, superblocks and > Hyperblocks which gets extracted out into smaller function segments and thus > enabling the partial inlining and function inlining to happen At the caller > site. > > This optimization is the good case of profile guided optimizations and based > on the profile feedback data by the Compiler. > Without profile information the above function outlining optimizations will > not be useful. > > We are doing lot of optimization regarding polymorphism and also the > indirect icall promotion based on the profile feedback on the Callgraph > profile. > > Are we doing the function outlining optimization in GCC with respect > to function inline and partial inline based on profile feedback Data. > If not this optimization can be implemented. If already implemented in GCC > Can I know any pointer for such code in GCC and the Scope of this function > outlining optimization. >>The outlining pass is called ipa-split. The heuristic used is however quite >>simplistic and it looks for very specific case where you have small header of >>a function containing conditional and >>splits after that. It does use >>profile. >>Any work on improving the heuristics or providing interesting testcases to >>consider would be welcome. >>I think LLVM pass is doing pretty much the same analysis minus the profile >>feedback considerations. After splitting, LLVm will inline the header into >>all callers while GCC leaves this on the >>decision of inliner heuristics >>that may just merge the function back into one block. >>The actual outlining logic is contained in tree-inline.c and also used by >>OpenMP. Honza: LLVM has the logic of extraction of Loops using -loop-extract flags where the Loop code region is extracted into a function. The LLVM has the heuristics that the header of the Loop is the entry block and there is a single entry to the Loop which does by Loop Simplification pass in LLVM. Also the heuristics in LLVM has the heuristics that there should not be a branch from the entry block to the header of the Loop. You have mentioned the GCC has the heuristics of having the conditional in the header after the splits point. Does GCC has the heuristics for Conditional to split in the header or the heuristics can be augmented with the Loops as done in the LLVM. Thanks & Regards Ajit Honza > > If not implemented , Can I propose to have the optimization like function > outlining in GCC. > > Thoughts Please? > > Thanks & Regards > Ajit
RE: Short Circuit compiler transformations!!
-Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Ajit Kumar Agarwal Sent: Sunday, March 15, 2015 3:35 PM To: Richard Biener; Jeff Law; gcc@gcc.gnu.org Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: RE: Short Circuit compiler transformations!! -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Sunday, March 15, 2015 3:05 PM To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: Short Circuit compiler transformations!! On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal wrote: >Hello All: > >Short circuit compiler transformation for conditional branches. The >conditional branches based on the conditional Expressions one of the >path is always executed thus short circuiting the path. Certains values >of the conditional Expressions makes the conditional expressions always >true or false. >This part of the conditions in extracted out In the IF-THEN >cond_express with the check of the value and in the else case the >original expressions is checked. > >This makes for a given values of the variables of the conditional >expressions makes the conditional expression as True or false always >and the one path is always executed. > >For the example given below. > >For( x1 = lb1; x1 < = ub1 ; x1++) >{ > If ( C1 && C2) > S_then > } > > >Fig (1). > >For the input code given in Fig (1) the condition C1 && C2 is always >false if C1 or C2 is zero(false) thus short circuiting The path and >only s_else will be executed if C1 or C2 is false. > >Thus the input code is transformed as follows. > >For( x1 = lb1; x1 < = ub1 ; x1++) >{ > If(!C1) >Continue > else >S_then > } >>This looks wrong as you miss to check C2. >>Riichard: Sorry for the typo error. I could see the short circuit >>transformation is implemented in gcc. I could see only Short circuiting with >>respect to &&/OR. I am thinking in terms of >>complex expressions where any >>of the variable condition In the expressions Set to to true or false the >>whole expressions and such conditions can be extracted out into different >>versions >>of IF with different parameter check( Another way of short >>circuiting). >>I would like to know the scope of this optimization in gcc. If not >>implemented with respect to complex expressions with respect To different >>iteration spaces. I would like to propose the >>same. Richard: Other aspect of the transformation for short circuiting the IF-THEN-ELSE with the loops where the loop is irreducible. In this case implementing the short circuit transformations and the CFG transformations to convert from irreducibility to reducible Loops along with the transformations of short circuiting compiler transformations. Thanks & Regards Ajit >>Thoughts Please? Thanks & Regards Ajit Richard. >This is much simpler code and there could be complex expressions inside >the FOR Loop that can be extracted out with Different versions of the >conditional IF-THEN-ELSE inside the loops based on short circuiting >executing one of the path always Executed can be extracted in IF-THEN >and other cases of condition will be inside the else conditions. > >Again this has to be optimized based on the profile Data of hot >conditional branches and the above optimizations is performed Based on >the hotness of the conditional branches. > >The short Circuiting optimization also makes the irreducible loops with >conditional branches as reducible and there is no need for Conversion >of irreducible loops to reducible loop with any of the existing >approach like node splitting for the short circuiting candidate. > >This optimizations is implemented in LLVM and I would like to know if >this is implemented in GCC. If not implemented I would like To propose >this short circuit compiler transformation. > >Thoughts Please ? > >Thanks & Regards >Ajit
RE: Short Circuit compiler transformations!!
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Sunday, March 15, 2015 3:05 PM To: Ajit Kumar Agarwal; Jeff Law; gcc@gcc.gnu.org Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: Short Circuit compiler transformations!! On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal wrote: >Hello All: > >Short circuit compiler transformation for conditional branches. The >conditional branches based on the conditional Expressions one of the >path is always executed thus short circuiting the path. Certains values >of the conditional Expressions makes the conditional expressions always >true or false. >This part of the conditions in extracted out In the IF-THEN >cond_express with the check of the value and in the else case the >original expressions is checked. > >This makes for a given values of the variables of the conditional >expressions makes the conditional expression as True or false always >and the one path is always executed. > >For the example given below. > >For( x1 = lb1; x1 < = ub1 ; x1++) >{ > If ( C1 && C2) > S_then > } > > >Fig (1). > >For the input code given in Fig (1) the condition C1 && C2 is always >false if C1 or C2 is zero(false) thus short circuiting The path and >only s_else will be executed if C1 or C2 is false. > >Thus the input code is transformed as follows. > >For( x1 = lb1; x1 < = ub1 ; x1++) >{ > If(!C1) >Continue > else >S_then > } >>This looks wrong as you miss to check C2. Riichard: Sorry for the typo error. I could see the short circuit transformation is implemented in gcc. I could see only Short circuiting with respect to &&/OR. I am thinking in terms of complex expressions where any of the variable condition In the expressions Set to to true or false the whole expressions and such conditions can be extracted out into different versions of IF with different parameter check( Another way of short circuiting). I would like to know the scope of this optimization in gcc. If not implemented with respect to complex expressions with respect To different iteration spaces. I would like to propose the same. Thoughts Please? Thanks & Regards Ajit Richard. >This is much simpler code and there could be complex expressions inside >the FOR Loop that can be extracted out with Different versions of the >conditional IF-THEN-ELSE inside the loops based on short circuiting >executing one of the path always Executed can be extracted in IF-THEN >and other cases of condition will be inside the else conditions. > >Again this has to be optimized based on the profile Data of hot >conditional branches and the above optimizations is performed Based on >the hotness of the conditional branches. > >The short Circuiting optimization also makes the irreducible loops with >conditional branches as reducible and there is no need for Conversion >of irreducible loops to reducible loop with any of the existing >approach like node splitting for the short circuiting candidate. > >This optimizations is implemented in LLVM and I would like to know if >this is implemented in GCC. If not implemented I would like To propose >this short circuit compiler transformation. > >Thoughts Please ? > >Thanks & Regards >Ajit
Re: Short Circuit compiler transformations!!
On March 15, 2015 9:15:36 AM GMT+01:00, Ajit Kumar Agarwal wrote: >Hello All: > >Short circuit compiler transformation for conditional branches. The >conditional branches based on the conditional >Expressions one of the path is always executed thus short circuiting >the path. Certains values of the conditional >Expressions makes the conditional expressions always true or false. >This part of the conditions in extracted out >In the IF-THEN cond_express with the check of the value and in the else >case the original expressions is checked. > >This makes for a given values of the variables of the conditional >expressions makes the conditional expression as >True or false always and the one path is always executed. > >For the example given below. > >For( x1 = lb1; x1 < = ub1 ; x1++) >{ > If ( C1 && C2) > S_then > } > > >Fig (1). > >For the input code given in Fig (1) the condition C1 && C2 is always >false if C1 or C2 is zero(false) thus short circuiting >The path and only s_else will be executed if C1 or C2 is false. > >Thus the input code is transformed as follows. > >For( x1 = lb1; x1 < = ub1 ; x1++) >{ > If(!C1) >Continue > else >S_then > } This looks wrong as you miss to check C2. Richard. >This is much simpler code and there could be complex expressions inside >the FOR Loop that can be extracted out with >Different versions of the conditional IF-THEN-ELSE inside the loops >based on short circuiting executing one of the path always >Executed can be extracted in IF-THEN and other cases of condition will >be inside the else conditions. > >Again this has to be optimized based on the profile Data of hot >conditional branches and the above optimizations is performed >Based on the hotness of the conditional branches. > >The short Circuiting optimization also makes the irreducible loops with >conditional branches as reducible and there is no need for >Conversion of irreducible loops to reducible loop with any of the >existing approach like node splitting for the short circuiting >candidate. > >This optimizations is implemented in LLVM and I would like to know if >this is implemented in GCC. If not implemented I would like >To propose this short circuit compiler transformation. > >Thoughts Please ? > >Thanks & Regards >Ajit
Short Circuit compiler transformations!!
Hello All: Short circuit compiler transformation for conditional branches. The conditional branches based on the conditional Expressions one of the path is always executed thus short circuiting the path. Certains values of the conditional Expressions makes the conditional expressions always true or false. This part of the conditions in extracted out In the IF-THEN cond_express with the check of the value and in the else case the original expressions is checked. This makes for a given values of the variables of the conditional expressions makes the conditional expression as True or false always and the one path is always executed. For the example given below. For( x1 = lb1; x1 < = ub1 ; x1++) { If ( C1 && C2) S_then } Fig (1). For the input code given in Fig (1) the condition C1 && C2 is always false if C1 or C2 is zero(false) thus short circuiting The path and only s_else will be executed if C1 or C2 is false. Thus the input code is transformed as follows. For( x1 = lb1; x1 < = ub1 ; x1++) { If(!C1) Continue else S_then } This is much simpler code and there could be complex expressions inside the FOR Loop that can be extracted out with Different versions of the conditional IF-THEN-ELSE inside the loops based on short circuiting executing one of the path always Executed can be extracted in IF-THEN and other cases of condition will be inside the else conditions. Again this has to be optimized based on the profile Data of hot conditional branches and the above optimizations is performed Based on the hotness of the conditional branches. The short Circuiting optimization also makes the irreducible loops with conditional branches as reducible and there is no need for Conversion of irreducible loops to reducible loop with any of the existing approach like node splitting for the short circuiting candidate. This optimizations is implemented in LLVM and I would like to know if this is implemented in GCC. If not implemented I would like To propose this short circuit compiler transformation. Thoughts Please ? Thanks & Regards Ajit