FW: [PING] [PATCH] _Cilk_for for C and C++
I mis-spelled the org as og and thus the email got bounced. So, here it is again. Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Thursday, March 20, 2014 4:34 PM To: 'Jakub Jelinek' Cc: gcc-patc...@gcc.gnu.og Subject: RE: [PING] [PATCH] _Cilk_for for C and C++ -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Thursday, March 6, 2014 6:55 AM To: Iyer, Balaji V Cc: gcc-patc...@gcc.gnu.og Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Tue, Mar 04, 2014 at 03:35:32PM +, Iyer, Balaji V wrote: Did you get a chance to look at this? Please look what you are emitting for cf3.cc, e.g. in *.gimple: D.2883 = Jint::begin (j); Iint::I (i, D.2883); D.2885 = Jint::end (j); retval.0 = operator-int (D.2885, i); D.2886 = retval.0 /[cl] 2; #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) lastprivate(D.2864) { const difference_type D.2866; long int D.2887; struct I D.2888; try { _Cilk_for (D.2864 = 0; D.2864 0; D.2864 = D.2864 + 2) linear(D.2864:2) schedule(cilk-for grain,0) That is just plain wrong, you aren't iterating zero times, but D.2886 times, so there should be D.2864 retval.0. In *.original that is: cleanup_point Unknown tree: expr_stmt (void) (i = TARGET_EXPR D.2854, Unknown tree: aggr_init_expr 5 __comp_ctor D.2854 0B (const struct I ) (const struct I *) Jint::begin ((struct J *) j) ) ; #pragma omp parallel firstprivate(i) schedule(cilk-for grain,0) if(cleanup_point operator-int ((const struct I ) (const struct I *) Jint::end ((struct J *) j), (const struct I ) (const struct I *) i) /[cl] 2) { { try HERE { difference_type D.2864; difference_type D.2865; { cleanup_point Unknown tree: expr_stmt (void) (D.2865 = 0) _Cilk_for (D.2864 = 0; D.2864 0; D.2864 = D.2864 + 2) schedule(cilk- for grain,0) if(cleanup_point operator-int ((const struct I ) (const struct I *) Jint::end ((struct J *) j), (const struct I ) (const struct I *) i) /[cl] 2) As I've said several times before, don't put the operator- expression into if clause, instead evaluate it into a scalar temporary first (get_temp_regvar), before the #pragma omp parallel, and use the result of get_temp_regvar in the if clause (temp /[cl] 2) and in firstprivate clause on omp parallel and as the high bound of _Cilk_for. Hi Jakub, First off, I am sorry for the delay in response. I had a couple family issues to deal with that tied me up for the past week. Yes, I agree that it looks weird in the intermediate code. I tried to move the operator-(...) value into a variable (in the function handle_omp_for_class_iterator) and it was giving me an ICE during gimplify case. The issue was scoping. The if clause is sitting on #pragma omp parallel, but when I move the operator- to a variable it is inside the try block (marked by HERE =). Also, this only happens when we deal with iterators and overloaded opeators. All the other cases the intermediate code looks OK. This only happens in dump of gimple and original. All the intermediate codes after that (= .012) looks OK. Since this is a specific case, can you accept this as a workaround for now? Thanks, Balaji V. Iyer.
[PING]RE: [PING] [PATCH] _Cilk_for for C and C++
Hi Jakub, Did you get a chance to look at this? Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Monday, February 24, 2014 6:17 PM To: 'Jakub Jelinek' Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: RE: [PING] [PATCH] _Cilk_for for C and C++ Hi Jakub, Please see my responses below. -Original Message- From: Iyer, Balaji V Sent: Thursday, February 20, 2014 11:38 PM To: Jakub Jelinek Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: RE: [PING] [PATCH] _Cilk_for for C and C++ Hi Jakub, I have attached the fixed patch and have answered your questions below. -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, February 19, 2014 6:24 AM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote: Attached, please find a patch with the test case attached (for1.cc). The patch is the same but the cp-changelog has been modified to reflect the new test-case. Is this OK to install? 1) have you tested the patch at all? I see FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for errors, line 27) FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors) FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for errors, line 27) FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for warnings, line 30) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for warnings, line 37) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for warnings, line 40) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for warnings, line 30) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for warnings, line 37) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for warnings, line 40) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors) regressions caused by the patch, that is of course unacceptable. Fixed. I apologize for them. I have confirmed that it is OK now. 2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out why calling something multiple times is a bad idea, actually the latest patch is even worse than the older one, you now create 3 calls to the end method and 3 calls to operator-. There should be just one call to that, before the #pragma omp parallel obviously, anything that doesn't do that is just bad. I don't see a point in having if clause on the _Cilk_for, just keep it on the #pragma omp parallel only, at ompexp time you can easily find it there, there is no point to check it again in the parallel body of the function. I have removed the if-clause from the _Cilk_for and now it is just in #pragma omp parallel. I have removed the 3rd operator-, but I am not able to remove the 2nd. I am looking into it, but I am not able to do it. The thing is, first operator- was for the if-clause, which I need to calculate the loop-count for the __cilkrts_cilk_for_64 function. The second one is not necessary because the end-value does not matter for _Cilk_for since they will be replaced with the low and high values. I tried several things such as stopping gimplifcation of the cond value, or replacing it with a constant, etc and those are causing other problems elsewhere. The thing is, I am not able to find a way to automate this. I can't assume the cond's end-value is same as count, because this is only true if we have an iterator and the handle_omp_for_iterator function modifies the cond value correctly. I need to use the count value (retval.0) as retval.1 but count-value is computed at a later time than handle_omp_for_iterator (since it does not have any knowledge about the #pragma omp parallel). It is giving the correct answers for the benchmarks and is removing the 2nd operator- when optimization is turned on for the inlinable operator-. Can you provide me some advice about how to do it? I have fixed the issue. Now the 2nd operator- does not occur. Attached, please fixed a fixed patch. Is this OK for trunk? Thanks, Balaji V. Iyer.
Re: [PING] [PATCH] _Cilk_for for C and C++
On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote: Attached, please find a patch with the test case attached (for1.cc). The patch is the same but the cp-changelog has been modified to reflect the new test-case. Is this OK to install? 1) have you tested the patch at all? I see FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for errors, line 27) FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors) FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for errors, line 27) FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for warnings, line 30) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for warnings, line 37) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for warnings, line 40) FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for warnings, line 30) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for warnings, line 37) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for warnings, line 40) FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors) regressions caused by the patch, that is of course unacceptable. 2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out why calling something multiple times is a bad idea, actually the latest patch is even worse than the older one, you now create 3 calls to the end method and 3 calls to operator-. There should be just one call to that, before the #pragma omp parallel obviously, anything that doesn't do that is just bad. I don't see a point in having if clause on the _Cilk_for, just keep it on the #pragma omp parallel only, at ompexp time you can easily find it there, there is no point to check it again in the parallel body of the function. typedef __PTRDIFF_TYPE__ ptrdiff_t; template typename T class I { public: typedef ptrdiff_t difference_type; I (); ~I (); I (T *); I (const I ); T operator * (); T *operator - (); T operator [] (const difference_type ) const; I operator = (const I ); I operator ++ (); I operator ++ (int); I operator -- (); I operator -- (int); I operator += (const difference_type ); I operator -= (const difference_type ); I operator + (const difference_type ) const; I operator - (const difference_type ) const; template typename S friend bool operator == (IS , IS ); template typename S friend bool operator == (const IS , const IS ); template typename S friend bool operator (IS , IS ); template typename S friend bool operator (const IS , const IS ); template typename S friend bool operator = (IS , IS ); template typename S friend bool operator = (const IS , const IS ); template typename S friend bool operator (IS , IS ); template typename S friend bool operator (const IS , const IS ); template typename S friend bool operator = (IS , IS ); template typename S friend bool operator = (const IS , const IS ); template typename S friend typename IS::difference_type operator - (IS , IS ); template typename S friend typename IS::difference_type operator - (const IS , const IS ); template typename S friend IS operator + (typename IS::difference_type , const IS ); private: T *p; }; template typename T IT::I () : p (0) {} template typename T IT::~I () {} template typename T IT::I (T *x) : p (x) {} template typename T IT::I (const I x) : p (x.p) {} template typename T T IT::operator * () { return *p; } template typename T T *IT::operator - () { return p; } template typename T T IT::operator [] (const difference_type x) const { return p[x]; } template typename T IT IT::operator = (const I x) { p = x.p; return *this; } template typename T IT IT::operator ++ () { ++p; return *this; } template typename T IT IT::operator ++ (int) { return I (p++); } template typename T IT IT::operator -- () { --p; return *this; } template typename T IT IT::operator -- (int) { return I (p--); } template typename T IT IT::operator += (const difference_type x) { p += x; return *this; } template typename T IT IT::operator -= (const difference_type x) { p -= x; return *this; } template typename T IT IT::operator + (const difference_type x) const { return I (p + x); } template typename T __attribute__((noinline)) IT IT::operator - (const difference_type x) const { __asm (); return I (p - x); } template typename T bool operator == (IT x, IT y) { return x.p == y.p; } template typename T bool operator == (const IT x, const IT y) { return x.p == y.p; } template typename T bool operator != (IT x, IT y) { return !(x == y); } template typename T bool operator != (const IT x, const IT y) { return !(x == y); } template typename T bool operator (IT x, IT y) { return x.p y.p; } template typename T bool operator (const IT x, const IT y) { return x.p y.p; } template typename T bool operator = (IT x, IT y) { return x.p = y.p; }
[PING] RE: [PING] [PATCH] _Cilk_for for C and C++
Hi Jakub, Did you get a chance to look at this patch? Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Monday, February 10, 2014 5:07 PM To: Jakub Jelinek Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: RE: [PING] [PATCH] _Cilk_for for C and C++ Hi Jakub, -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Monday, February 10, 2014 12:58 PM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote: Attached, please find a fixed patch. Along with it, I have also added 2 changelog files for C and C++ respectively. Have you even looked at the second testcase I've posted? gimplification ICEs on it with your latest patch, because firstprivate clause is added for the same variable multiple times, and it seems parallel still isn't around _Cilk_for. I looked at both but forgot to test them with my implementation. Sorry about this. I have fixed the ICE issue. To make sure this does not happen further, I have added your test cf3.C into test suite (renamed to cf3.cc). I hope that is OK with you. I have attached a fixed patch and Changelogs. Is this OK? Thanks, Balaji V. Iyer. Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote: I looked at both but forgot to test them with my implementation. Sorry about this. I have fixed the ICE issue. To make sure this does not happen further, I have added your test cf3.C into test suite (renamed to cf3.cc). I hope that is OK with you. The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure. Perhaps it would be much better though if instead of having a compile time testcase you'd just do what libgomp.c++/for-1.C does, just replace all the #pragma omp parallel for in there with _Cilk_for and turn it into a runtime testcase. I have attached a fixed patch and Changelogs. Is this OK? Looks better (note, still looking just at the dumps), but not completely ok yet. On cf3.cc, I see in *.gimple: D.2883 = Jint::begin (j); Iint::I (i, D.2883); D.2885 = Jint::end (j); retval.0 = operator-int (D.2885, i); D.2886 = retval.0 / 2; #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j) { difference_type retval.1; const struct I D.2888; const difference_type D.2866; long int D.2889; struct I D.2890; try { _Cilk_for (D.2864 = 0; D.2864 retval.1; D.2864 = D.2864 + 2) private(D.2864) { D.2889 = D.2864 - D.2865; D.2866 = D.2889; try { D.2890 = Iint::operator+= (i, D.2866); First a minor nit, there is extra newline before _Cilk_for: newline_and_indent (buffer, spc); if (flag_cilkplus gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) pp_string (buffer, _Cilk_for (); else pp_string (buffer, for (); I guess for _Cilk_for collapse is never 1, right? If that is the case, then perhaps you should move the newline_and_indent (buffer, spc); call into the else block. More importantly, what is retval.1? I'd expect you should be using retval.0 there and have it also as firstprivate(retval.0) on the parallel. In *.omplower dump I actually see: retval.0 = operator-int (D.2885, i); ... retval.1 = operator-int (D.2888, i); i.e. operator-int is called twice. Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, February 12, 2014 9:59 AM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote: I looked at both but forgot to test them with my implementation. Sorry about this. I have fixed the ICE issue. To make sure this does not happen further, I have added your test cf3.C into test suite (renamed to cf3.cc). I hope that is OK with you. The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure. Perhaps it would be much better though if instead of having a compile time testcase you'd just do what libgomp.c++/for-1.C does, just replace all the #pragma omp parallel for in there with _Cilk_for and turn it into a runtime testcase. I really don't want to do that because I don't think there is a 1:1 match-up between the rules of #pragma omp for and _Cilk_for. I have attached a fixed patch and Changelogs. Is this OK? Looks better (note, still looking just at the dumps), but not completely ok yet. On cf3.cc, I see in *.gimple: D.2883 = Jint::begin (j); Iint::I (i, D.2883); D.2885 = Jint::end (j); retval.0 = operator-int (D.2885, i); D.2886 = retval.0 / 2; #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j) { difference_type retval.1; const struct I D.2888; const difference_type D.2866; long int D.2889; struct I D.2890; try { _Cilk_for (D.2864 = 0; D.2864 retval.1; D.2864 = D.2864 + 2) private(D.2864) { D.2889 = D.2864 - D.2865; D.2866 = D.2889; try { D.2890 = Iint::operator+= (i, D.2866); First a minor nit, there is extra newline before _Cilk_for: newline_and_indent (buffer, spc); if (flag_cilkplus gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) pp_string (buffer, _Cilk_for (); else pp_string (buffer, for (); I guess for _Cilk_for collapse is never 1, right? If that is the case, then perhaps you should move the newline_and_indent (buffer, spc); call into the else block. OK. I will fix this and send you a patch? More importantly, what is retval.1? I'd expect you should be using retval.0 there and have it also as firstprivate(retval.0) on the parallel. In *.omplower dump I actually see: retval.0 = operator-int (D.2885, i); ... retval.1 = operator-int (D.2888, i); i.e. operator-int is called twice. Yes, one is for the if-clause and one is for the condition. It really doesn't matter because we get of the stuff in the condition and replace with our own for loop with something like the for-loop shown below. So retval1 doesn't come into picture. It is only alive from parser to the expand_cilk_for function. For (i = low; i high; i++) { _Cilk_for_body } So, is there any other changes that you need me to make? Thanks, Balaji V. Iyer. Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Wed, Feb 12, 2014 at 03:14:23PM +, Iyer, Balaji V wrote: The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure. Perhaps it would be much better though if instead of having a compile time testcase you'd just do what libgomp.c++/for-1.C does, just replace all the #pragma omp parallel for in there with _Cilk_for and turn it into a runtime testcase. I really don't want to do that because I don't think there is a 1:1 match-up between the rules of #pragma omp for and _Cilk_for. But there is nothing OpenMP specific on any of the tests, all could as well be tested by removing all the #pragma omp ... lines and just be tested as normal C+++ loops. Is there anything that _Cilk_for wouldn't handle? IMNSHO if you remove all the #pragma omp parallel for lines (even with any clauses it sometimes has) and replace for with _Cilk_for on the following line, you should have a valid Cilk+ program. Only f11 would need to be changed from: #pragma omp parallel { #pragma omp for nowait for (T i = x; i = y; i += 3) baz (i); #pragma omp single { T j = y + 3; baz (j); } } to say: _Cilk_for (T i = x; i = y; i += 3) baz (i); { T j = y + 3; baz (j); } so that it performs the same thing. First a minor nit, there is extra newline before _Cilk_for: newline_and_indent (buffer, spc); if (flag_cilkplus gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR) pp_string (buffer, _Cilk_for (); else pp_string (buffer, for (); I guess for _Cilk_for collapse is never 1, right? If that is the case, then perhaps you should move the newline_and_indent (buffer, spc); call into the else block. OK. I will fix this and send you a patch? Sure. More importantly, what is retval.1? I'd expect you should be using retval.0 there and have it also as firstprivate(retval.0) on the parallel. In *.omplower dump I actually see: retval.0 = operator-int (D.2885, i); ... retval.1 = operator-int (D.2888, i); i.e. operator-int is called twice. Yes, one is for the if-clause and one is for the condition. It really doesn't matter because we get of the stuff in the condition and replace with our own for loop with something like the for-loop shown below. So retval1 doesn't come into picture. It is only alive from parser to the expand_cilk_for function. For (i = low; i high; i++) { _Cilk_for_body } No, it really does matter. Just look at the *.optimized dump with -O0 -fcilkplus: retval.0_4 = operator-int (_3, i); in _Z3foo1JIiE function and _4 = .omp_data_i_2(D)-j; _5 = Jint::end (_4); retval.1_6 = operator-int (_5, i); retval.3_7 = retval.1_6; in _Z3foo1JIiE._cilk_for_fn.0. All the 4 statements are dead, you really shouldn't emit them, even when optimizing, if e.g. the operator- isn't inline, g++ won't be able to optimize it away. Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
More importantly, what is retval.1? I'd expect you should be using retval.0 there and have it also as firstprivate(retval.0) on the parallel. In *.omplower dump I actually see: retval.0 = operator-int (D.2885, i); ... retval.1 = operator-int (D.2888, i); i.e. operator-int is called twice. Yes, one is for the if-clause and one is for the condition. It really doesn't matter because we get of the stuff in the condition and replace with our own for loop with something like the for-loop shown below. So retval1 doesn't come into picture. It is only alive from parser to the expand_cilk_for function. For (i = low; i high; i++) { _Cilk_for_body } No, it really does matter. Just look at the *.optimized dump with -O0 - fcilkplus: retval.0_4 = operator-int (_3, i); in _Z3foo1JIiE function and _4 = .omp_data_i_2(D)-j; _5 = Jint::end (_4); retval.1_6 = operator-int (_5, i); retval.3_7 = retval.1_6; in _Z3foo1JIiE._cilk_for_fn.0. All the 4 statements are dead, you really shouldn't emit them, even when optimizing, if e.g. the operator- isn't inline, g++ won't be able to optimize it away. I looked at the test code you send me (cf3.cc) at -O1 and it is removing all the lines you have shown above. Yes, I would imagine -O0 to have code that can be redundant or unnecessary. Some of it could be the artifact of internal code insertion. But isn't the main job of the instruction scheduler to remove all these redundant work? Besides, it is just a function call. The compiler at -O2, -O and -O3 removes the chunk of code that you mentioned. -Balaji V. Iyer. Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote: I looked at the test code you send me (cf3.cc) at -O1 and it is removing all the lines you have shown above. Yes, I would imagine -O0 to have code that can be redundant or unnecessary. Some of it could be the artifact of internal code insertion. But isn't the main job of the instruction scheduler to remove all these redundant work? Besides, it is just a function call. The compiler at -O2, -O and -O3 removes the chunk of code that you mentioned. As I said, just change the testcase so that the operator isn't inline, and suddenly even -O3 will not be able to remove the call. Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, February 12, 2014 12:10 PM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote: I looked at the test code you send me (cf3.cc) at -O1 and it is removing all the lines you have shown above. Yes, I would imagine -O0 to have code that can be redundant or unnecessary. Some of it could be the artifact of internal code insertion. But isn't the main job of the instruction scheduler to remove all these redundant work? Besides, it is just a function call. The compiler at -O2, -O and -O3 removes the chunk of code that you mentioned. As I said, just change the testcase so that the operator isn't inline, and suddenly even -O3 will not be able to remove the call. I am sorry, I do not see any operators being asked to inline explicitly.. class I { public: typedef ptrdiff_t difference_type; I (); ~I (); I (T *); I (const I ); T operator * (); T *operator - (); T operator [] (const difference_type ) const; I operator = (const I ); I operator ++ (); I operator ++ (int); I operator -- (); I operator -- (int); I operator += (const difference_type ); I operator -= (const difference_type ); I operator + (const difference_type ) const; I operator - (const difference_type ) const; template typename S friend bool operator == (IS , IS ); template typename S friend bool operator == (const IS , const IS ); template typename S friend bool operator (IS , IS ); template typename S friend bool operator (const IS , const IS ); template typename S friend bool operator = (IS , IS ); template typename S friend bool operator = (const IS , const IS ); template typename S friend bool operator (IS , IS ); template typename S friend bool operator (const IS , const IS ); template typename S friend bool operator = (IS , IS ); template typename S friend bool operator = (const IS , const IS ); template typename S friend typename IS::difference_type operator - (IS , IS ); template typename S friend typename IS::difference_type operator - (const IS , const IS ); template typename S friend IS operator + (typename IS::difference_type , const IS ); private: T *p; }; Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote: Attached, please find a fixed patch. Along with it, I have also added 2 changelog files for C and C++ respectively. Have you even looked at the second testcase I've posted? gimplification ICEs on it with your latest patch, because firstprivate clause is added for the same variable multiple times, and it seems parallel still isn't around _Cilk_for. Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote: Attached, please find a fixed patch (diff.txt) that will do as you requested (model _Cilk_for like a #pragma omp parallel for). Along with this, I have also attached two Changelog entries (1 for C and 1 for C++). It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and does not affect any other tests in the testsuite. Is this Ok for trunk? A step in the right direction, but I still see issues just from looking at the *.gimple dump: For the first testcase, I see: iter = std::vectorint::begin (array); [return slot optimization] iter.1 = iter; D.13615 = std::vectorint::end (array); [return slot optimization] try { retval.0 = __gnu_cxx::operator-int*, std::vectorint (D.13615, iter); } finally { D.13615 = {CLOBBER}; } #pragma omp parallel schedule(cilk-for grain,0) if(retval.0) #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter) { difference_type retval.2; const difference_type D.13633; int D.13725; struct __normal_iterator D.13726; bool retval.3; int D.13728; int D.13729; int D.13732; iter = iter.1; private(D.13631) _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 + 1) D.13725 = D.13631 - D.13632; So, the issues I see: 1) what is iter.1, why do you have it at all, and, after all, the iterator is a class that needs to be constructed/destructed in the general way, so creating any further copies of something is both costly and undesirable 2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for 3) iter should be firstprivate, and there should be no explicit private var with assignment during gimplification, just handle it like any other firstprivate during omp lowering 4) the printing looks weird for _Cilk_for, as I said earlier, the clauses should probably be printed after the closing ) of _Cilk_for rather than after nothing on the previous line; also, there is no {} printed around the _Cilk_for body and the next line is weirdly indented But more importantly, if I create some testcase with a generic C++ conforming iterator (copied over from libgomp/testsuite/libgomp.c++/for-1.C), as in the second testcase, the *.gimple dump shows that _Cilk_for is still around the #pragma omp parallel. The intent of the second testcase is that you can really eyeball all the ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be really inlined. Jakub #include vector void foo (std::vectorint array) { _Cilk_for (std::vectorint::iterator iter = array.begin(); iter != array.end(); iter++) { if (*iter == 6) *iter = 13; } } typedef __PTRDIFF_TYPE__ ptrdiff_t; template typename T class I { public: typedef ptrdiff_t difference_type; I (); ~I (); I (T *); I (const I ); T operator * (); T *operator - (); T operator [] (const difference_type ) const; I operator = (const I ); I operator ++ (); I operator ++ (int); I operator -- (); I operator -- (int); I operator += (const difference_type ); I operator -= (const difference_type ); I operator + (const difference_type ) const; I operator - (const difference_type ) const; template typename S friend bool operator == (IS , IS ); template typename S friend bool operator == (const IS , const IS ); template typename S friend bool operator (IS , IS ); template typename S friend bool operator (const IS , const IS ); template typename S friend bool operator = (IS , IS ); template typename S friend bool operator = (const IS , const IS ); template typename S friend bool operator (IS , IS ); template typename S friend bool operator (const IS , const IS ); template typename S friend bool operator = (IS , IS ); template typename S friend bool operator = (const IS , const IS ); template typename S friend typename IS::difference_type operator - (IS , IS ); template typename S friend typename IS::difference_type operator - (const IS , const IS ); template typename S friend IS operator + (typename IS::difference_type , const IS ); private: T *p; }; template typename T IT::I () : p (0) {} template typename T IT::~I () {} template typename T IT::I (T *x) : p (x) {} template typename T IT::I (const I x) : p (x.p) {} template typename T T IT::operator * () { return *p; } template typename T T *IT::operator - () { return p; } template typename T T IT::operator [] (const difference_type x) const { return p[x]; } template typename T IT IT::operator = (const I x) { p = x.p; return *this; } template typename T IT IT::operator ++ () { ++p; return
RE: [PING] [PATCH] _Cilk_for for C and C++
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek Sent: Friday, February 7, 2014 9:03 AM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote: Attached, please find a fixed patch (diff.txt) that will do as you requested (model _Cilk_for like a #pragma omp parallel for). Along with this, I have also attached two Changelog entries (1 for C and 1 for C++). It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and does not affect any other tests in the testsuite. Is this Ok for trunk? A step in the right direction, but I still see issues just from looking at the *.gimple dump: For the first testcase, I see: iter = std::vectorint::begin (array); [return slot optimization] iter.1 = iter; D.13615 = std::vectorint::end (array); [return slot optimization] try { retval.0 = __gnu_cxx::operator-int*, std::vectorint (D.13615, iter); } finally { D.13615 = {CLOBBER}; } #pragma omp parallel schedule(cilk-for grain,0) if(retval.0) #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter) { difference_type retval.2; const difference_type D.13633; int D.13725; struct __normal_iterator D.13726; bool retval.3; int D.13728; int D.13729; int D.13732; iter = iter.1; private(D.13631) _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 + 1) D.13725 = D.13631 - D.13632; So, the issues I see: 1) what is iter.1, why do you have it at all, and, after all, the iterator is a class that needs to be constructed/destructed in the general way, so creating any further copies of something is both costly and undesirable Well, to get the loop count, I need to calculate it using operator-(array.end (), iter). Now, if I do that iter is already set. I need to reset iter back to the original one (array.begin ()) in the child function. This is why I used a temporary variable called iter1. 2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for What if grain is a variable say x? If I have it in the _Cilk_for, then won't it create omp_data_i-x. That is not correct. It should just emit x. But let me look into this to make sure... 3) iter should be firstprivate, and there should be no explicit private var with assignment during gimplification, just handle it like any other firstprivate during omp lowering Do you mean to say I should manually insert a firstprivate for iter and not the system figure out that it is shared? 4) the printing looks weird for _Cilk_for, as I said earlier, the clauses should probably be printed after the closing ) of _Cilk_for rather than after nothing on the previous line; also, there is no {} printed around the _Cilk_for body and the next line is weirdly indented Ok will look into this. But more importantly, if I create some testcase with a generic C++ conforming iterator (copied over from libgomp/testsuite/libgomp.c++/for- 1.C), as in the second testcase, the *.gimple dump shows that _Cilk_for is still around the #pragma omp parallel. The intent of the second testcase is that you can really eyeball all the ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be really inlined. Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Fri, Feb 07, 2014 at 02:33:41PM +, Iyer, Balaji V wrote: So, the issues I see: 1) what is iter.1, why do you have it at all, and, after all, the iterator is a class that needs to be constructed/destructed in the general way, so creating any further copies of something is both costly and undesirable Well, to get the loop count, I need to calculate it using operator-(array.end (), iter). Now, if I do that iter is already set. I need to reset iter back to the original one (array.begin ()) in the child function. This is why I used a temporary variable called iter1. operator- shouldn't really change iter, if it does, it is purely the user's fault, isn't it? It isn't operator -=, so it shouldn't really change array.end () either. 2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for What if grain is a variable say x? If I have it in the _Cilk_for, then won't it create omp_data_i-x. That is not correct. It should just emit x. But let me look into this to make sure... You certainly should gimplify the clause operand before the omp parallel, it must be an integral anyway, right? So just use get_temp_regvar? Then simply use firstprivate on the #pragma omp parallel. When you actually omp expand, you'll still be able to find the original variable and look it up on the parallel. But, if you can't make it work, guess I could live with the clause on the parallel. 3) iter should be firstprivate, and there should be no explicit private var with assignment during gimplification, just handle it like any other firstprivate during omp lowering Do you mean to say I should manually insert a firstprivate for iter and not the system figure out that it is shared? Yes. The class iterator is quite special thing, because already the C++ FE lowers it to an integral iterator instead. And when you make it firstprivate, omp lowering/expansion should take care of running the copy constructor/destructor in the parallel for you. Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
Hello Everyone, Did anyone get a chance to look at this patch (link to patches: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01612.html)? I tried to do as Jakub mentioned but it hits a road-block when it comes to iterators due to variable scoping issues. This patch does not disrupt any other parts of the code and all the code here are wrapped inside a check for for Cilk Plus enabling. It also passes all the tests and does not disrupt any existing failing or passing ones for both 32 and 64 bit modes in my x86_64 machine. It is the last feature to make Cilk Plus feature-complete. Is the patch OK for trunk? Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Wednesday, January 29, 2014 10:54 AM To: Jakub Jelinek Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: RE: [PING] [PATCH] _Cilk_for for C and C++ Hi Jakub, -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, January 29, 2014 6:31 AM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote: I thought about it a bit more, and the main issue here is that we need access to the _Cilk_for loop's components both inside the child function and the parent function. I guess for the C++ iterators, if in the _Cilk_for model you need to provide number of iterations before parallelization, it really depends on what the standard allows and what you want to do exactly. Yes, I need the value before the parallelization context hits. This is why in my last patch I had the parallel around the body and omp for around the _Cilk- for statement. If you need to provide the iteration count before spawning the threads and the standard allows you that, then just lower it in the C++ FE already so that you do: vectorint::iterator temp = array.begin (); sizetype tempcount = (array.end () - temp); before the parallel, and then #pragma omp parallel firstprivate(temp, tempcount) _Cilk_for (sizetype temp2 = 0; temp2 tempcount; temp2++) { vectorint::iterator ii = temp + temp2; body } This is kind of what I did (atlest tried to accomplish what you mentioned above). I can look into doing this, but is it possible for you to accept the patch as-is and we will look into fixing it in the future? Thanks, Balaji V. Iyer. or similar. The C++ FE needs to lower the C++ iterators anyway, the middle- end can really only work with integral or pointer iterators, and it depends on how exactly the Cilk+ standard defines _Cilk_for with iterators (what methods must be implemented on the iterators and what methods and in what order should be called). Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote: I thought about it a bit more, and the main issue here is that we need access to the _Cilk_for loop's components both inside the child function and the parent function. I guess for the C++ iterators, if in the _Cilk_for model you need to provide number of iterations before parallelization, it really depends on what the standard allows and what you want to do exactly. If you need to provide the iteration count before spawning the threads and the standard allows you that, then just lower it in the C++ FE already so that you do: vectorint::iterator temp = array.begin (); sizetype tempcount = (array.end () - temp); before the parallel, and then #pragma omp parallel firstprivate(temp, tempcount) _Cilk_for (sizetype temp2 = 0; temp2 tempcount; temp2++) { vectorint::iterator ii = temp + temp2; body } or similar. The C++ FE needs to lower the C++ iterators anyway, the middle-end can really only work with integral or pointer iterators, and it depends on how exactly the Cilk+ standard defines _Cilk_for with iterators (what methods must be implemented on the iterators and what methods and in what order should be called). Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
Hi Jakub, -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, January 29, 2014 6:31 AM To: Iyer, Balaji V Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote: I thought about it a bit more, and the main issue here is that we need access to the _Cilk_for loop's components both inside the child function and the parent function. I guess for the C++ iterators, if in the _Cilk_for model you need to provide number of iterations before parallelization, it really depends on what the standard allows and what you want to do exactly. Yes, I need the value before the parallelization context hits. This is why in my last patch I had the parallel around the body and omp for around the _Cilk-for statement. If you need to provide the iteration count before spawning the threads and the standard allows you that, then just lower it in the C++ FE already so that you do: vectorint::iterator temp = array.begin (); sizetype tempcount = (array.end () - temp); before the parallel, and then #pragma omp parallel firstprivate(temp, tempcount) _Cilk_for (sizetype temp2 = 0; temp2 tempcount; temp2++) { vectorint::iterator ii = temp + temp2; body } This is kind of what I did (atlest tried to accomplish what you mentioned above). I can look into doing this, but is it possible for you to accept the patch as-is and we will look into fixing it in the future? Thanks, Balaji V. Iyer. or similar. The C++ FE needs to lower the C++ iterators anyway, the middle- end can really only work with integral or pointer iterators, and it depends on how exactly the Cilk+ standard defines _Cilk_for with iterators (what methods must be implemented on the iterators and what methods and in what order should be called). Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
-Original Message- From: Iyer, Balaji V Sent: Monday, January 27, 2014 4:36 PM To: Jakub Jelinek Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: RE: [PING] [PATCH] _Cilk_for for C and C++ -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek Sent: Monday, January 27, 2014 3:50 PM To: Iyer, Balaji V Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote: Did you get a chance to look at this _Cilk_for patch? IMHO it is not as much work as you are fearing, at most a few hours of work to get it right, and well worth doing. So, please at least try it out and if you get stuck with it, explain why. Hi Jakub, I tried it that way in the original patch submission for C (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a dead-end when I was trying to get STL iterators working for C++. This is why I re-structured things this way to get them both working. Thanks, Balaji V. Iyer. Hi Jakub, I thought about it a bit more, and the main issue here is that we need access to the _Cilk_for loop's components both inside the child function and the parent function. So, at the moment, I have modelled the _Cilk_for as something like this: #pragma omp for schedule (runtime: grain) _Cilk-for (vectorint::iterator ii = array.begin (); ii != array.end (); ii++) #pragma omp parallel { body } From what I understand, you feel this is a bit ugly and you want this to be modelled something like this? #pragma omp parallel for schedule (runtime: grain) _Cilk_for (vectorint::iterator ii = array.begin (); ii != array.end(); ii++) { body } Am I right? As it stands, doing it the way you suggested did not work when we have iterators since iterator expansion pushed inside the child function and its expanded variables are not accessible outside the child function by gimplify_omp_for. That is, the expansion is put after #pragma omp parallel for and that is all pulled into the child function and thus the information to compute the count is lost for the parent function. There is a hack that I think may get around this. This is a bit ugly and really is not the way I would think of _Cilk_fors. I am OK with trying this if you will accept it. If I do something like this: #pragma omp parallel for schedule (runtime:grain) if ((array.end() - array.begin ())/1) _Cilk_for (vector int::iterator ii = array.begin (); ii != array.end (); ii++) { body } The new addition is if clause where if ((end - start) / step) Then, in the expand_parallel_task, I can extract the if (...) clause and then pass the expression as a parameter for the loop-count. Yes, it's bit ugly but if you are willing to accept it, I can try to implement this. Please let me know. Thanks, Balaji V. Iyer.
[PING] [PATCH] _Cilk_for for C and C++
Hi Jakub et al., Did you get a chance to look at this _Cilk_for patch? Thanks, Balaji V. Iyer. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Iyer, Balaji V Sent: Friday, January 24, 2014 3:34 PM To: Jakub Jelinek Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: RE: [PATCH] _Cilk_for for C and C++ -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Friday, January 24, 2014 2:42 PM To: Iyer, Balaji V Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PATCH] _Cilk_for for C and C++ On Thu, Jan 23, 2014 at 04:38:53PM +, Iyer, Balaji V wrote: This is how I started to think of it at first, but then when I thought about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - not the body - (e.g. _Cilk_for (int ii = 0; ii 10; ii++) doesn't really do anything nor does it belong in the child function. It is really mostly used to calculate the loop count and capture step-size and starting point. The child function has its own loop that will have a step size of 1 regardless of your step size. You use the step-size to find the correct spot. Let me give you an example: _Cilk_for (int ii = 0; ii 10; ii = ii + 2) { Array [ii] = 5; } This is translated to the following (assume grain is something that the user input): data_ptr.start = 0; data_ptr.end = 10; data_ptr.step_size = 2; __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain); Child_function (void *data_ptr, int high, int low) { for (xx = low; xx high; xx++) { Tmp_var = (xx * data_ptr-step_size) + data_ptr-start; // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we would have something like this: // Tmp_var = data_ptr-end - (xx * data_ptr-step_size) // The for-loop above won't change. Array[Tmp_var] = 5; } } This isn't really much different from #pragma omp parallel for schedule(runtime, N) (i.e. the combined construct), when it is combined, we also don't emit a call to GOMP_parallel but to some other function to which we pass the number of iterations and chunk size (== grain in Cilk+ terminology), the only (minor) difference is that for OpenMP when you handle the whole low ... high range the child function doesn't exit, but calls a function to give it next pari of low/high and only when that function tells it there is no further work to do, it returns. But, the Cilk+ case is clearly the same thing with just implicit telling there is no further work in the current function. So, I'd strongly prefer if you swap the parallel with Cilk_for, just set the flag that the two are combined like OpenMP already has for tons of constructs, and during expansion you just treat it together. Hi Jakub, What you are suggesting here would require a significant rewrite of the code. This version of _Cilk_for works and it does share significant amount of work with OMP routines as requested by other GCC developers. Given the time constraints, let's try to get this version accepted so that the feature will be available for the users and we will look into moving toward your suggestion when the phase 1 opens again. Thanks, Balaji V. Iyer. Jakub
Re: [PING] [PATCH] _Cilk_for for C and C++
On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote: Did you get a chance to look at this _Cilk_for patch? IMHO it is not as much work as you are fearing, at most a few hours of work to get it right, and well worth doing. So, please at least try it out and if you get stuck with it, explain why. Jakub
RE: [PING] [PATCH] _Cilk_for for C and C++
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek Sent: Monday, January 27, 2014 3:50 PM To: Iyer, Balaji V Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PING] [PATCH] _Cilk_for for C and C++ On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote: Did you get a chance to look at this _Cilk_for patch? IMHO it is not as much work as you are fearing, at most a few hours of work to get it right, and well worth doing. So, please at least try it out and if you get stuck with it, explain why. Hi Jakub, I tried it that way in the original patch submission for C (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a dead-end when I was trying to get STL iterators working for C++. This is why I re-structured things this way to get them both working. Thanks, Balaji V. Iyer. Jakub
Re: [PATCH] _Cilk_for for C and C++
On Thu, Jan 23, 2014 at 04:38:53PM +, Iyer, Balaji V wrote: This is how I started to think of it at first, but then when I thought about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - not the body - (e.g. _Cilk_for (int ii = 0; ii 10; ii++) doesn't really do anything nor does it belong in the child function. It is really mostly used to calculate the loop count and capture step-size and starting point. The child function has its own loop that will have a step size of 1 regardless of your step size. You use the step-size to find the correct spot. Let me give you an example: _Cilk_for (int ii = 0; ii 10; ii = ii + 2) { Array [ii] = 5; } This is translated to the following (assume grain is something that the user input): data_ptr.start = 0; data_ptr.end = 10; data_ptr.step_size = 2; __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain); Child_function (void *data_ptr, int high, int low) { for (xx = low; xx high; xx++) { Tmp_var = (xx * data_ptr-step_size) + data_ptr-start; // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we would have something like this: // Tmp_var = data_ptr-end - (xx * data_ptr-step_size) // The for-loop above won't change. Array[Tmp_var] = 5; } } This isn't really much different from #pragma omp parallel for schedule(runtime, N) (i.e. the combined construct), when it is combined, we also don't emit a call to GOMP_parallel but to some other function to which we pass the number of iterations and chunk size (== grain in Cilk+ terminology), the only (minor) difference is that for OpenMP when you handle the whole low ... high range the child function doesn't exit, but calls a function to give it next pari of low/high and only when that function tells it there is no further work to do, it returns. But, the Cilk+ case is clearly the same thing with just implicit telling there is no further work in the current function. So, I'd strongly prefer if you swap the parallel with Cilk_for, just set the flag that the two are combined like OpenMP already has for tons of constructs, and during expansion you just treat it together. Jakub
RE: [PATCH] _Cilk_for for C and C++
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Friday, January 24, 2014 2:42 PM To: Iyer, Balaji V Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PATCH] _Cilk_for for C and C++ On Thu, Jan 23, 2014 at 04:38:53PM +, Iyer, Balaji V wrote: This is how I started to think of it at first, but then when I thought about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - not the body - (e.g. _Cilk_for (int ii = 0; ii 10; ii++) doesn't really do anything nor does it belong in the child function. It is really mostly used to calculate the loop count and capture step-size and starting point. The child function has its own loop that will have a step size of 1 regardless of your step size. You use the step-size to find the correct spot. Let me give you an example: _Cilk_for (int ii = 0; ii 10; ii = ii + 2) { Array [ii] = 5; } This is translated to the following (assume grain is something that the user input): data_ptr.start = 0; data_ptr.end = 10; data_ptr.step_size = 2; __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain); Child_function (void *data_ptr, int high, int low) { for (xx = low; xx high; xx++) { Tmp_var = (xx * data_ptr-step_size) + data_ptr-start; // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we would have something like this: // Tmp_var = data_ptr-end - (xx * data_ptr-step_size) // The for-loop above won't change. Array[Tmp_var] = 5; } } This isn't really much different from #pragma omp parallel for schedule(runtime, N) (i.e. the combined construct), when it is combined, we also don't emit a call to GOMP_parallel but to some other function to which we pass the number of iterations and chunk size (== grain in Cilk+ terminology), the only (minor) difference is that for OpenMP when you handle the whole low ... high range the child function doesn't exit, but calls a function to give it next pari of low/high and only when that function tells it there is no further work to do, it returns. But, the Cilk+ case is clearly the same thing with just implicit telling there is no further work in the current function. So, I'd strongly prefer if you swap the parallel with Cilk_for, just set the flag that the two are combined like OpenMP already has for tons of constructs, and during expansion you just treat it together. Hi Jakub, What you are suggesting here would require a significant rewrite of the code. This version of _Cilk_for works and it does share significant amount of work with OMP routines as requested by other GCC developers. Given the time constraints, let's try to get this version accepted so that the feature will be available for the users and we will look into moving toward your suggestion when the phase 1 opens again. Thanks, Balaji V. Iyer. Jakub
Re: [PATCH] _Cilk_for for C and C++
On Sun, Jan 19, 2014 at 04:50:39AM +, Iyer, Balaji V wrote: I have answered your questions below. In addition to your changes, I have also fixed the issues Aldy pointed out and have answered his questions in that thread. With this email I have attached two patches and 2 change-logs (for C and C++). I have also rebased these patches to the trunk revision (r206756) Haven't looked at the patch yet, just applied and looked what it generates: Say in cilk_fors.c -O2 -fcilkplus -std=c99 -fdump-tree-{original,gimple,omplower,ompexp} I'm seeing in *.original dump: Unknown tree: cilk_for which suggests that tree-pretty-print.c doesn't handle CILK_FOR. Much more important is what is seen in the *.gimple dump though: schedule(runtime,0) private(ii) _Cilk_for (ii = 0; ii = 9; ii = ii + 1) { #pragma omp parallel shared(__cilk_incr.0) shared(__cilk_cond.2) shared(__cilk_init.1) shared(ii) shared(Array) { Array[ii] = 1133; } } Why do you put the parallel inside of _Cilk_for rather than the other way around? That looks just wrong. That would represent runtime scheduling of work across the parallel regions at the _Cilk_for, and then in each iteration running the body in several threads concurrently. You want the parallel around the _Cilk_for, and gimple_omp_parallel_set_combined_p (parallel_stmt, true) so that you can then handle it specially during omp lowering/expansion. Also, the printing of _Cilk_for is weird, the clauses (with space before) look really weird above the _Cilk_for when there is no #pragma or something similar. Perhaps print the clauses after _Cilk_for? _Cilk_for (ii = 0; ii = 9; ii = ii + 1) schedule(runtime,0) private(ii) ? Jakub
RE: [PATCH] _Cilk_for for C and C++
Hi Jakub, -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Thursday, January 23, 2014 5:13 AM To: Iyer, Balaji V Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PATCH] _Cilk_for for C and C++ On Sun, Jan 19, 2014 at 04:50:39AM +, Iyer, Balaji V wrote: I have answered your questions below. In addition to your changes, I have also fixed the issues Aldy pointed out and have answered his questions in that thread. With this email I have attached two patches and 2 change-logs (for C and C++). I have also rebased these patches to the trunk revision (r206756) Haven't looked at the patch yet, just applied and looked what it generates: Say in cilk_fors.c -O2 -fcilkplus -std=c99 -fdump-tree- {original,gimple,omplower,ompexp} I'm seeing in *.original dump: Unknown tree: cilk_for which suggests that tree-pretty-print.c doesn't handle CILK_FOR. OK. I will work on this and send you a patch soon. Much more important is what is seen in the *.gimple dump though: schedule(runtime,0) private(ii) _Cilk_for (ii = 0; ii = 9; ii = ii + 1) { #pragma omp parallel shared(__cilk_incr.0) shared(__cilk_cond.2) shared(__cilk_init.1) shared(ii) shared(Array) { Array[ii] = 1133; } } Why do you put the parallel inside of _Cilk_for rather than the other way around? That looks just wrong. That would represent runtime scheduling of work across the parallel regions at the _Cilk_for, and then in each iteration running the body in several threads concurrently. You want the parallel around the _Cilk_for, and gimple_omp_parallel_set_combined_p (parallel_stmt, true) so that you can then handle it specially during omp lowering/expansion. This is how I started to think of it at first, but then when I thought about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - not the body - (e.g. _Cilk_for (int ii = 0; ii 10; ii++) doesn't really do anything nor does it belong in the child function. It is really mostly used to calculate the loop count and capture step-size and starting point. The child function has its own loop that will have a step size of 1 regardless of your step size. You use the step-size to find the correct spot. Let me give you an example: _Cilk_for (int ii = 0; ii 10; ii = ii + 2) { Array [ii] = 5; } This is translated to the following (assume grain is something that the user input): data_ptr.start = 0; data_ptr.end = 10; data_ptr.step_size = 2; __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain); Child_function (void *data_ptr, int high, int low) { for (xx = low; xx high; xx++) { Tmp_var = (xx * data_ptr-step_size) + data_ptr-start; // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we would have something like this: // Tmp_var = data_ptr-end - (xx * data_ptr-step_size) // The for-loop above won't change. Array[Tmp_var] = 5; } } High and low are passed in by the runtime and thus their range can be any number (in this case between 1 and 5) Now, if we model this like #pragma omp parallel for then all _Cilk_for statement will also be pulled into the child function. This can be circumvented (althrough I feel it is a bit convoluted) using the region-data_arg pointers to pass values back and forth and doing other adjustments in expand_omp_for, and it will work for C, but it gets problematic for STL. This is because during the gimplification, the vector.start() and vector.end () are replaced with the start and end integers and the translation is put in the child function and that messes things up in omp-low.c and the calculation for count in the parent function. Another thing also gets messed up for C++ which I can't recall off the top of my head. On high-level, you can't think of _Cilk_for in terms of Open MP's for. These both are orthogonal technologies that produce parallel code with different starting points and assumptions. The reason why I modeled this way in the compiler is so that I can use OMP's compiler-routines. Some things such as creating a child function, inserting *a* call to the library function are same and thus the routines to do those can be shared but the internals are very different and so this is why it is modelled this way in the compiler. Also, the printing of _Cilk_for is weird, the clauses (with space before) look really weird above the _Cilk_for when there is no #pragma or something similar. Perhaps print the clauses after _Cilk_for? _Cilk_for (ii = 0; ii = 9; ii = ii + 1) schedule(runtime,0) private(ii) ? OK. I will work on this and send you a patch. Jakub
RE: [PATCH] _Cilk_for for C and C++
Hi Aldy, I have answered your questions below. But, I am attaching the patch with the response to Jason Merrill's email. I am not attaching them here to remove unnecessary duplication. I hope that's OK with you. If you would like it otherwise please let me know and I can send them to you. Thanks, Balaji V. Iyer. -Original Message- From: Aldy Hernandez [mailto:al...@redhat.com] Sent: Thursday, January 16, 2014 4:19 PM To: Iyer, Balaji V; Jakub Jelinek Cc: Jason Merrill; 'Jeff Law'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com' Subject: Re: [PATCH] _Cilk_for for C and C++ Here are a few things. + if (g_expr.value TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR) + { + error_at (input_location, cannot convert grain to long integer.\n); + c_parser_skip_to_pragma_eol (parser); + } Remove final period. Also, where's the testcase? Also, there seems to be spurious white space after the }. Was not necessary. Fixed. Is it required that it be a long integer? Because I see no further checks for this. Yes, I believe the language spec says the grainsize should be convertible to a long int. + c_token *token = c_parser_peek_token (parser); + if (token token-type == CPP_KEYWORD + token-keyword == RID_CILK_FOR) It doesn't look like c_parser_peek_token() ever returns NULL, so no need to check for token != 0. Fixed. + tree grain = convert_to_integer (long_integer_type_node, + g_expr.value); + if (grain grain != error_mark_node) + c_parser_cilk_simd (parser, grain); No need to check grain != 0 here either. Fixed. == a.c.003t.original == ;; Function main (null) ;; enabled by -tree-original { int i; int i; Unknown tree: cilk_for #pragma omp parallel { { Found with -fdump-tree-all. You should handle the cilk_for tree code in the pretty printers, and add corresponding test(s). I didn't fix this yet. I didn't understand how to add tests to check pretty-printers... static void -c_parser_cilk_simd (c_parser *parser) +c_parser_cilk_simd (c_parser *parser, tree grain) No documentation for grain. Fixed. + tree clauses = NULL_TREE; + + if (!is_cilk_for) +clauses = c_parser_cilk_all_clauses (parser); + else +clauses = grain; First set of clauses=NULL_TREE is useless. OK. Just my common practice of initialize all the variables when it is defined. I have removed it. static tree c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code, - tree clauses, tree *cclauses) + tree clauses_or_grain, tree *cclauses) Don't overload clauses and grainsize into one argument. Add another argument. Also, document said argument. OK. I have added a new parameter called grain and renamed the clauses_or_grain to clauses. +/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME. */ + +static tree +cilk_declare_looper (const char *name, tree type, enum built_in_function code) +{ I think you should document that it's creating a suitable built-in, not just creating a FUNCTION_DECL. Also, plesae document argument `code'. And call this function something more meaningful, like cilkrts_decalre_builtin or cilk_declare_for_builtin, but definitely not looper :). Renamed to declare_cilk_for_builtin. @@ -1192,6 +1199,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags) case GE_EXPR: pp_greater_equal (buffer); break; + case NE_EXPR: + pp_string (buffer, !=); + break; Thank you :). That was probably my oversight on the pragma simd work. Your welcome. @@ -6603,6 +6614,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) } for_body = NULL; + if (flag_enable_cilkplus TREE_CODE (for_stmt) == CILK_FOR) +{ + tree it = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0); + gimplify_and_add (it, for_pre_body); +} And what Jason said for all the special casing for CILK_FOR in this function... This removed as mentioned in email to Jason. +static inline void +gimple_cilk_for_set_grain (tree grain, gimple gs) +{ + const gimple_statement_omp_for *omp_for_stmt = +as_a gimple_statement_omp_for (gs); + omp_for_stmt-iter[0].grain = grain; +} Can we leave the grainsize in the clause, or does it have to reside in an auxiliary data structure? It seems weird as is, since you're only setting grain for the first element. I think Jason mentioned something similar. Got rid of the grainsize functions and just kept it as a clause in _Cilk_for (as suggested by Jason). +/* A structure with necessary elements from _Cilk_for statement. This + struct. node is passed in to WALK_STMT_INFO-INFO. */ +typedef struct
Re: [PATCH] _Cilk_for for C and C++
On Thu, Jan 16, 2014 at 01:18:59PM -0800, Aldy Hernandez wrote: I'm not a C++ expert, but my understanding was that in C++ you don't need a typedef to use the following structure by name (cilk_for_information). So you can just declare struct cilk_for_information {...} and instantiate it with just cilk_for_information some_instance. If that's the case, get rid of typedef. Yes. That's what create_implicit_typedef does. Marek
Re: [PATCH] _Cilk_for for C and C++
On 01/08/2014 02:46 PM, Iyer, Balaji V wrote: + /* Grain value, only used by _Cilk_for. */ + tree grain; Why can't the grain stay as a clause for the gimple form of the loop? + if (flag_enable_cilkplus TREE_CODE (for_stmt) == CILK_FOR) +{ + tree it = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0); + gimplify_and_add (it, for_pre_body); +} Why doesn't the normal handling of OMP_FOR_INIT work for Cilk? All the special cases for CILK_FOR need comments explaining why they are needed. Also, this seems like you're assigning to the control variable outside of the loop, which doesn't makes sense because we initialize it in each of the invocations of the child function. Right? + /* Original initial, final and increment values are necessary to compute +the loop-count. Otherwise, they are stored in variables and their +context could be changed, potentially making it impossible to compute +them correctly. */ I don't understand. Surely all you care about is the value, and gimplification shouldn't affect that. + /* If VAR is the induction variable of the outer _Cilk_for, then +it needs to be passed as a value not pointer since it +would not be overwritten by the body. */ Here it looks like you're overriding the normal logic because we know that it's safe to assume the induction variable won't be changed by the body of the loop. But why is the induction variable shared in the first place? If it isn't going to change, it can be private. + /* We cannot use the tsubst_omp_clauses since it will try to + do checking such as whether a certain clause can be used + with a certain for-loop. We are just use schedule clause here + as a holder to hold the grain value. */ I don't see the checking you mention. Can't we fix it to do the right thing? + if (code == CILK_FOR) +{ + top_level_body = push_stmt_list (); + top_body = begin_omp_parallel (); +} I wouldn't expect the front end to care that Cilk for is implemented using a parallel call; can't we bring that in at lowering time? Jason
Re: [PATCH] _Cilk_for for C and C++
On Thu, Jan 16, 2014 at 12:29:44PM -0500, Jason Merrill wrote: + if (code == CILK_FOR) +{ + top_level_body = push_stmt_list (); + top_body = begin_omp_parallel (); +} I wouldn't expect the front end to care that Cilk for is implemented using a parallel call; can't we bring that in at lowering time? The gimplifier already cares, so if it shouldn't be added early in the FE, it must be added during genericization. Unless we treat CILK_FOR as implicit parallel in the gimplifier, but I'd say that it is better to have it expressed as parallel with cilk_for nested in it and the combined flag set, so that omp-low.c can then emit it together. Jakub
Re: [PATCH] _Cilk_for for C and C++
Here are a few things. + if (g_expr.value TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR) + { + error_at (input_location, cannot convert grain to long integer.\n); + c_parser_skip_to_pragma_eol (parser); + } Remove final period. Also, where's the testcase? Also, there seems to be spurious white space after the }. Is it required that it be a long integer? Because I see no further checks for this. + c_token *token = c_parser_peek_token (parser); + if (token token-type == CPP_KEYWORD + token-keyword == RID_CILK_FOR) It doesn't look like c_parser_peek_token() ever returns NULL, so no need to check for token != 0. + tree grain = convert_to_integer (long_integer_type_node, + g_expr.value); + if (grain grain != error_mark_node) + c_parser_cilk_simd (parser, grain); No need to check grain != 0 here either. == a.c.003t.original == ;; Function main (null) ;; enabled by -tree-original { int i; int i; Unknown tree: cilk_for #pragma omp parallel { { Found with -fdump-tree-all. You should handle the cilk_for tree code in the pretty printers, and add corresponding test(s). static void -c_parser_cilk_simd (c_parser *parser) +c_parser_cilk_simd (c_parser *parser, tree grain) No documentation for grain. + tree clauses = NULL_TREE; + + if (!is_cilk_for) +clauses = c_parser_cilk_all_clauses (parser); + else +clauses = grain; First set of clauses=NULL_TREE is useless. static tree c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code, - tree clauses, tree *cclauses) + tree clauses_or_grain, tree *cclauses) Don't overload clauses and grainsize into one argument. Add another argument. Also, document said argument. +/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME. */ + +static tree +cilk_declare_looper (const char *name, tree type, enum built_in_function code) +{ I think you should document that it's creating a suitable built-in, not just creating a FUNCTION_DECL. Also, plesae document argument `code'. And call this function something more meaningful, like cilkrts_decalre_builtin or cilk_declare_for_builtin, but definitely not looper :). @@ -1192,6 +1199,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags) case GE_EXPR: pp_greater_equal (buffer); break; + case NE_EXPR: + pp_string (buffer, !=); + break; Thank you :). That was probably my oversight on the pragma simd work. @@ -6603,6 +6614,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) } for_body = NULL; + if (flag_enable_cilkplus TREE_CODE (for_stmt) == CILK_FOR) +{ + tree it = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0); + gimplify_and_add (it, for_pre_body); +} And what Jason said for all the special casing for CILK_FOR in this function... +static inline void +gimple_cilk_for_set_grain (tree grain, gimple gs) +{ + const gimple_statement_omp_for *omp_for_stmt = +as_a gimple_statement_omp_for (gs); + omp_for_stmt-iter[0].grain = grain; +} Can we leave the grainsize in the clause, or does it have to reside in an auxiliary data structure? It seems weird as is, since you're only setting grain for the first element. I think Jason mentioned something similar. +/* A structure with necessary elements from _Cilk_for statement. This + struct. node is passed in to WALK_STMT_INFO-INFO. */ +typedef struct cilk_for_information { { should be in a separate line. for a Cilk_for statement, not from Cilk_for. No abbreviation on struct. Either remove the period, or spell it out. Also s/in to/to/. I'm not a C++ expert, but my understanding was that in C++ you don't need a typedef to use the following structure by name (cilk_for_information). So you can just declare struct cilk_for_information {...} and instantiate it with just cilk_for_information some_instance. If that's the case, get rid of typedef. + if (flag_enable_cilkplus BTW, weren't you going to change this to flag_cilkplus or something in some past follow-up? fd-sched_kind = OMP_CLAUSE_SCHEDULE_STATIC; fd-chunk_size = NULL_TREE; + if (flag_enable_cilkplus + gimple_omp_for_kind (fd-for_stmt) == GF_OMP_FOR_KIND_CILKFOR) +fd-sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR; I believe most of the flag_enable_cilkplus checks in omp-low.c can be removed, especially the ones related to syntax. You shouldn't be getting any cilk constructs this late if cilkplus was not enabled. For that matter, we don't check flag_openmp in this file throughout, we only check for it at the gates. bool is_cilk_for = (flag_enable_cilkplus outer_ctx is_cilk_for_stmt (outer_ctx-stmt, ind_var)); Although here you
Re: [PATCH] _Cilk_for for C and C++
On Tue, Jan 07, 2014 at 10:11:59PM +, Iyer, Balaji V wrote: I used a similar existing one (safelen). Attached, please find 2 fixed patches for C and C++ along with their changelogs. But safelen is something completely different, while if I skim the _Cilk_for docs, the grain is really a chunk size, where the runtime library performs the scheduling of grain sized chunks, so using OMP_CLAUSE_SCHEDULE clause with OMP_CLAUSE_SCHEDULE_KIND (c) = OMP_CLAUSE_SCHEDULE_RUNTIME; OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = grain_expr; sounds like what should be used. OMP_CLAUSE_SAFELEN says what is the minimal vectorization factor the compiler can assume is safe for a simd loop. Jakub
Re: [PATCH] _Cilk_for for C and C++
On 12/17/2013 07:21 PM, Iyer, Balaji V wrote: The reason why I store it in OMP_FOR_CLAUSE is because OMP clauses cannot occur in _Cilk_for. So adding a new clause seem to be an overkill IMHO. I need a place to store the grain value and so I chose this spot. But code expects OMP_FOR_CLAUSES to have a certain form, and you are violating that so that now code needs to check whether we're dealing with a for loop in order to know to parse OMP_FOR_CLAUSES. Doing it your way requires lots of little special cases. Please represent it as a clause. Jason
RE: [PATCH] _Cilk_for for C and C++
-Original Message- From: Jason Merrill [mailto:ja...@redhat.com] Sent: Tuesday, January 7, 2014 3:41 PM To: Iyer, Balaji V; 'Jeff Law'; 'Aldy Hernandez' Cc: 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'; 'Jakub Jelinek' Subject: Re: [PATCH] _Cilk_for for C and C++ On 12/17/2013 07:21 PM, Iyer, Balaji V wrote: The reason why I store it in OMP_FOR_CLAUSE is because OMP clauses cannot occur in _Cilk_for. So adding a new clause seem to be an overkill IMHO. I need a place to store the grain value and so I chose this spot. But code expects OMP_FOR_CLAUSES to have a certain form, and you are violating that so that now code needs to check whether we're dealing with a for loop in order to know to parse OMP_FOR_CLAUSES. Doing it your way requires lots of little special cases. Please represent it as a clause. Hi Jason, In gimplify_omp_for, I remove the information in OMP_FOR_CLAUSES () and then replace it with a NULL_TREE. Till that point, nothing steps on it (except in pt.c and that I am handling it). Then the grain value is stored in gimple tree for omp_for. Thanks, Balaji V. Iyer. Jason
Re: [PATCH] _Cilk_for for C and C++
On Tue, Jan 07, 2014 at 09:24:21PM +, Iyer, Balaji V wrote: -Original Message- From: Jason Merrill [mailto:ja...@redhat.com] Sent: Tuesday, January 7, 2014 3:41 PM To: Iyer, Balaji V; 'Jeff Law'; 'Aldy Hernandez' Cc: 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'; 'Jakub Jelinek' Subject: Re: [PATCH] _Cilk_for for C and C++ On 12/17/2013 07:21 PM, Iyer, Balaji V wrote: The reason why I store it in OMP_FOR_CLAUSE is because OMP clauses cannot occur in _Cilk_for. So adding a new clause seem to be an overkill IMHO. I need a place to store the grain value and so I chose this spot. But code expects OMP_FOR_CLAUSES to have a certain form, and you are violating that so that now code needs to check whether we're dealing with a for loop in order to know to parse OMP_FOR_CLAUSES. Doing it your way requires lots of little special cases. Please represent it as a clause. Hi Jason, In gimplify_omp_for, I remove the information in OMP_FOR_CLAUSES () and then replace it with a NULL_TREE. Till that point, nothing steps on it (except in pt.c and that I am handling it). Then the grain value is stored in gimple tree for omp_for. So, you are abusing OMP_FOR_CLAUSES for shorter time, still, I agree with Jason that you shouldn't do that. If you don't want to add a new clause, just use a similar existing one, if grain is something like scheduling chunk size, just with a different name for it, then using OMP_CLAUSE_SCHEDULE with OMP_CLAUSE_SCHEDULE_EXPR being the grain expression would be certainly cleaner. But even adding a new artificial clause isn't that hard. Jakub
Re: [PATCH] _Cilk_for for C and C++
On 12/15/2013 07:40 PM, Iyer, Balaji V wrote: - tree clauses, tree *cclauses) + tree clauses_or_grain, tree *cclauses) Instead of this, please make the grainsize a new type of clause. - return (gimple_omp_subcode (g) GF_OMP_FOR_COMBINED) != 0; + return (gimple_omp_for_kind (g) == GF_OMP_FOR_COMBINED); I don't really know this code, but this change seems unlikely to be correct. Can you explain it? + tree data_name = get_identifier (.omp_data_i); + if (is_cilk_for) +data_name = get_identifier (.cilk_for_data_i); Why does the name of an artificial parameter matter? } +/* A subroutine of expand_omp_for. Generate code for _Cilk_for loop. + Given parameters: Need a blank line after the }. Jason
RE: [PATCH] _Cilk_for for C and C++
- return (gimple_omp_subcode (g) GF_OMP_FOR_COMBINED) != 0; + return (gimple_omp_for_kind (g) == GF_OMP_FOR_COMBINED); I don't really know this code, but this change seems unlikely to be correct. Can you explain it? I really need help on this. I need a new enum type (I call this: GF_OMP_FOR_KIND_CILKFOR) that I need to mark _Cilk_for so that I can differentiate it. Can someone please help me pick a new one or explain to me how the enum gf_mask is structured? I am not understanding the header comment fully... Thanks, Balaji V. Iyer.
RE: [PATCH] _Cilk_for for C and C++
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Tuesday, December 3, 2013 1:30 AM To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jakub Jelinek Subject: Re: [PATCH] _Cilk_for for C and C++ On 11/27/13 17:52, Jason Merrill wrote: On 11/27/2013 04:14 PM, Iyer, Balaji V wrote: I completely agree with you that there are certain parts of Cilk Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled functions (formerly called elemental functions). But, the Cilk keywords is almost completely orthogonal to OpenMP. They are semantically different and one cannot be transformed to another. Cilk uses automatically load-balanced work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP runtime. There are a number of other semantic differences but this is the core-issue. #pragma simd and #pragma omp have converged in several places but the Cilk part has always been different from OpenMP. Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use the OMP runtime, but that doesn't mean they can't share a lot of the middle end code along the way. We already have several different varieties of parallel/simd loops all represented by GIMPLE_OMP_FOR, and I think this could be another GP_OMP_FOR_KIND_. Right. It's not a question of what runtime they call back into, but that both share a common overall structure. Conceptually I look at a for loop as having 4 main components. Initializer, test condition, increment and the body. I'd like to hope things like the syntatic semantic analysis of the first three would be largely the same. Most of the Cilk specific bits would be in the handling of the body -- but there may be some significant code sharing that can happen there too. ... As you can tell, this is not how openmp handles a #pragma omp for loop. It's different in detail, but #pragma omp parallel for works very similarly: it creates a separate function for the body of the loop and then passes that to GOMP_parallel along with any shared data. My thoughts exactly. I understand you both now. Let me look into the OMP routines and see what it is doing and see how I can port it to _Cilk_for. Thanks, Balaji V. Iyer. jeff
Re: [PATCH] _Cilk_for for C and C++
On Tue, Dec 03, 2013 at 01:25:57PM +, Iyer, Balaji V wrote: As you can tell, this is not how openmp handles a #pragma omp for loop. It's different in detail, but #pragma omp parallel for works very similarly: it creates a separate function for the body of the loop and then passes that to GOMP_parallel along with any shared data. My thoughts exactly. I understand you both now. Let me look into the OMP routines and see what it is doing and see how I can port it to _Cilk_for. Yeah. The work is actually multi-stage, first during gimplification the code does determine what variables are used in the #pragma omp parallel (etc., in your case _Cilk_for) region, and whether they should be shared, or privatized (and in that case in what way, normal private, firstprivate, lastprivate, firstprivate+lastprivate, reduction, ...). Then there is omplower pass (already enabled for Cilk+ due to #pragma simd) that e.g. lowers accesses to shared variables, creates new VAR_DECLs for the privatized vars etc. and then ompexp pass that will create the outlined body of the function and create call to the runtime library. I have no idea what privatization behavior _Cilk_for wants, I'd expect that at least the IV must be privatized, otherwise it would be racy, but about other vars? Jakub
RE: [PATCH] _Cilk_for for C and C++
-Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Tuesday, December 3, 2013 8:40 AM To: Iyer, Balaji V Cc: Jeff Law; Jason Merrill; Aldy Hernandez; gcc-patches@gcc.gnu.org; r...@redhat.com Subject: Re: [PATCH] _Cilk_for for C and C++ On Tue, Dec 03, 2013 at 01:25:57PM +, Iyer, Balaji V wrote: As you can tell, this is not how openmp handles a #pragma omp for loop. It's different in detail, but #pragma omp parallel for works very similarly: it creates a separate function for the body of the loop and then passes that to GOMP_parallel along with any shared data. My thoughts exactly. I understand you both now. Let me look into the OMP routines and see what it is doing and see how I can port it to _Cilk_for. Yeah. The work is actually multi-stage, first during gimplification the code does determine what variables are used in the #pragma omp parallel (etc., in your case _Cilk_for) region, and whether they should be shared, or privatized (and in that case in what way, normal private, firstprivate, lastprivate, firstprivate+lastprivate, reduction, ...). Then there is omplower pass (already enabled for Cilk+ due to #pragma simd) that e.g. lowers accesses to shared variables, creates new VAR_DECLs for the privatized vars etc. and then ompexp pass that will create the outlined body of the function and create call to the runtime library. I have no idea what privatization behavior _Cilk_for wants, I'd expect that at least the IV must be privatized, otherwise it would be racy, but about other vars? In Cilk_for you don't need to privatize any variables. I need to pass in the loop_count, the grain (if the user specifies one), the nested function and its context to a Cilk specific function:__cilkrts_cilk_for_64 (or 32). The nested function has the body of the _Cilk_for and it has 3 parameter, context, start and end, where the start and end are passed in by the runtime which will tell what parts of the loop to execute. This thread has an example: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html Thanks, Balaji V. Iyer. Jakub
Re: [PATCH] _Cilk_for for C and C++
On Tue, Dec 03, 2013 at 02:01:17PM +, Iyer, Balaji V wrote: In Cilk_for you don't need to privatize any variables. I need to pass in the loop_count, the grain (if the user specifies one), the nested function and its context to a Cilk specific function:__cilkrts_cilk_for_64 (or 32). The nested function has the body of the _Cilk_for and it has 3 parameter, context, start and end, where the start and end are passed in by the runtime which will tell what parts of the loop to execute. This thread has an example: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html So Cilk+ only allows say: _Cilk_for (int ii = 5; ii 15; ii++) { body } and not int ii; _Cilk_for (ii = 5; ii 15; ii++) { body } ? Other variables can be all shared, that is after all the default for #pragma omp parallel for, except for the IVs and a couple of other exceptions (e.g. readonly vars etc.), if somebody wants private vars, they can be surely declared inside of the body somwhere. Jakub
Re: [PATCH] _Cilk_for for C and C++
On 12/03/13 06:25, Iyer, Balaji V wrote: I understand you both now. Let me look into the OMP routines and see what it is doing and see how I can port it to _Cilk_for. Thanks. I know it's a bit of a pain, but part what's driving the desire to share is to reduce the long term maintenance cost for everyone. jeff
Re: [PATCH] _Cilk_for for C and C++
On 11/27/13 17:52, Jason Merrill wrote: On 11/27/2013 04:14 PM, Iyer, Balaji V wrote: I completely agree with you that there are certain parts of Cilk Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled functions (formerly called elemental functions). But, the Cilk keywords is almost completely orthogonal to OpenMP. They are semantically different and one cannot be transformed to another. Cilk uses automatically load-balanced work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP runtime. There are a number of other semantic differences but this is the core-issue. #pragma simd and #pragma omp have converged in several places but the Cilk part has always been different from OpenMP. Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use the OMP runtime, but that doesn't mean they can't share a lot of the middle end code along the way. We already have several different varieties of parallel/simd loops all represented by GIMPLE_OMP_FOR, and I think this could be another GP_OMP_FOR_KIND_. Right. It's not a question of what runtime they call back into, but that both share a common overall structure. Conceptually I look at a for loop as having 4 main components. Initializer, test condition, increment and the body. I'd like to hope things like the syntatic semantic analysis of the first three would be largely the same. Most of the Cilk specific bits would be in the handling of the body -- but there may be some significant code sharing that can happen there too. ... As you can tell, this is not how openmp handles a #pragma omp for loop. It's different in detail, but #pragma omp parallel for works very similarly: it creates a separate function for the body of the loop and then passes that to GOMP_parallel along with any shared data. My thoughts exactly. jeff
Re: [PING]RE: [PATCH] _Cilk_for for C and C++
On 11/26/2013 12:23 PM, Iyer, Balaji V wrote: Did you get a chance to look at my _Cilk_for patch for C? BTW, I think pinging less than 24 hours after you send the patch is a bit excessive. :) Jason
Re: [PATCH] _Cilk_for for C and C++
On 11/25/2013 11:03 PM, Iyer, Balaji V wrote: On a broad note, I think there's a lot of OpenMP code you could be reusing here rather than writing it all again. And that way Cilk code will benefit from improvements to OpenMP handling, and vice versa. It probably makes sense to turn Cilk_for into an OMP_FOR loop, and then gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and handle everything at the tree level. But I don't know the OMP code well enough to suggest exactly how that would work. Finer-grained comments: + tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1, + NULL_TREE, NULL, tf_none); + if (exp == error_mark_node) +exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none); + if (exp exp != error_mark_node) +return exp; I thought you were changing this? +/* Handler for iterator to compute the loop variable. ADD_OP indicates + whether we need a '+' or '-' operation. LOW indicates the starting point + and LOOP_VAR is the induction variable. This functin returns an + INIT_EXPR. */ This comment still doesn't document VAR2. function + tree exp = build_new_op (loc, add_op, 0, low, loop_var, NULL_TREE, 0, + tf_none); + gcc_assert (exp != error_mark_node); + exp = cp_build_modify_expr (var2, INIT_EXPR, exp, tf_warning_or_error); Looking at online Cilk documentation I see: The increment expression must add to or subtract from the control variable using one of the following supported operations: += -= ++ (prefix or postfix) -- (prefix or postfix) From this, I think people would expect the increment to use a user-defined operator+=/-=/++/--, but your code above uses operator+/- instead. + -fcilkplus must be enabled t use %_Cilk_for%); to +cp_parser_cilk_for (cp_parser *parser, tree grain) Please reuse cp_parser_omp_for, like Aldy did for #pragma simd (cp_parser_cilk_simd) rather than write yet another for-statement parser. This should reduce the patch size quite a bit. +case PRAGMA_CILK_GRAINSIZE: + if (context == pragma_external) + { + error_at (pragma_tok-location, + %#pragma cilk grainsize% may only be be used inside a + function); + break; + } + + /* Ignore the pragma if Cilk Plus is not enabled. */ + if (flag_enable_cilkplus) + { + cp_parser_cilk_grainsize (parser, pragma_tok); + return true; + } default: Do you mean to fall through to the default case if Cilk+ is not enabled? + tmp = CILK_FOR_COND (t); + if (COMPARISON_CLASS_P (tmp)) + { + tree op0 = RECUR (TREE_OPERAND (tmp, 0)); + tree op1 = RECUR (TREE_OPERAND (tmp, 1)); + tmp = build2 (TREE_CODE (tmp), boolean_type_node, op0, op1); + } + CILK_FOR_COND (stmt) = tmp; Why not just recur into CILK_FOR_COND? + tmp = CILK_FOR_EXPR (t); + if (TREE_CODE (tmp) == MODIFY_EXPR) + { + tree lhs = TREE_OPERAND (tmp, 0); + tree rhs = TREE_OPERAND (tmp, 1); + lhs = RECUR (lhs); + rhs = build2 (TREE_CODE (rhs), TREE_TYPE (lhs), + RECUR (TREE_OPERAND (rhs, 0)), + RECUR (TREE_OPERAND (rhs, 1))); + tmp = build2 (MODIFY_EXPR, void_type_node, lhs, rhs); + } + else + tmp = build2 (TREE_CODE (tmp), void_type_node, + RECUR (TREE_OPERAND (tmp, 0)), + RECUR (TREE_OPERAND (tmp, 1))); + finish_for_expr (tmp, stmt); And CILK_FOR_EXPR? Jason
Re: [PING]RE: [PATCH] _Cilk_for for C and C++
On 11/27/2013 12:06 PM, Jason Merrill wrote: On 11/26/2013 12:23 PM, Iyer, Balaji V wrote: Did you get a chance to look at my _Cilk_for patch for C? BTW, I think pinging less than 24 hours after you send the patch is a bit excessive. :) Ah, I see, you were pinging the non-C++ parts. Jason
Re: [PATCH] _Cilk_for for C and C++
On 11/15/2013 02:23 PM, Iyer, Balaji V wrote: One small thing that I have not done that Jakub and several other have asked me before is that, there are no tests in c-c++-common for _Cilk_for. The reason being that the syntax between C and C++ implementations are different. In C++, the induction variable must be defined in the initializer (e.g. it should start wth _Cilk_for (int ii = 0)). In C, this is not allowed (e.g. it should start as _Cilk_for (ii = 0; ii 10; ii++)). That can be handled with #ifdef __cplusplus. Jason
Re: [PATCH] _Cilk_for for C and C++
On Wed, Nov 27, 2013 at 12:48:11PM -0500, Jason Merrill wrote: On 11/15/2013 02:23 PM, Iyer, Balaji V wrote: One small thing that I have not done that Jakub and several other have asked me before is that, there are no tests in c-c++-common for _Cilk_for. The reason being that the syntax between C and C++ implementations are different. In C++, the induction variable must be defined in the initializer (e.g. it should start wth _Cilk_for (int ii = 0)). In C, this is not allowed (e.g. it should start as _Cilk_for (ii = 0; ii 10; ii++)). That can be handled with #ifdef __cplusplus. It isn't allowed even in C99? For OpenMP, int a[30]; void foo () { #pragma omp for for (int i = 0; i 30; i++) a[i] = i; } is valid for C99. So, perhaps you just want /* { dg-additional-options -std=c99 { target c } } */ in the c-c++-common tests? Jakub
Re: [PATCH] _Cilk_for for C and C++
On 11/27/13 10:54, Jakub Jelinek wrote: On Wed, Nov 27, 2013 at 12:48:11PM -0500, Jason Merrill wrote: On 11/15/2013 02:23 PM, Iyer, Balaji V wrote: One small thing that I have not done that Jakub and several other have asked me before is that, there are no tests in c-c++-common for _Cilk_for. The reason being that the syntax between C and C++ implementations are different. In C++, the induction variable must be defined in the initializer (e.g. it should start wth _Cilk_for (int ii = 0)). In C, this is not allowed (e.g. it should start as _Cilk_for (ii = 0; ii 10; ii++)). That can be handled with #ifdef __cplusplus. It isn't allowed even in C99? For OpenMP, int a[30]; void foo () { #pragma omp for for (int i = 0; i 30; i++) a[i] = i; } is valid for C99. So, perhaps you just want /* { dg-additional-options -std=c99 { target c } } */ in the c-c++-common tests? Jakub Yup, that's what I did for the Cilk Plus pragma simd tests.
Re: [PATCH] _Cilk_for for C and C++
On 11/27/13 10:06, Jason Merrill wrote: On 11/25/2013 11:03 PM, Iyer, Balaji V wrote: On a broad note, I think there's a lot of OpenMP code you could be reusing here rather than writing it all again. And that way Cilk code will benefit from improvements to OpenMP handling, and vice versa. It probably makes sense to turn Cilk_for into an OMP_FOR loop, and then gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and handle everything at the tree level. But I don't know the OMP code well enough to suggest exactly how that would work. That's certainly the direction I'd like to see this work go as well. To the fullest extent possible Cilk+ should be layering on top of the OpenMP 4 work -- ie, Cilk+ should really be dealing with parsing issues, then handoff to OpenMP for the real work. Jeff
RE: [PATCH] _Cilk_for for C and C++
-Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Wednesday, November 27, 2013 2:43 PM To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jakub Jelinek Subject: Re: [PATCH] _Cilk_for for C and C++ On 11/27/13 10:06, Jason Merrill wrote: On 11/25/2013 11:03 PM, Iyer, Balaji V wrote: On a broad note, I think there's a lot of OpenMP code you could be reusing here rather than writing it all again. And that way Cilk code will benefit from improvements to OpenMP handling, and vice versa. It probably makes sense to turn Cilk_for into an OMP_FOR loop, and then gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and handle everything at the tree level. But I don't know the OMP code well enough to suggest exactly how that would work. That's certainly the direction I'd like to see this work go as well. To the fullest extent possible Cilk+ should be layering on top of the OpenMP 4 work -- ie, Cilk+ should really be dealing with parsing issues, then handoff to OpenMP for the real work. Hello Jeff and Jason, I completely agree with you that there are certain parts of Cilk Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled functions (formerly called elemental functions). But, the Cilk keywords is almost completely orthogonal to OpenMP. They are semantically different and one cannot be transformed to another. Cilk uses automatically load-balanced work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP runtime. There are a number of other semantic differences but this is the core-issue. #pragma simd and #pragma omp have converged in several places but the Cilk part has always been different from OpenMP. I have thought about sharing routines with OpenMP and have done it in several parts of Cilk plus. It is not possible to share any middle end work between Cilk keywords and OpenMP because they are fundamentally different. I have shared some parsing parts with omp in C. Since we are talking about _Cilk_for loops, maybe an example of how a compiler is supposed break down a _Cilk_for loop will help. Please see the example below. It is a simple main routine with one _Cilk_for in it and it returns a local variable X that may or may not be read/written in the body: int main (void) { int X = 0; _Cilk_for (int ii = 5; ii 15; ii++) { body } return X; } This program is converted to the following: /* Low and high fields are passed in by the runtime using the user defined grainsize or the rumtime computed one. Data field is ignored in GCC, please see below. */ cilk_for_helper_function (void *data, int low, int high) { for (ii = low; ii high; ii++) body; } int main (void) { int X = 0; /* This function is actually a call the the runtime whose implementation is in libcilkrts/runtime/cilk-abi-cilk-for.c. */ __cilkrts_cilk_for_64 (__cilk_for_001, /* Nested/Lambda function */ __cilk_for_001, /* Data used by the lambda function, the runtime does not worry about it. It is an interface to pass the information to the lambda function. In GCC we create a nested function so it is ignored. */ 10/* loop_count (15-5) */, 0 /* grain value from the #pragma grainsize pragma */ ); /* Note: if the trip-count is 32 bit then __cilkrts_cilk_for_64 is replaced by __cilkrts_cilk_for_32 */ return X; } As you can tell, this is not how openmp handles a #pragma omp for loop. Thanks, Balaji V. Iyer. Jeff
Re: [PATCH] _Cilk_for for C and C++
On 11/18/13 14:50, Iyer, Balaji V wrote: Attached, please find a refreshed patches (one for C and 1 for C++). The trunk was diffed after Aldy's check in of pragma simd was in. So, now this patch is only dependent on _Cilk_spawn and _Cilk_sync (mostly for execution of tests). They are tested on x86_64 and works successfully. Here are the fixed Changelog entries (C related changelogs are given first then C++): I'm just getting started on the C stuff. This likely will be a partial review. First, it looks like you've got a ton of unrelated cleanup work going on in c-family/cilk.c. Function/parameter renaming and the like. If this stuff is important, can it go forward independently? It's certainly hard to find Cilk_for stuff in there with all the unrelated changes going on. Some of the stuff seems to have moved into cilk.h which is probably good too, but probably should be separated from the Cilk_for support patch as well. The semantics of cilk_check_loop_difference_type seem a bit odd. Can you explain a bit more how those semantics were choosen? And given the implementation, isn't it impossible for it to return NULL_TREE, meaning that check in c_extract_cilk_for_fields is pointless? Isn't cilk_simplify_tree just used in the validation code? Can it be moved into that file and privatized? It probably needs a better name too. I have to also admit, it looks pretty bogus to just start stripping things like that. Oh yea, declaring and using tree_ssa_... is not OK there, thus it's not ok to use STRIP_USELESS_TYPE_CONVERSION. In general, could you do a pass over those functions with external linkage and if they're only used in one file, move them to that file and privatize them? So Cilk_for has a type? That's the impression I get by looking at cilk_loop_convert. In reality, I think it's just poorly named. I've got to believe there's something, somewhere that will do equivalent conversions for you :-) There's a mismatch between direction tracking between cilk_calc_forward_div_op and cilk_compute_incr_direction. THe former handles -1, the latter never returns it. I'd probably prefer to see the direction information represented as an ENUM anyway. . Presumably you're using nested functions to encapsulate the body for the Cilk_for loop? Does the nested function have access to the parent's context? If the verification routines have to stay, then isn't there a lot of commonality between (for example) cilk_set_incr_info and c_check_cilk_loop_incr? Anything worth refactoring there? Probably not I guess I must have missed something -- when can we get a CONTINUE_STMT outside a loop body? See cilk_resolve_continue_stmts. In c-common.h, you added several prototypes for things in c-cilkplus.c. We're trying to avoid doing that, instead favoring putting them into c-cilkplus.h and including that where needed. Similarly in c-tree.h Is there any significant sharable code between parsing a normal C for statement a a Cilk_for that we could factor out? What about the validation code? Can any of that be shared with OpenMP? What about the gimplification? Are the syntatical differences so significant that we can't share any of that code? Any particular reason why the runtime uses different names for the 32bit and 64bit Clik_for support? ISTM you wouldn't ever be binding the 32bit and 64bit runtimes into a single application, so you could have just called it __cilkrts_cilk_for and be done with it. I guess it's too late to change that now. Or does the 32/64 split refer to the # bits necessary to hold the iteration counter? vertical whitespace removed after the cilk_arrow function, causing it to end up immediately adjacent to the comment for the next function. I suspect this was unintentional. You define CILK_FOR_STMT and new gimplification routines to handle it. Is there any way you can funnel all this through the OpenMP 4 support? It looks like you've got unrelated cleanups going on in c-family/cilk.c. Function renaming and such. Is that strictly related to Cilk_for support? Much of it looks independent of Cilk_for support. Seems to me like that should break out and go forward independently. I'm going to have to look at this some more, but I wanted to give you as much feedback as I could before I disappear for the holidays. jeff
Re: [PATCH] _Cilk_for for C and C++
On 11/27/2013 04:14 PM, Iyer, Balaji V wrote: I completely agree with you that there are certain parts of Cilk Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled functions (formerly called elemental functions). But, the Cilk keywords is almost completely orthogonal to OpenMP. They are semantically different and one cannot be transformed to another. Cilk uses automatically load-balanced work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP runtime. There are a number of other semantic differences but this is the core-issue. #pragma simd and #pragma omp have converged in several places but the Cilk part has always been different from OpenMP. Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use the OMP runtime, but that doesn't mean they can't share a lot of the middle end code along the way. We already have several different varieties of parallel/simd loops all represented by GIMPLE_OMP_FOR, and I think this could be another GP_OMP_FOR_KIND_. ... As you can tell, this is not how openmp handles a #pragma omp for loop. It's different in detail, but #pragma omp parallel for works very similarly: it creates a separate function for the body of the loop and then passes that to GOMP_parallel along with any shared data. Jason
[PING]RE: [PATCH] _Cilk_for for C and C++
Hi Jeff et al., Did you get a chance to look at my _Cilk_for patch for C? Thanks, Balaji V. Iyer. -Original Message- From: Iyer, Balaji V Sent: Monday, November 18, 2013 4:51 PM To: Aldy Hernandez Cc: gcc-patches@gcc.gnu.org; Jeff Law; Jason Merrill (ja...@redhat.com); r...@redhat.com Subject: RE: [PATCH] _Cilk_for for C and C++ Hello Everyone, Please see my comment below: -Original Message- From: Aldy Hernandez [mailto:al...@redhat.com] Sent: Friday, November 15, 2013 4:51 PM To: Iyer, Balaji V Cc: gcc-patches@gcc.gnu.org; Jeff Law; Jason Merrill (ja...@redhat.com); r...@redhat.com Subject: Re: [PATCH] _Cilk_for for C and C++ On 11/15/13 12:23, Iyer, Balaji V wrote: This patch is dependent on the following patches: #pragma simd work (they both share the same parser routines) I have just committed this to trunk, so it shouldn't be a blocker. Also, in the past 2 days the #pragma simd parsing has been merged with the OpenMP parsing routines, so please adjust your patch accordingly. Attached, please find a refreshed patches (one for C and 1 for C++). The trunk was diffed after Aldy's check in of pragma simd was in. So, now this patch is only dependent on _Cilk_spawn and _Cilk_sync (mostly for execution of tests). They are tested on x86_64 and works successfully. Here are the fixed Changelog entries (C related changelogs are given first then C++): C- Related Changes == === gcc/ChangeLog. 2013-11-18 Balaji V. Iyer balaji.v.i...@intel.com * cilk-builtins.def: Added 2 builtin functions: __cilkrts_cilk_for_64 and __cilkrts_cilk_for_32. * cilk-common.c (cilk_declare_looper): New function. (cilk_init_builtins): Added two calls to cilk_declare_looper. * cilk.h (enum cilk_tree_index): Added two enums: CILK_TI_F_LOOP_32 and CILK_TI_F_LOOP_64. (enum add_variable_type): Moved here from c-family/cilk.c (enum cilk_block_type): Likewise. (struct wrapper_data): Likewise. (struct cilk_for_desc): New struct. (cilk_for_32_fndecl): New #define. (cilk_for_64_fndecl): Likewise. * tree.h (CILK_FOR_INIT): Likewise. (CILK_FOR_COND): Likewise. (CILK_FOR_EXPR): Likewise. (CILK_FOR_BODY): Likewise. (CILK_FOR_SCOPE): Likewise. (CILK_FOR_GRAIN): Likewise. (CILK_FOR_VAR): Likewise. * gimplify.c (gimplify_expr): Added CILK_FOR_STMT case. * tree-pretty-print.c (dump_generic_node): Likewise. * langhooks-def.h (LANG_HOOKS_CILKPLUS_GIMPLIFY_CILK_FOR): New #define. (LANG_HOOKS_CILKPLUS): Added LANG_HOOKS_CILKPLUS_GIMPLIFY_FOR field. * langhooks.h (struct lang_hooks_for_cilkplus): Added a new field gimplify_cilk_for. * tree.def: Added a new tree CILK_FOR_STMT. gcc/c-family/ChangeLog. 2013-11-18 Balaji V. Iyer balaji.v.i...@intel.com * c-cilkplus.c (c_check_cilk_loop_incr): New function. (c_validate_cilk_plus_loop): Likewise. (c_check_cilk_loop): Likewise. (c_finish_cilk_for_loops): Likewise. (cp_finish_cilk_for_loops): Likewise. * c-common.c (c_common_resword): Added _Cilk_for keyword. * c-common.h (enum rid): Added RID_CILK_FOR. (cp_finish_cilk_for_loop): New prototype. (c_finish_cilk_for_loop): Likewise. (c_validate_cilk_loop): Likewise. (c_check_cilk_loop): Likewise. (cilk_init_fd): Likewise. (cilk_extract_free_variables): Likewise. (cilk_create_cilk_helper_decl): Likewise. (cilk_call_graph_add_fn): Likewise. (cilk_outline_body): Likewise. (cilk_check_loop_difference_type): Likewise. (declare_cilk_for_parms): Likewise. (declare_cilk_for_vars): Likewise. (cilk_loop_convert): Likewise. (cilk_divide_count): Likewise. (cilk_calc_forward_div_op): Likewise. (cilk_compute_loop_count): Likewise. (insert_cilk_for_nested_fn): Likewise. (cilk_compute_loop_var): Likewise. (cilk_set_inclusive_and_direction): Likewise. (cilk_set_iter_difftype): Likewise. (cilk_set_incr_info): Likewise. (cilk_set_init_info): Likewise. (clk_simplify_tree): Likewise. (cilk_find_code_from_call): Likewise. (cilk_tree_operand_noconv): Likewise. (cilk_resolve_continue_stmts): Likewise. * c-pragma.c (init_pragma): Added pragma grainsize. * c-pragma.h (enum pragma_kind): Added PRAGMA_CILK_GRAINSIZE. * cilk.c (enum add_variable_type): Moved to ../cilk.h. (enum cilk_block_type): Likewise. (struct wrapper_data): Likewise. (cilk_call_graph_add_fn): New function. (cilk_create_cilk_helper_decl): Likewise. (cilk_outline): Renamed to cilk_outline_body. Also added a parameter to hold throw flag for C
Re: [PATCH] _Cilk_for for C and C++
On 11/18/2013 04:50 PM, Iyer, Balaji V wrote: + int flags = LOOKUP_PROTECT | LOOKUP_ONLYCONVERTING; Why not LOOKUP_NORMAL? LOOKUP_ONLYCONVERTING isn't relevant in this context. + tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1, + NULL_TREE, NULL, 0); Use tf_none instead of 0. + if (exp == error_mark_node) +exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none); + if (exp exp != error_mark_node) +return exp; This doesn't make sense to me. build_x_modify_expr takes codes like PLUS_EXPR and then does an assignment afterward; we don't want to quietly do += just because there's some error with the evaluation of the + operation. What is this code trying to do? +/* Handler for iterator to compute the loop variable. ADD_OP indicates + whether we need a '+' or '-' operation. LOW indicates the starting point + and LOOP_VAR is the induction variable. Returns an expression (or a + STATEMENT_LIST of expressions). If it is unable to find the appropriate + iteration, then it returns an error mark node and its parent will set + the loop as invalid. */ This doesn't explain what VAR2 is. And it seems like you're also using LOW as the increment? + tree new_stmt = build_x_modify_expr (loc, new_var, INIT_EXPR, + build_zero_cst (TREE_TYPE (new_var)), + tf_warning_or_error); + if (new_stmt == error_mark_node) + return error_mark_node; + append_to_statement_list (new_stmt, exp); + new_stmt = build_x_modify_expr (loc, new_var, NOP_EXPR, low, + tf_warning_or_error); Why assign 0 if you're going to immediately assign low afterwards? + /* We have to manually create this loop for two reasons: + a. We need to have access to continue and start label since we need +to resolve continue and breaks by hand. Why do you need to resolve them by hand? It looks like break isn't even allowed. + b. C++ doesn't provide a c_finish_loop function like C does. */ Why is that important? sk_for, /* The scope of the variable declared in a for-init-statement. */ + sk_cilk_for, /* The scope of the variable declared in _Cilk_for init + statement. */ How is this different from a normal for-init-statement? Nothing seems to use it. Jason
Re: [PATCH] _Cilk_for for C and C++
One small thing that I have not done that Jakub and several other have asked me before is that, there are no tests in c-c++-common for _Cilk_for. The reason being that the syntax between C and C++ implementations are different. In C++, the induction variable must be defined in the initializer (e.g. it should start wth _Cilk_for (int ii = 0)). In C, this is not allowed (e.g. it should start as _Cilk_for (ii = 0; ii 10; ii++)). For pragma simd what I did was put the tests in c-c++-common and pass -std=c99 to the C tests. That should allow declaration at initialization. Can you do this? Aldy
Re: [PATCH] _Cilk_for for C and C++
On 11/15/13 12:23, Iyer, Balaji V wrote: This patch is dependent on the following patches: #pragma simd work (they both share the same parser routines) I have just committed this to trunk, so it shouldn't be a blocker. Also, in the past 2 days the #pragma simd parsing has been merged with the OpenMP parsing routines, so please adjust your patch accordingly. Thanks. Aldy