Hi Jakub et al.,
    Did you get a chance to look at this _Cilk_for patch?

Thanks,

Balaji V. Iyer.

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Iyer, Balaji V
> Sent: Friday, January 24, 2014 3:34 PM
> To: Jakub Jelinek
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: RE: [PATCH] _Cilk_for for C and C++
>
> > -----Original Message-----
> > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > Sent: Friday, January 24, 2014 2:42 PM
> > To: Iyer, Balaji V
> > Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
> > Subject: Re: [PATCH] _Cilk_for for C and C++
> >
> > On Thu, Jan 23, 2014 at 04:38:53PM +0000, Iyer, Balaji V wrote:
> > > This is how I started to think of it at first, but then when I thought
> > > about it ... in _Cilk_for unlike the #pragma simd's for, the for
> > > statement - not the body - (e.g. "_Cilk_for (int ii = 0; ii < 10;
> > > ii++") doesn't really do anything nor does it belong in the child
> > > function. It is really mostly used to calculate the loop count and
> > > capture step-size and starting point.
> > >
> > > The child function has its own loop that will have a step size of 1
> > > regardless of your step size. You use the step-size to find the correct
> > > spot.
> > > Let me give you an example:
> > >
> > > _Cilk_for (int ii = 0; ii < 10; ii = ii + 2) {
> > >   Array [ii] = 5;
> > > }
> > >
> > > This is translated to the following (assume grain is something that
> > > the user input):
> > >
> > > data_ptr.start = 0;
> > > data_ptr.end = 10;
> > > data_ptr.step_size = 2;
> > > __cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);
> > >
> > > Child_function (void *data_ptr, int high, int low) {
> > >   for (xx = low; xx < high; xx++)
> > >     {
> > >       Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
> > >       // Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we would
> > >       // have something like this:
> > >       // Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
> > >       // The for-loop above won't change.
> > >       Array[Tmp_var] = 5;
> > >     }
> > > }
> >
> > This isn't really much different from
> > #pragma omp parallel for schedule(runtime, N) (i.e. the combined
> > construct), when it is combined, we also don't emit a call to
> > GOMP_parallel but to some other function to which we pass the number
> > of iterations and chunk size (== grain in Cilk+ terminology), the only
> > (minor) difference is that for OpenMP when you handle the whole low ...
> > high range the child function doesn't exit, but calls a function to
> > give it next pair of low/high and only when that function tells it
> > there is no further work to do, it returns. But, the Cilk+ case is
> > clearly the same thing with just implicit telling there is no further
> > work in the current function.
> >
> > So, I'd strongly prefer if you swap the parallel with Cilk_for, just
> > set the flag that the two are combined like OpenMP already has for
> > tons of constructs, and during expansion you just treat it together.
>
> Hi Jakub,
>     What you are suggesting here would require a significant rewrite of
> the code. This version of _Cilk_for works and it does share a significant
> amount of work with OMP routines as requested by other GCC developers.
> Given the time constraints, let's try to get this version accepted so
> that the feature will be available for the users and we will look into
> moving toward your suggestion when the phase 1 opens again.
>
> Thanks,
>
> Balaji V. Iyer.
>
> >
> >     Jakub
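
For readers following the quoted example, the lowering it describes can be sketched in plain C. This is a serial illustration only, under stated assumptions: `struct loop_data`, `run_chunks_serially` and `lowered_loop` are hypothetical names made up here; `run_chunks_serially` merely stands in for the runtime entry point and does not reproduce its signature or its scheduling; and the iteration count is rounded up so that ranges not divisible by the step still work (for the example it equals (10-0)/2).

```c
#include <assert.h>

/* Hypothetical names throughout; this is not the GCC or Cilk Plus runtime API. */
struct loop_data {
    int start;      /* initial value of the user's induction variable */
    int end;        /* exclusive bound taken from the loop condition */
    int step_size;  /* the user's stride */
    int *array;     /* array written by the loop body */
};

/* Child function: walks a unit-stride sub-range [low, high) of iteration
   numbers and maps each one back to the user's index via the step size. */
static void child_function(void *data, int low, int high)
{
    struct loop_data *d = data;
    for (int xx = low; xx < high; xx++) {
        int ii = xx * d->step_size + d->start;
        d->array[ii] = 5;
    }
}

/* Serial stand-in for the runtime call: hands the child chunks of at most
   `grain` iterations, one after another, until `count` is exhausted. */
static void run_chunks_serially(void (*body)(void *, int, int),
                                void *data, int count, int grain)
{
    assert(grain > 0);
    for (int low = 0; low < count; low += grain) {
        int high = (low + grain < count) ? low + grain : count;
        body(data, low, high);
    }
}

/* Lowered form of:
   _Cilk_for (int ii = 0; ii < 10; ii = ii + 2) array[ii] = 5; */
static void lowered_loop(int *array, int grain)
{
    struct loop_data d = { 0, 10, 2, array };
    /* Assumption: round up so a non-divisible range keeps its last iteration. */
    int count = (d.end - d.start + d.step_size - 1) / d.step_size;
    run_chunks_serially(child_function, &d, count, grain);
}
```

Calling `lowered_loop(a, 2)` on a zeroed 10-element array sets a[0], a[2], a[4], a[6] and a[8] to 5 and leaves the odd elements untouched; the chunks handed to the child are [0,2), [2,4) and [4,5).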