On Thu, Jan 23, 2014 at 04:38:53PM +0000, Iyer, Balaji V wrote:
> This is how I started to think of it at first, but then when I thought
> about it ... in _Cilk_for unlike the #pragma simd's for, the for statement -
> not the body - (e.g. "_Cilk_for (int ii = 0; ii < 10; ii++") doesn't really
> do anything nor does it belong in the child function. It is really mostly
> used to calculate the loop count and capture step-size and starting point.
>
> The child function has its own loop that will have a step size of 1
> regardless of your step size. You use the step-size to find the correct spot.
> Let me give you an example:
>
> _Cilk_for (int ii = 0; ii < 10; ii = ii + 2)
> {
>   Array [ii] = 5;
> }
>
> This is translated to the following (assume grain is something that the user
> input):
>
> data_ptr.start = 0;
> data_ptr.end = 10;
> data_ptr.step_size = 2;
> __cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);
>
> Child_function (void *data_ptr, int high, int low)
> {
>   for (xx = low; xx < high; xx++)
>     {
>       Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
>       // Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we
>       // would have something like this:
>       // Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
>       // The for-loop above won't change.
>       Array[Tmp_var] = 5;
>     }
> }
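
To make the quoted translation concrete, here is a minimal sketch of it in plain C. It is only an illustration under stated assumptions: serial_cilk_for is a made-up, serial stand-in for __cilkrts_cilk_for_32 (it is not the real runtime entry point), struct loop_data and iteration_count are hypothetical names, and the child is assumed to receive its range as (low, high).

```c
#include <assert.h>

/* Hypothetical capture record, mirroring data_ptr in the quoted example. */
struct loop_data {
    int start;      /* initial value of the user's induction variable */
    int end;        /* exclusive upper bound from the loop condition */
    int step_size;  /* user's step; the child itself always steps by 1 */
    int *array;     /* captured variable used by the loop body */
};

/* Number of iterations of "for (ii = start; ii < end; ii += step_size)".
   Rounds up, so it also covers ranges the step does not divide evenly. */
static int iteration_count(const struct loop_data *d)
{
    assert(d->step_size > 0);
    return (d->end - d->start + d->step_size - 1) / d->step_size;
}

/* Child function: walks a normalized unit-stride range [low, high) and
   maps each normalized index back to the user's induction variable. */
static void child_function(void *data, int low, int high)
{
    struct loop_data *d = data;
    for (int xx = low; xx < high; xx++) {
        int ii = xx * d->step_size + d->start;
        d->array[ii] = 5;
    }
}

/* Serial stand-in for the runtime call (an assumption, not the real API):
   invokes the child once per chunk of at most `grain` iterations. */
static void serial_cilk_for(void (*body)(void *, int, int), void *data,
                            int count, int grain)
{
    assert(grain > 0);
    for (int low = 0; low < count; low += grain) {
        int high = (low + grain < count) ? low + grain : count;
        body(data, low, high);
    }
}
```

With start = 0, end = 10 and step_size = 2, iteration_count gives 5, matching the (10-0)/2 in the quoted call, and the child writes only the even-indexed elements no matter how the five normalized iterations are split into chunks.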

This isn't really much different from
#pragma omp parallel for schedule(runtime, N)
(i.e. the combined construct).  When it is combined, we also don't emit
a call to GOMP_parallel, but to some other function to which we pass the
number of iterations and the chunk size (== grain in Cilk+ terminology).
The only (minor) difference is that for OpenMP, once the child function
has handled the whole low ... high range it doesn't exit, but calls a
function to give it the next pair of low/high, and only when that
function tells it there is no further work to do does it return.  The
Cilk+ case is clearly the same thing, just with the return from the
child function implicitly telling that there is no further work for it.

So, I'd strongly prefer if you swap the parallel with Cilk_for, just set
the flag that the two are combined, like OpenMP already has for tons of
constructs, and during expansion you just treat them together.

	Jakub
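
P.S. A rough sketch of the difference in child-function shape described above, in plain C. Everything here is hypothetical and serial: struct work and next_chunk only stand in for whatever the runtime provides to hand out the next low/high pair, and the counter stands in for the loop body.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work descriptor over the normalized range [0, count). */
struct work {
    long next;   /* first iteration not yet handed out */
    long count;  /* total number of iterations */
    long chunk;  /* chunk size (grain in Cilk+ terminology) */
};

/* Stand-in for the "give me the next low/high pair" runtime call.
   Returns false once no iterations remain.  Serial, so no locking. */
static bool next_chunk(struct work *w, long *low, long *high)
{
    assert(w->chunk > 0);
    if (w->next >= w->count)
        return false;
    *low = w->next;
    *high = (*low + w->chunk < w->count) ? *low + w->chunk : w->count;
    w->next = *high;
    return true;
}

/* OpenMP-style child: keeps asking for more work and returns only
   when it is told there is none left. */
static long openmp_style_child(struct work *w)
{
    long low, high, done = 0;
    while (next_chunk(w, &low, &high))
        for (long xx = low; xx < high; xx++)
            done++;  /* the loop body would go here */
    return done;
}

/* Cilk+-style child: handles exactly one low/high range; returning is
   the implicit "no further work for this call". */
static long cilk_style_child(long low, long high)
{
    long done = 0;
    for (long xx = low; xx < high; xx++)
        done++;  /* the loop body would go here */
    return done;
}
```

In both shapes the inner unit-stride loop over low ... high is the same; only who drives the outer "next range" step differs, which is why the two can share the combined-construct handling.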