Hi Jakub,

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Thursday, January 23, 2014 5:13 AM
> To: Iyer, Balaji V
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On Sun, Jan 19, 2014 at 04:50:39AM +0000, Iyer, Balaji V wrote:
> > I have answered your questions below. In addition to your changes, I
> > have also fixed the issues Aldy pointed out and have answered his
> > questions in that thread.  With this email I have attached two patches
> > and 2 change-logs (for C and C++).  I have also rebased these patches
> > to the trunk revision (r206756)
> 
> Haven't looked at the patch yet, just applied and looked what it generates:
> 
> Say in cilk_fors.c -O2 -fcilkplus -std=c99 -fdump-tree-
> {original,gimple,omplower,ompexp}
> I'm seeing in *.original dump:
>     <<< Unknown tree: cilk_for
> which suggests that tree-pretty-print.c doesn't handle CILK_FOR.
> 

OK. I will work on this and send you a patch soon.

> Much more important is what is seen in the *.gimple dump though:
>            schedule(runtime,0) private(ii)
>           _Cilk_for (ii = 0; ii <= 9; ii = ii + 1)
>             {
>               #pragma omp parallel shared(__cilk_incr.0) shared(__cilk_cond.2)
> shared(__cilk_init.1) shared(ii) shared(Array)
>                 {
>                   Array[ii] = 1133;
>                 }
>             }
> Why do you put the parallel inside of _Cilk_for rather than the other way
> around?  That looks just wrong.  That would represent runtime scheduling of
> work across the parallel regions at the _Cilk_for, and then in each iteration
> running the body in several threads concurrently.
> You want the parallel around the _Cilk_for, and
> gimple_omp_parallel_set_combined_p (parallel_stmt, true) so that you can
> then handle it specially during omp lowering/expansion.

        This is how I started to think of it at first, but then when I thought 
about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - 
not the body - (e.g. "_Cilk_for (int ii = 0; ii < 10; ii++") doesn't really do 
anything nor does it belong in the child function. It is really mostly used to 
calculate the loop count and capture step-size and starting point.

        The child function has its own loop that will have a step size of 1 
regardless of your step size. You use the step-size to find the correct spot. 
Let me give you an example:

_Cilk_for (int ii = 0; ii < 10; ii = ii  + 2)
{
        Array [ii] = 5;
}

This is translated to the following (assume grain is something that the user 
input):

data_ptr.start = 0;
data_ptr.end = 10;
data_ptr.step_size = 2;
__cilkrts_cilk_for_32 (child_function, &data_ptr, (10-0)/2, grain);

Child_function (void *data_ptr, int high, int low)
{
        for (xx = low; xx < high; xx++) 
         {
                Tmp_var = (xx * data_ptr->step_size) + data_ptr->start;
                // Note: if the _Cilk_for was (ii = 9; ii >= 0; ii -= 2), we 
would have something like this:
                // Tmp_var = data_ptr->end - (xx * data_ptr->step_size)
                // The for-loop above won't change.  
                Array[Tmp_var] = 5;
        }
}

High and low are passed in by the runtime and thus their range can be any 
number (in this case between 1 and 5)

Now, if we model this like #pragma omp parallel for then all _Cilk_for 
statement will also be pulled into the child function. This can be circumvented 
(althrough I feel it is a bit convoluted) using the region->data_arg pointers 
to pass values back and forth and doing other adjustments in expand_omp_for, 
and it will work for C, but it gets problematic for STL. This is because during 
the gimplification, the vector.start() and vector.end () are replaced with the 
start and end integers and the translation is put in the child function and 
that messes things up in omp-low.c and the calculation for count in the parent 
function. Another thing also gets messed up for C++ which I can't recall off 
the top of my head.

On high-level, you can't think of _Cilk_for in terms of Open MP's for. These 
both are orthogonal technologies that produce parallel code with different 
starting points and assumptions. The reason why I modeled this way in the 
compiler is so that I can use OMP's  compiler-routines. Some things such as 
creating a child function, inserting *a* call to the library function are same 
and thus the routines to do those can be shared but the internals are very 
different and so this is why it  is modelled this way in the compiler.

> 
> Also, the printing of _Cilk_for is weird, the clauses (with space before) look
> really weird above the _Cilk_for when there is no #pragma or something
> similar.  Perhaps print the clauses after _Cilk_for?
>   _Cilk_for (ii = 0; ii <= 9; ii = ii + 1) schedule(runtime,0) private(ii) ?
> 

OK. I will work on this and send you a patch.

>       Jakub

Reply via email to