Re: Issue with _Cilk_for

2014-01-31 Thread Jakub Jelinek
On Wed, Jan 29, 2014 at 10:58:55PM +, Iyer, Balaji V wrote:
> This is the testcase I am using:
> 
> _Cilk_for (vector::iterator iter = array.begin(); iter != array.end();
>   iter++)
> {
>if (*iter  == 6)
>  *iter = 13;
> }

> ==
>  try
> {
>   #pragma omp parallel schedule(cilk-for,0) 
> if(__gnu_cxx::operator- > ((const struct 
> __normal_iterator &) (const struct __normal_iterator *) &TARGET_EXPR 
> ::end (&array)>, (const struct __normal_iterator &) 
> (const struct __normal_iterator *) &iter))
> {
>   {
> {
>   struct iterator iter;
>   difference_type D.28409;
>   difference_type D.28410;
> 
> struct iterator iter;
>   <   (void) (iter = TARGET_EXPR ::begin (&array)>) 
> >;
> difference_type D.28409;
> difference_type D.28410;
>grainsize = 0 if(__gnu_cxx::operator- std::vector > ((const struct __normal_iterator &) (const struct 
> __normal_iterator *) &TARGET_EXPR ::end (&array)>, 
> (const struct __normal_iterator &) (const struct __normal_iterator *) &iter))
> {
>   <   (void) (D.28410 = 0) >
>   _Cilk_for (D.28409 = 0; D.28409 != < __gnu_cxx::operator- > ((const struct 
> __normal_iterator &) (const struct __normal_iterator *) &TARGET_EXPR 
> ::end (&array)>, (const struct __normal_iterator &) 
> (const struct __normal_iterator *) &iter)>>; D.28409 = D.28409 + 1)

Can you explain why you emit anything in between the parallel and _Cilk_for? 
I don't see why it should be needed.
Depending on what the Cilk+ standard allows you to do (can array.begin ()
be evaluated multiple times or not, and if not, can you invoke say a copy
constructor on it), you should just emit an EXPR_STMT that initializes
an artificial scalar (say get_temp_regvar created) to
(array.end () - array.begin ()) before you add the parallel, or,
if array.begin () can't be evaluated multiple times, then construct some
temporary before the parallel with array.begin () as initializer
and then do the subtraction between array.end () and that temporary.

Then the parallel, with _Cilk_for immediately in it and just another
temporary scalar as loop iterator there, and only inside of _Cilk_for
declare the iterator var and construct (if it has been declared in _Cilk_for, 
otherwise
just initialize it) to, depending on what Cilk+ requires, either to
array.begin () and then operator+ it to the corresponding value, or
construct already to array.begin () + scalariv.

Jakub


Re: Issue with _Cilk_for

2014-01-31 Thread Jakub Jelinek
On Fri, Jan 31, 2014 at 04:42:57PM +0100, Jakub Jelinek wrote:
> Can you explain why you emit anything in between the parallel and _Cilk_for? 
> I don't see why it should be needed.
> Depending on what the Cilk+ standard allows you to do (can array.begin ()
> be evaluated multiple times or not, and if not, can you invoke say a copy
> constructor on it), you should just emit an EXPR_STMT that initializes
> an artificial scalar (say get_temp_regvar created) to
> (array.end () - array.begin ()) before you add the parallel, or,
> if array.begin () can't be evaluated multiple times, then construct some
> temporary before the parallel with array.begin () as initializer
> and then do the subtraction between array.end () and that temporary.
> 
> Then the parallel, with _Cilk_for immediately in it and just another
> temporary scalar as loop iterator there, and only inside of _Cilk_for
> declare the iterator var and construct (if it has been declared in _Cilk_for, 
> otherwise
> just initialize it) to, depending on what Cilk+ requires, either to
> array.begin () and then operator+ it to the corresponding value, or
> construct already to array.begin () + scalariv.

Actually, small correction, the iterator variable likely should be
constructed before the parallel and just marked as firstprivate on the
parallel, then you even don't need any special temporary.

So for:

_Cilk_for (vector::iterator iter = array.begin(); iter != array.end(); 
iter++)
{
  if (*iter == 6)
*iter = 13;
}

emit basically:
  {
Iterator iter;
Iterator::Iterator(&iter, array.begin());
ptrdiff_t tmpcount = Iterator::operator-(array.end(), &iter);
ptrdiff_t tmpiv;
ptrdiff_t tmplast = 0;
#pragma omp parallel firstprivate(iter, tmpcount, tmplast) private(tmpiv)
{
  _Cilk_for(tmpiv = 0; tmpiv < tmpcount; tmpiv++)
  {
Iterator::operator+(&iter, tmpiv - tmplast);
tmplast = tmpiv;
{
  if (*iter == 6)
*iter = 13;
}
  }
}
  }

Jakub


RE: Issue with _Cilk_for

2014-01-31 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Friday, January 31, 2014 11:04 AM
> To: Iyer, Balaji V
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: Issue with _Cilk_for
> 
> On Fri, Jan 31, 2014 at 04:42:57PM +0100, Jakub Jelinek wrote:
> > Can you explain why you emit anything in between the parallel and
> _Cilk_for?
> > I don't see why it should be needed.
> > Depending on what the Cilk+ standard allows you to do (can array.begin
> > () be evaluated multiple times or not, and if not, can you invoke say
> > a copy constructor on it), you should just emit an EXPR_STMT that
> > initializes an artificial scalar (say get_temp_regvar created) to
> > (array.end () - array.begin ()) before you add the parallel, or, if
> > array.begin () can't be evaluated multiple times, then construct some
> > temporary before the parallel with array.begin () as initializer and
> > then do the subtraction between array.end () and that temporary.
> >
> > Then the parallel, with _Cilk_for immediately in it and just another
> > temporary scalar as loop iterator there, and only inside of _Cilk_for
> > declare the iterator var and construct (if it has been declared in
> > _Cilk_for, otherwise just initialize it) to, depending on what Cilk+
> > requires, either to array.begin () and then operator+ it to the
> > corresponding value, or construct already to array.begin () + scalariv.
> 
> Actually, small correction, the iterator variable likely should be constructed
> before the parallel and just marked as firstprivate on the parallel, then you
> even don't need any special temporary.
> 
> So for:
> 
> _Cilk_for (vector::iterator iter = array.begin(); iter != array.end(); 
> iter++)
> {
>   if (*iter == 6)
> *iter = 13;
> }
> 
> emit basically:
>   {
> Iterator iter;
> Iterator::Iterator(&iter, array.begin());
> ptrdiff_t tmpcount = Iterator::operator-(array.end(), &iter);
> ptrdiff_t tmpiv;
> ptrdiff_t tmplast = 0;
> #pragma omp parallel firstprivate(iter, tmpcount, tmplast) private(tmpiv)
> {
>   _Cilk_for(tmpiv = 0; tmpiv < tmpcount; tmpiv++)
>   {
> Iterator::operator+(&iter, tmpiv - tmplast);
> tmplast = tmpiv;
>   {
>   if (*iter == 6)
>   *iter = 13;
> }
>   }
> }
>   }
> 

I tried to do this. The thing is, you had me model like this:

#pragma omp parallel for 
_Cilk_for (...)
{
Body
}

Now, the compiler will do something like this:

#pragma OMP parallel
{
#pragma Omp for 
{
_Cilk_for (...)
Body
}
}

Now, if by the time it starts to look at evaluating/breaking up the _Cilk-for's 
parts, we are already at the scope inside #pragma omp parallel. If I try to pop 
here, it has some scope issues with #pragma omp parallel. This requires a lot 
of rewrite of existing OMP code to port it for _Cilk_for. This can be done but 
will take significant time which I don't think I have since we are close to end 
of stage3.

But, if I had the parallel around the body, then the body is transplanted into 
the child function and everything I need for _Cilk-for is done. Yes, it does 
not make sense for an OMP expert. I am nowhere close to being an expert and 
even I had a hard time to wrap my head around this.

We can look into what you want, but in the meantime, can you accept this patch 
with the way I have modelled so that the feature can make it into 4.9?

Thanks,

Balaji V. Iyer.



>   Jakub


Re: Issue with _Cilk_for

2014-01-31 Thread Jakub Jelinek
On Fri, Jan 31, 2014 at 04:18:28PM +, Iyer, Balaji V wrote:
> I tried to do this. The thing is, you had me model like this:
> 
> #pragma omp parallel for 
> _Cilk_for (...)
> {
>   Body
> }
> 
> Now, the compiler will do something like this:
> 
> #pragma OMP parallel
> {
>   #pragma Omp for 
>   {
>   _Cilk_for (...)
>   Body
>   }
> }
> 
> Now, if by the time it starts to look at evaluating/breaking up the
> _Cilk-for's parts, we are already at the scope inside #pragma omp
> parallel.  If I try to pop here, it has some scope issues with #pragma omp
> parallel.  This requires a lot of rewrite of existing OMP code to port it
> for _Cilk_for.  This can be done but will take significant time which I
> don't think I have since we are close to end of stage3.

It really doesn't require lots of rewriting, after all, OpenMP handles
the C++ iterators very similarly.

> We can look into what you want, but in the meantime, can you accept this
> patch with the way I have modelled so that the feature can make it into
> 4.9?

No.  In GCC we do not rush in bad changes just because there is time
pressure.  As Cilk+ is new in GCC 4.9, if you adjust things properly in
say 2-3 weeks frame, we might still make an exception and allow it in
during stage4, otherwise it will have to wait for 5.0.

Jakub


RE: Issue with _Cilk_for

2014-01-31 Thread Iyer, Balaji V


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Friday, January 31, 2014 11:26 AM
> To: Iyer, Balaji V
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: Issue with _Cilk_for
> 
> On Fri, Jan 31, 2014 at 04:18:28PM +, Iyer, Balaji V wrote:
> > I tried to do this. The thing is, you had me model like this:
> >
> > #pragma omp parallel for
> > _Cilk_for (...)
> > {
> > Body
> > }
> >
> > Now, the compiler will do something like this:
> >
> > #pragma OMP parallel
> > {
> > #pragma Omp for
> > {
> > _Cilk_for (...)
> > Body
> > }
> > }
> >
> > Now, if by the time it starts to look at evaluating/breaking up the
> > _Cilk-for's parts, we are already at the scope inside #pragma omp
> > parallel.  If I try to pop here, it has some scope issues with #pragma
> > omp parallel.  This requires a lot of rewrite of existing OMP code to
> > port it for _Cilk_for.  This can be done but will take significant
> > time which I don't think I have since we are close to end of stage3.
> 
> It really doesn't require lots of rewriting, after all, OpenMP handles the C++
> iterators very similarly.
> 
> > We can look into what you want, but in the meantime, can you accept
> > this patch with the way I have modelled so that the feature can make
> > it into 4.9?
> 
> No.  In GCC we do not rush in bad changes just because there is time
> pressure.  As Cilk+ is new in GCC 4.9, if you adjust things properly in say 
> 2-3
> weeks frame, we might still make an exception and allow it in during stage4,
> otherwise it will have to wait for 5.0.

Hi Jakub,
I am not asking to rush anything. Personally, I won't submit anything 
if I don't think it is a correct solution, especially for something this big 
into a system that is used by a large population. On the top level, I don't 
understand why it matters where the #pragma omp parallel is placed. Can you 
please explain me that? I agree it feels weird when you think about from OMP 
standpoint, but so what? It is just used in the internal representation for 
_Cilk_for so that I can use the OMP routines to create child functions and 
transplant the correct parts of code to it. _Cilk_for behaves very differently 
compared to OMP for.  It is almost like apple and orange when it comes to 
functionality. To my best knowledge, there is no additional scoping or overhead 
instructions inserted when I put #prgma omp parallel around the body.

I am just asking this to get the whole picture. Please don't think this as an 
argument/flame.

Thanks,

Balaji V. Iyer.

> 
>   Jakub