Re: Issue with _Cilk_for
On Wed, Jan 29, 2014 at 10:58:55PM +, Iyer, Balaji V wrote: > This is the testcase I am using: > > _Cilk_for (vector::iterator iter = array.begin(); iter != array.end(); > iter++) > { >if (*iter == 6) > *iter = 13; > } > == > try > { > #pragma omp parallel schedule(cilk-for,0) > if(__gnu_cxx::operator- > ((const struct > __normal_iterator &) (const struct __normal_iterator *) &TARGET_EXPR > ::end (&array)>, (const struct __normal_iterator &) > (const struct __normal_iterator *) &iter)) > { > { > { > struct iterator iter; > difference_type D.28409; > difference_type D.28410; > > struct iterator iter; > < (void) (iter = TARGET_EXPR ::begin (&array)>) > >; > difference_type D.28409; > difference_type D.28410; >grainsize = 0 if(__gnu_cxx::operator- std::vector > ((const struct __normal_iterator &) (const struct > __normal_iterator *) &TARGET_EXPR ::end (&array)>, > (const struct __normal_iterator &) (const struct __normal_iterator *) &iter)) > { > < (void) (D.28410 = 0) > > _Cilk_for (D.28409 = 0; D.28409 != < __gnu_cxx::operator- > ((const struct > __normal_iterator &) (const struct __normal_iterator *) &TARGET_EXPR > ::end (&array)>, (const struct __normal_iterator &) > (const struct __normal_iterator *) &iter)>>; D.28409 = D.28409 + 1) Can you explain why you emit anything in between the parallel and _Cilk_for? I don't see why it should be needed. Depending on what the Cilk+ standard allows you to do (can array.begin () be evaluated multiple times or not, and if not, can you invoke say a copy constructor on it), you should just emit an EXPR_STMT that initializes an artificial scalar (say get_temp_regvar created) to (array.end () - array.begin ()) before you add the parallel, or, if array.begin () can't be evaluated multiple times, then construct some temporary before the parallel with array.begin () as initializer and then do the subtraction between array.end () and that temporary. Then the parallel, with _Cilk_for immediately in it and just another temporary scalar as loop iterator there, and only inside of _Cilk_for declare the iterator var and construct (if it has been declared in _Cilk_for, otherwise just initialize it) to, depending on what Cilk+ requires, either to array.begin () and then operator+ it to the corresponding value, or construct already to array.begin () + scalariv. Jakub
Re: Issue with _Cilk_for
On Fri, Jan 31, 2014 at 04:42:57PM +0100, Jakub Jelinek wrote: > Can you explain why you emit anything in between the parallel and _Cilk_for? > I don't see why it should be needed. > Depending on what the Cilk+ standard allows you to do (can array.begin () > be evaluated multiple times or not, and if not, can you invoke say a copy > constructor on it), you should just emit an EXPR_STMT that initializes > an artificial scalar (say get_temp_regvar created) to > (array.end () - array.begin ()) before you add the parallel, or, > if array.begin () can't be evaluated multiple times, then construct some > temporary before the parallel with array.begin () as initializer > and then do the subtraction between array.end () and that temporary. > > Then the parallel, with _Cilk_for immediately in it and just another > temporary scalar as loop iterator there, and only inside of _Cilk_for > declare the iterator var and construct (if it has been declared in _Cilk_for, > otherwise > just initialize it) to, depending on what Cilk+ requires, either to > array.begin () and then operator+ it to the corresponding value, or > construct already to array.begin () + scalariv. Actually, small correction, the iterator variable likely should be constructed before the parallel and just marked as firstprivate on the parallel, then you even don't need any special temporary. So for: _Cilk_for (vector::iterator iter = array.begin(); iter != array.end(); iter++) { if (*iter == 6) *iter = 13; } emit basically: { Iterator iter; Iterator::Iterator(&iter, array.begin()); ptrdiff_t tmpcount = Iterator::operator-(array.end(), &iter); ptrdiff_t tmpiv; ptrdiff_t tmplast = 0; #pragma omp parallel firstprivate(iter, tmpcount, tmplast) private(tmpiv) { _Cilk_for(tmpiv = 0; tmpiv < tmpcount; tmpiv++) { Iterator::operator+(&iter, tmpiv - tmplast); tmplast = tmpiv; { if (*iter == 6) *iter = 13; } } } } Jakub
RE: Issue with _Cilk_for
> -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Friday, January 31, 2014 11:04 AM > To: Iyer, Balaji V > Cc: gcc-patches@gcc.gnu.org > Subject: Re: Issue with _Cilk_for > > On Fri, Jan 31, 2014 at 04:42:57PM +0100, Jakub Jelinek wrote: > > Can you explain why you emit anything in between the parallel and > _Cilk_for? > > I don't see why it should be needed. > > Depending on what the Cilk+ standard allows you to do (can array.begin > > () be evaluated multiple times or not, and if not, can you invoke say > > a copy constructor on it), you should just emit an EXPR_STMT that > > initializes an artificial scalar (say get_temp_regvar created) to > > (array.end () - array.begin ()) before you add the parallel, or, if > > array.begin () can't be evaluated multiple times, then construct some > > temporary before the parallel with array.begin () as initializer and > > then do the subtraction between array.end () and that temporary. > > > > Then the parallel, with _Cilk_for immediately in it and just another > > temporary scalar as loop iterator there, and only inside of _Cilk_for > > declare the iterator var and construct (if it has been declared in > > _Cilk_for, otherwise just initialize it) to, depending on what Cilk+ > > requires, either to array.begin () and then operator+ it to the > > corresponding value, or construct already to array.begin () + scalariv. > > Actually, small correction, the iterator variable likely should be constructed > before the parallel and just marked as firstprivate on the parallel, then you > even don't need any special temporary. > > So for: > > _Cilk_for (vector::iterator iter = array.begin(); iter != array.end(); > iter++) > { > if (*iter == 6) > *iter = 13; > } > > emit basically: > { > Iterator iter; > Iterator::Iterator(&iter, array.begin()); > ptrdiff_t tmpcount = Iterator::operator-(array.end(), &iter); > ptrdiff_t tmpiv; > ptrdiff_t tmplast = 0; > #pragma omp parallel firstprivate(iter, tmpcount, tmplast) private(tmpiv) > { > _Cilk_for(tmpiv = 0; tmpiv < tmpcount; tmpiv++) > { > Iterator::operator+(&iter, tmpiv - tmplast); > tmplast = tmpiv; > { > if (*iter == 6) > *iter = 13; > } > } > } > } > I tried to do this. The thing is, you had me model like this: #pragma omp parallel for _Cilk_for (...) { Body } Now, the compiler will do something like this: #pragma OMP parallel { #pragma Omp for { _Cilk_for (...) Body } } Now, if by the time it starts to look at evaluating/breaking up the _Cilk-for's parts, we are already at the scope inside #pragma omp parallel. If I try to pop here, it has some scope issues with #pragma omp parallel. This requires a lot of rewrite of existing OMP code to port it for _Cilk_for. This can be done but will take significant time which I don't think I have since we are close to end of stage3. But, if I had the parallel around the body, then the body is transplanted into the child function and everything I need for _Cilk-for is done. Yes, it does not make sense for an OMP expert. I am nowhere close to being an expert and even I had a hard time to wrap my head around this. We can look into what you want, but in the meantime, can you accept this patch with the way I have modelled so that the feature can make it into 4.9? Thanks, Balaji V. Iyer. > Jakub
Re: Issue with _Cilk_for
On Fri, Jan 31, 2014 at 04:18:28PM +, Iyer, Balaji V wrote: > I tried to do this. The thing is, you had me model like this: > > #pragma omp parallel for > _Cilk_for (...) > { > Body > } > > Now, the compiler will do something like this: > > #pragma OMP parallel > { > #pragma Omp for > { > _Cilk_for (...) > Body > } > } > > Now, if by the time it starts to look at evaluating/breaking up the > _Cilk-for's parts, we are already at the scope inside #pragma omp > parallel. If I try to pop here, it has some scope issues with #pragma omp > parallel. This requires a lot of rewrite of existing OMP code to port it > for _Cilk_for. This can be done but will take significant time which I > don't think I have since we are close to end of stage3. It really doesn't require lots of rewriting, after all, OpenMP handles the C++ iterators very similarly. > We can look into what you want, but in the meantime, can you accept this > patch with the way I have modelled so that the feature can make it into > 4.9? No. In GCC we do not rush in bad changes just because there is time pressure. As Cilk+ is new in GCC 4.9, if you adjust things properly in say 2-3 weeks frame, we might still make an exception and allow it in during stage4, otherwise it will have to wait for 5.0. Jakub
RE: Issue with _Cilk_for
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek > Sent: Friday, January 31, 2014 11:26 AM > To: Iyer, Balaji V > Cc: gcc-patches@gcc.gnu.org > Subject: Re: Issue with _Cilk_for > > On Fri, Jan 31, 2014 at 04:18:28PM +, Iyer, Balaji V wrote: > > I tried to do this. The thing is, you had me model like this: > > > > #pragma omp parallel for > > _Cilk_for (...) > > { > > Body > > } > > > > Now, the compiler will do something like this: > > > > #pragma OMP parallel > > { > > #pragma Omp for > > { > > _Cilk_for (...) > > Body > > } > > } > > > > Now, if by the time it starts to look at evaluating/breaking up the > > _Cilk-for's parts, we are already at the scope inside #pragma omp > > parallel. If I try to pop here, it has some scope issues with #pragma > > omp parallel. This requires a lot of rewrite of existing OMP code to > > port it for _Cilk_for. This can be done but will take significant > > time which I don't think I have since we are close to end of stage3. > > It really doesn't require lots of rewriting, after all, OpenMP handles the C++ > iterators very similarly. > > > We can look into what you want, but in the meantime, can you accept > > this patch with the way I have modelled so that the feature can make > > it into 4.9? > > No. In GCC we do not rush in bad changes just because there is time > pressure. As Cilk+ is new in GCC 4.9, if you adjust things properly in say > 2-3 > weeks frame, we might still make an exception and allow it in during stage4, > otherwise it will have to wait for 5.0. Hi Jakub, I am not asking to rush anything. Personally, I won't submit anything if I don't think it is a correct solution, especially for something this big into a system that is used by a large population. On the top level, I don't understand why it matters where the #pragma omp parallel is placed. Can you please explain me that? I agree it feels weird when you think about from OMP standpoint, but so what? It is just used in the internal representation for _Cilk_for so that I can use the OMP routines to create child functions and transplant the correct parts of code to it. _Cilk_for behaves very differently compared to OMP for. It is almost like apple and orange when it comes to functionality. To my best knowledge, there is no additional scoping or overhead instructions inserted when I put #prgma omp parallel around the body. I am just asking this to get the whole picture. Please don't think this as an argument/flame. Thanks, Balaji V. Iyer. > > Jakub