FW: [PING] [PATCH] _Cilk_for for C and C++

2014-03-20 Thread Iyer, Balaji V
I mis-spelled the org as og and thus the email got bounced. So, here it is 
again.

Thanks,
Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Thursday, March 20, 2014 4:34 PM
 To: 'Jakub Jelinek'
 Cc: gcc-patc...@gcc.gnu.og
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Thursday, March 6, 2014 6:55 AM
  To: Iyer, Balaji V
  Cc: gcc-patc...@gcc.gnu.og
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Tue, Mar 04, 2014 at 03:35:32PM +, Iyer, Balaji V wrote:
 Did you get a chance to look at this?
 
  Please look what you are emitting for cf3.cc, e.g. in *.gimple:
 
  D.2883 = Jint::begin (j);
  Iint::I (i, D.2883);
  D.2885 = Jint::end (j);
  retval.0 = operator-int (D.2885, i);
  D.2886 = retval.0 /[cl] 2;
  #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865)
  lastprivate(D.2864)
{
  const difference_type D.2866;
  long int D.2887;
  struct I  D.2888;
 
  try
{
 
  _Cilk_for (D.2864 = 0; D.2864  0; D.2864 = D.2864 +
  2) linear(D.2864:2) schedule(cilk-for grain,0)
 
  That is just plain wrong, you aren't iterating zero times, but
  D.2886 times, so there should be D.2864  retval.0.
  In *.original that is:
  cleanup_point  Unknown tree: expr_stmt
(void) (i = TARGET_EXPR D.2854,  Unknown tree: aggr_init_expr
5
__comp_ctor
D.2854
0B
(const struct I ) (const struct I *) Jint::begin ((struct J *) j) )
 ;
  #pragma omp parallel firstprivate(i) schedule(cilk-for grain,0)
  if(cleanup_point operator-int ((const struct I ) (const struct I
  *) Jint::end ((struct J *) j), (const struct I ) (const struct I *) 
  i) /[cl] 2)
{
  {
try   HERE
 
  {
  difference_type D.2864;
  difference_type D.2865;
 
  {
cleanup_point  Unknown tree: expr_stmt (void)
  (D.2865 = 0)
  
_Cilk_for (D.2864 = 0; D.2864  0; D.2864 = D.2864 +
  2) schedule(cilk- for grain,0) if(cleanup_point operator-int
  ((const struct I ) (const struct I *) Jint::end ((struct J *) j),
  (const struct I ) (const struct I *) i) /[cl] 2)
 
  As I've said several times before, don't put the operator- expression
  into if clause, instead evaluate it into a scalar temporary first
  (get_temp_regvar), before the #pragma omp parallel, and use the result
  of get_temp_regvar in the if clause (temp /[cl] 2) and in firstprivate
  clause on omp parallel and as the high bound of _Cilk_for.
 
 Hi Jakub,
   First off, I am sorry for the delay in response. I had a couple family
 issues to deal with that tied me up for the past week.
   Yes, I agree that it looks weird in the intermediate code. I tried to
 move the operator-(...) value into a variable (in the function
 handle_omp_for_class_iterator) and it was giving me an ICE during gimplify
 case. The issue was scoping. The if clause is sitting on #pragma omp parallel,
 but when I move the operator- to a variable it is inside the try block (marked
 by  HERE =). Also, this only happens when we deal with iterators
 and overloaded opeators. All the other cases the intermediate code looks
 OK. This only happens in dump of gimple and original. All the intermediate
 codes after that (= .012) looks OK. Since this is a specific case, can you
 accept this as a workaround for now?
 
 Thanks,
 
 Balaji V. Iyer.


[PING]RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-26 Thread Iyer, Balaji V
Hi Jakub,
Did you get a chance to look at this?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, February 24, 2014 6:17 PM
 To: 'Jakub Jelinek'
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 Hi Jakub,
   Please see my responses below.
 
  -Original Message-
  From: Iyer, Balaji V
  Sent: Thursday, February 20, 2014 11:38 PM
  To: Jakub Jelinek
  Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
  Hi Jakub,
  I have attached the fixed patch and have answered your questions
  below.
 
   -Original Message-
   From: Jakub Jelinek [mailto:ja...@redhat.com]
   Sent: Wednesday, February 19, 2014 6:24 AM
   To: Iyer, Balaji V
   Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
   'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
   Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
  
   On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote:
Attached, please find a patch with the test case attached (for1.cc).
The patch is the same but the cp-changelog has been modified to
reflect the new test-case.  Is this OK to install?
  
   1) have you tested the patch at all?  I see
   FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
   FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
   FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
   FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
   30)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
   37)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
   40)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
   30)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
   37)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
   40)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
   regressions caused by the patch, that is of course unacceptable.
  
 
  Fixed. I apologize for them. I have confirmed that it is OK now.
 
 
   2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't
   find out why calling something multiple times is a bad idea,
   actually the latest patch is even worse than the older one, you now
   create 3 calls to the end method and 3 calls to operator-.  There
   should be just one call to that, before the #pragma omp parallel
   obviously, anything that
  doesn't do that is just bad.
   I don't see a point in having if clause on the _Cilk_for, just keep
   it on the #pragma omp parallel only, at ompexp time you can easily
   find it there, there is no point to check it again in the parallel
   body of the
  function.
  
 
  I have removed the if-clause from the _Cilk_for and now it is just in
  #pragma omp parallel.
 
  I have removed the 3rd operator-, but I am not able to remove the 2nd.
  I am looking into it, but I am not able to do it. The thing is, first
  operator- was for the if-clause, which I need to calculate the
  loop-count for the
  __cilkrts_cilk_for_64 function. The second one is not necessary
  because the end-value does not matter for _Cilk_for since they will be
  replaced with the low and high values.  I tried several things such as
  stopping gimplifcation of the cond value, or replacing it with a
  constant, etc and those are causing other problems elsewhere.
 
   The thing is, I am not able to find a way to automate this. I can't
  assume the cond's  end-value is same as count, because this is only
  true if we have an iterator and the handle_omp_for_iterator function
  modifies the cond value correctly. I need to use the count value
  (retval.0) as retval.1 but count-value is computed at a later time
  than handle_omp_for_iterator (since it does not have any knowledge
  about the #pragma omp parallel). It is giving the correct answers for
  the benchmarks and is removing the 2nd operator- when optimization is
 turned on for the inlinable operator-.
 
  Can you provide me some advice about how to do it?
 
 I have fixed the issue. Now the 2nd operator- does not occur. Attached,
 please fixed a fixed patch. Is this OK for trunk?
 
 Thanks,
 
 Balaji V. Iyer.


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-19 Thread Jakub Jelinek
On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote:
 Attached, please find a patch with the test case attached (for1.cc). The
 patch is the same but the cp-changelog has been modified to reflect the
 new test-case.  Is this OK to install?

1) have you tested the patch at all?  I see
FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
regressions caused by the patch, that is of course unacceptable.

2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out
why calling something multiple times is a bad idea, actually the latest patch
is even worse than the older one, you now create 3 calls to the end method
and 3 calls to operator-.  There should be just one call to that, before the
#pragma omp parallel obviously, anything that doesn't do that is just bad.
I don't see a point in having if clause on the _Cilk_for, just keep it on
the #pragma omp parallel only, at ompexp time you can easily find it there,
there is no point to check it again in the parallel body of the function.

typedef __PTRDIFF_TYPE__ ptrdiff_t;

template typename T
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I );
  T operator * ();
  T *operator - ();
  T operator [] (const difference_type ) const;
  I operator = (const I );
  I operator ++ ();
  I operator ++ (int);
  I operator -- ();
  I operator -- (int);
  I operator += (const difference_type );
  I operator -= (const difference_type );
  I operator + (const difference_type ) const;
  I operator - (const difference_type ) const;
  template typename S friend bool operator == (IS , IS );
  template typename S friend bool operator == (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend typename IS::difference_type operator - (IS 
, IS );
  template typename S friend typename IS::difference_type operator - (const 
IS , const IS );
  template typename S friend IS operator + (typename IS::difference_type 
, const IS );
private:
  T *p;
};
template typename T IT::I () : p (0) {}
template typename T IT::~I () {}
template typename T IT::I (T *x) : p (x) {}
template typename T IT::I (const I x) : p (x.p) {}
template typename T T IT::operator * () { return *p; }
template typename T T *IT::operator - () { return p; }
template typename T T IT::operator [] (const difference_type x) const { 
return p[x]; }
template typename T IT IT::operator = (const I x) { p = x.p; return 
*this; }
template typename T IT IT::operator ++ () { ++p; return *this; }
template typename T IT IT::operator ++ (int) { return I (p++); }
template typename T IT IT::operator -- () { --p; return *this; }
template typename T IT IT::operator -- (int) { return I (p--); }
template typename T IT IT::operator += (const difference_type x) { p += 
x; return *this; }
template typename T IT IT::operator -= (const difference_type x) { p -= 
x; return *this; }
template typename T IT IT::operator + (const difference_type x) const { 
return I (p + x); }
template typename T __attribute__((noinline)) IT IT::operator - (const 
difference_type x) const { __asm (); return I (p - x); }
template typename T bool operator == (IT x, IT y) { return x.p == y.p; }
template typename T bool operator == (const IT x, const IT y) { return 
x.p == y.p; }
template typename T bool operator != (IT x, IT y) { return !(x == y); }
template typename T bool operator != (const IT x, const IT y) { return 
!(x == y); }
template typename T bool operator  (IT x, IT y) { return x.p  y.p; }
template typename T bool operator  (const IT x, const IT y) { return 
x.p  y.p; }
template typename T bool operator = (IT x, IT y) { return x.p = y.p; }

[PING] RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V
Hi Jakub,
Did you get a chance to look at this patch?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, February 10, 2014 5:07 PM
 To: Jakub Jelinek
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 Hi Jakub,
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Monday, February 10, 2014 12:58 PM
  To: Iyer, Balaji V
  Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote:
 Attached, please find a fixed patch. Along with it, I have also
   added
   2 changelog files for C and C++ respectively.
 
  Have you even looked at the second testcase I've posted?
  gimplification ICEs on it with your latest patch, because firstprivate
  clause is added for the same variable multiple times, and it seems
  parallel still isn't around _Cilk_for.
 
 I looked at both but forgot to test them with my implementation. Sorry
 about this. I have fixed the ICE issue. To make sure this does not happen
 further, I have added your test cf3.C into test suite (renamed to cf3.cc). I
 hope that is OK with you.
 
 I have attached a fixed patch and Changelogs. Is this OK?
 
 Thanks,
 
 Balaji V. Iyer.
 
 
  Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote:
 I looked at both but forgot to test them with my implementation. Sorry
 about this.  I have fixed the ICE issue.  To make sure this does not
 happen further, I have added your test cf3.C into test suite (renamed to
 cf3.cc).  I hope that is OK with you.

The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
Perhaps it would be much better though if instead of having a compile time
testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
#pragma omp parallel for in there with _Cilk_for and turn it into a runtime
testcase.

 I have attached a fixed patch and Changelogs. Is this OK?

Looks better (note, still looking just at the dumps), but not completely ok
yet.  On cf3.cc, I see in *.gimple:

D.2883 = Jint::begin (j);
Iint::I (i, D.2883);
D.2885 = Jint::end (j);
retval.0 = operator-int (D.2885, i);
D.2886 = retval.0 / 2;
#pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j)
  {
difference_type retval.1;
const struct I  D.2888;
const difference_type D.2866;
long int D.2889;
struct I  D.2890;

try
  {

_Cilk_for (D.2864 = 0; D.2864  retval.1; D.2864 = D.2864 + 2) 
private(D.2864)
  {
D.2889 = D.2864 - D.2865;
D.2866 = D.2889;
try
  {
D.2890 = Iint::operator+= (i, D.2866);

First a minor nit, there is extra newline before _Cilk_for:
  newline_and_indent (buffer, spc);
  if (flag_cilkplus
   gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
pp_string (buffer, _Cilk_for ();
  else
pp_string (buffer, for ();
I guess for _Cilk_for collapse is never  1, right?  If that is the case,
then perhaps you should move the newline_and_indent (buffer, spc); call
into the else block.

More importantly, what is retval.1?  I'd expect you should be using retval.0
there and have it also as firstprivate(retval.0) on the parallel.
In *.omplower dump I actually see:
retval.0 = operator-int (D.2885, i);
...
retval.1 = operator-int (D.2888, i);
i.e. operator-int is called twice.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, February 12, 2014 9:59 AM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote:
  I looked at both but forgot to test them with my implementation. Sorry
  about this.  I have fixed the ICE issue.  To make sure this does not
  happen further, I have added your test cf3.C into test suite (renamed
  to cf3.cc).  I hope that is OK with you.
 
 The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
 Perhaps it would be much better though if instead of having a compile time
 testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
 #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
 testcase.
 
I really don't want to do that because I don't think there is a 1:1 match-up 
between the rules of #pragma omp for and _Cilk_for. 

  I have attached a fixed patch and Changelogs. Is this OK?
 
 Looks better (note, still looking just at the dumps), but not completely ok
 yet.  On cf3.cc, I see in *.gimple:
 
 D.2883 = Jint::begin (j);
 Iint::I (i, D.2883);
 D.2885 = Jint::end (j);
 retval.0 = operator-int (D.2885, i);
 D.2886 = retval.0 / 2;
 #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) 
 shared(j)
   {
 difference_type retval.1;
 const struct I  D.2888;
 const difference_type D.2866;
 long int D.2889;
 struct I  D.2890;
 
 try
   {
 
 _Cilk_for (D.2864 = 0; D.2864  retval.1; D.2864 = D.2864 + 2)
 private(D.2864)
   {
 D.2889 = D.2864 - D.2865;
 D.2866 = D.2889;
 try
   {
 D.2890 = Iint::operator+= (i, D.2866);
 
 First a minor nit, there is extra newline before _Cilk_for:
   newline_and_indent (buffer, spc);
   if (flag_cilkplus
gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
 pp_string (buffer, _Cilk_for ();
   else
 pp_string (buffer, for (); I guess for _Cilk_for collapse is 
 never  1,
 right?  If that is the case, then perhaps you should move the
 newline_and_indent (buffer, spc); call into the else block.
 

OK. I will fix this and send you a patch? 

 More importantly, what is retval.1?  I'd expect you should be using retval.0
 there and have it also as firstprivate(retval.0) on the parallel.
 In *.omplower dump I actually see:
 retval.0 = operator-int (D.2885, i); ...
 retval.1 = operator-int (D.2888, i); i.e. 
 operator-int is
 called twice.
 

Yes, one is for the if-clause and one is for the condition. It really doesn't 
matter because we get of the stuff in the condition and replace with our own 
for loop with something like the for-loop shown  below. So retval1 doesn't come 
into picture. It is only alive from parser to the expand_cilk_for function.

For (i = low; i  high; i++)
{
_Cilk_for_body
}

So, is there any other changes that you need me to make?

Thanks,

Balaji V. Iyer.


   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 03:14:23PM +, Iyer, Balaji V wrote:
  The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
  Perhaps it would be much better though if instead of having a compile time
  testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
  #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
  testcase.
  

 I really don't want to do that because I don't think there is a 1:1
 match-up between the rules of #pragma omp for and _Cilk_for.

But there is nothing OpenMP specific on any of the tests, all could as well
be tested by removing all the #pragma omp ... lines and just be tested as
normal C+++ loops.  Is there anything that _Cilk_for wouldn't handle?

IMNSHO if you remove all the #pragma omp parallel for lines (even with any
clauses it sometimes has) and replace for with _Cilk_for on the following
line, you should have a valid Cilk+ program.

Only f11 would need to be changed from:
#pragma omp parallel
  {
#pragma omp for nowait
for (T i = x; i = y; i += 3)
  baz (i);
#pragma omp single
{
  T j = y + 3;
  baz (j);
}
  }
to say:
  _Cilk_for (T i = x; i = y; i += 3)
baz (i);
  {
T j = y + 3;
baz (j);
  }
so that it performs the same thing.

  First a minor nit, there is extra newline before _Cilk_for:
newline_and_indent (buffer, spc);
if (flag_cilkplus
 gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
  pp_string (buffer, _Cilk_for ();
else
  pp_string (buffer, for (); I guess for _Cilk_for collapse is 
  never  1,
  right?  If that is the case, then perhaps you should move the
  newline_and_indent (buffer, spc); call into the else block.
  
 
 OK. I will fix this and send you a patch? 

Sure.

  More importantly, what is retval.1?  I'd expect you should be using retval.0
  there and have it also as firstprivate(retval.0) on the parallel.
  In *.omplower dump I actually see:
  retval.0 = operator-int (D.2885, i); ...
  retval.1 = operator-int (D.2888, i); i.e. 
  operator-int is
  called twice.
  
 
 Yes, one is for the if-clause and one is for the condition. It really doesn't 
 matter because we get of the stuff in the condition and replace with our own 
 for loop with something like the for-loop shown  below. So retval1 doesn't 
 come into picture. It is only alive from parser to the expand_cilk_for 
 function.
 
 For (i = low; i  high; i++)
 {
   _Cilk_for_body
 }

No, it really does matter.  Just look at the *.optimized dump with -O0 
-fcilkplus:

  retval.0_4 = operator-int (_3, i);
in _Z3foo1JIiE function and
  _4 = .omp_data_i_2(D)-j;
  _5 = Jint::end (_4);
  retval.1_6 = operator-int (_5, i);
  retval.3_7 = retval.1_6;
in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
shouldn't emit them, even when optimizing, if e.g. the operator- isn't
inline, g++ won't be able to optimize it away.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V
   More importantly, what is retval.1?  I'd expect you should be using
   retval.0 there and have it also as firstprivate(retval.0) on the parallel.
   In *.omplower dump I actually see:
   retval.0 = operator-int (D.2885, i); ...
   retval.1 = operator-int (D.2888, i);
   i.e. operator-int is called twice.
  
 
  Yes, one is for the if-clause and one is for the condition. It really 
  doesn't
 matter because we get of the stuff in the condition and replace with our own
 for loop with something like the for-loop shown  below. So retval1 doesn't
 come into picture. It is only alive from parser to the expand_cilk_for 
 function.
 
  For (i = low; i  high; i++)
  {
  _Cilk_for_body
  }
 
 No, it really does matter.  Just look at the *.optimized dump with -O0 -
 fcilkplus:
 
   retval.0_4 = operator-int (_3, i);
 in _Z3foo1JIiE function and
   _4 = .omp_data_i_2(D)-j;
   _5 = Jint::end (_4);
   retval.1_6 = operator-int (_5, i);
   retval.3_7 = retval.1_6;
 in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
 shouldn't emit them, even when optimizing, if e.g. the operator- isn't inline,
 g++ won't be able to optimize it away.
 

I looked at the test code you send me (cf3.cc) at -O1 and it is removing all 
the lines you have shown above. Yes, I would imagine -O0 to have code that can 
be redundant or unnecessary. Some of it could be the artifact of internal code 
insertion. But isn't the main job of the instruction scheduler to remove all 
these redundant work? Besides, it is just a function call. The compiler at -O2, 
-O and -O3 removes the chunk of code that you mentioned.

-Balaji V. Iyer.


   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote:
 I looked at the test code you send me (cf3.cc) at -O1 and it is removing
 all the lines you have shown above.  Yes, I would imagine -O0 to have code
 that can be redundant or unnecessary.  Some of it could be the artifact of
 internal code insertion.  But isn't the main job of the instruction
 scheduler to remove all these redundant work?  Besides, it is just a
 function call.  The compiler at -O2, -O and -O3 removes the chunk of code
 that you mentioned.

As I said, just change the testcase so that the operator isn't inline, and
suddenly even -O3 will not be able to remove the call.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, February 12, 2014 12:10 PM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote:
  I looked at the test code you send me (cf3.cc) at -O1 and it is
  removing all the lines you have shown above.  Yes, I would imagine -O0
  to have code that can be redundant or unnecessary.  Some of it could
  be the artifact of internal code insertion.  But isn't the main job of
  the instruction scheduler to remove all these redundant work?
  Besides, it is just a function call.  The compiler at -O2, -O and -O3
  removes the chunk of code that you mentioned.
 
 As I said, just change the testcase so that the operator isn't inline, and
 suddenly even -O3 will not be able to remove the call.

I am sorry, I do not see any operators being asked to inline explicitly..

class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I );
  T operator * ();
  T *operator - ();
  T operator [] (const difference_type ) const;
  I operator = (const I );
  I operator ++ ();
  I operator ++ (int);
  I operator -- ();
  I operator -- (int);
  I operator += (const difference_type );
  I operator -= (const difference_type );
  I operator + (const difference_type ) const;
  I operator - (const difference_type ) const;
  template typename S friend bool operator == (IS , IS );
  template typename S friend bool operator == (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend typename IS::difference_type operator - (IS 
, IS );
  template typename S friend typename IS::difference_type operator - (const 
IS , const IS );
  template typename S friend IS operator + (typename IS::difference_type 
, const IS );
private:
  T *p;
};


 
   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-10 Thread Jakub Jelinek
On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote:
   Attached, please find a fixed patch. Along with it, I have also
 added 2 changelog files for C and C++ respectively.

Have you even looked at the second testcase I've posted?
gimplification ICEs on it with your latest patch, because firstprivate
clause is added for the same variable multiple times, and it seems parallel
still isn't around _Cilk_for.

Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote:
   Attached, please find a fixed patch (diff.txt) that will do as you 
 requested (model _Cilk_for like a #pragma omp parallel for). Along with this, 
 I have also attached two Changelog entries (1 for C and 1 for C++).
   It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and 
 does not affect any other tests in the testsuite.
   Is this Ok for trunk?

A step in the right direction, but I still see issues just from looking at
the *.gimple dump:

For the first testcase, I see:
iter = std::vectorint::begin (array); [return slot optimization]
iter.1 = iter;
D.13615 = std::vectorint::end (array); [return slot optimization]
try
  {
retval.0 = __gnu_cxx::operator-int*, std::vectorint  
(D.13615, iter);
  }
finally
  {
D.13615 = {CLOBBER};
  }
#pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
#shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
  {
difference_type retval.2;
const difference_type D.13633;
int D.13725;
struct __normal_iterator  D.13726;
bool retval.3;
int  D.13728;
int D.13729;
int  D.13732;

iter = iter.1;
 private(D.13631)
_Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 
+ 1)
  D.13725 = D.13631 - D.13632;

So, the issues I see:
1) what is iter.1, why do you have it at all, and, after all, the iterator
is a class that needs to be constructed/destructed in the general way, so
creating any further copies of something is both costly and undesirable

2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for

3) iter should be firstprivate, and there should be no explicit private var
with assignment during gimplification, just handle it like any other
firstprivate during omp lowering

4) the printing looks weird for _Cilk_for, as I said earlier, the clauses
should probably be printed after the closing ) of _Cilk_for rather than
after nothing on the previous line; also, there is no {} printed around the
_Cilk_for body and the next line is weirdly indented

But more importantly, if I create some testcase with a generic C++
conforming iterator (copied over from
libgomp/testsuite/libgomp.c++/for-1.C), as in the second testcase, the
*.gimple dump shows that _Cilk_for is still around the #pragma omp parallel.
The intent of the second testcase is that you can really eyeball all the
ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
really inlined.

Jakub
#include vector

void
foo (std::vectorint array)
{
  _Cilk_for (std::vectorint::iterator iter = array.begin(); iter != 
array.end(); iter++)
  {
if (*iter  == 6)
  *iter = 13;
  }
}
typedef __PTRDIFF_TYPE__ ptrdiff_t;

template typename T
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I );
  T operator * ();
  T *operator - ();
  T operator [] (const difference_type ) const;
  I operator = (const I );
  I operator ++ ();
  I operator ++ (int);
  I operator -- ();
  I operator -- (int);
  I operator += (const difference_type );
  I operator -= (const difference_type );
  I operator + (const difference_type ) const;
  I operator - (const difference_type ) const;
  template typename S friend bool operator == (IS , IS );
  template typename S friend bool operator == (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend typename IS::difference_type operator - (IS 
, IS );
  template typename S friend typename IS::difference_type operator - (const 
IS , const IS );
  template typename S friend IS operator + (typename IS::difference_type 
, const IS );
private:
  T *p;
};
template typename T IT::I () : p (0) {}
template typename T IT::~I () {}
template typename T IT::I (T *x) : p (x) {}
template typename T IT::I (const I x) : p (x.p) {}
template typename T T IT::operator * () { return *p; }
template typename T T *IT::operator - () { return p; }
template typename T T IT::operator [] (const difference_type x) const { 
return p[x]; }
template typename T IT IT::operator = (const I x) { p = x.p; return 
*this; }
template typename T IT IT::operator ++ () { ++p; return 

RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Iyer, Balaji V


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
 Sent: Friday, February 7, 2014 9:03 AM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote:
  Attached, please find a fixed patch (diff.txt) that will do as you
 requested (model _Cilk_for like a #pragma omp parallel for). Along with this,
 I have also attached two Changelog entries (1 for C and 1 for C++).
  It passes all the tests on my x86_64 box (both 32 and 64 bit modes)
 and does not affect any other tests in the testsuite.
  Is this Ok for trunk?
 
 A step in the right direction, but I still see issues just from looking at the
 *.gimple dump:
 
 For the first testcase, I see:
 iter = std::vectorint::begin (array); [return slot optimization]
 iter.1 = iter;
 D.13615 = std::vectorint::end (array); [return slot 
 optimization]
 try
   {
 retval.0 = __gnu_cxx::operator-int*, std::vectorint  
 (D.13615,
 iter);
   }
 finally
   {
 D.13615 = {CLOBBER};
   }
 #pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
 #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
   {
 difference_type retval.2;
 const difference_type D.13633;
 int D.13725;
 struct __normal_iterator  D.13726;
 bool retval.3;
 int  D.13728;
 int D.13729;
 int  D.13732;
 
 iter = iter.1;
  private(D.13631)
 _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = 
 D.13631 + 1)
   D.13725 = D.13631 - D.13632;
 
 So, the issues I see:
 1) what is iter.1, why do you have it at all, and, after all, the iterator is 
 a class
 that needs to be constructed/destructed in the general way, so creating any
 further copies of something is both costly and undesirable
 

Well, to get the loop count, I need to calculate it using operator-(array.end 
(), iter).

Now, if I do that iter is already set. I need to reset iter back to the 
original one (array.begin ()) in the child function. This is why I used a 
temporary variable called iter1.



 2) the schedule clause doesn't belong on the omp parallel, but on the
 _Cilk_for
 

What if grain is a variable say x? If I have it in the _Cilk_for, then won't 
it create omp_data_i-x. That is not correct. It should just emit x. But let 
me look into this to make sure...

 3) iter should be firstprivate, and there should be no explicit private var 
 with
 assignment during gimplification, just handle it like any other firstprivate
 during omp lowering
 

Do you mean to say I should manually insert a firstprivate for iter and not the 
system figure out that it is shared? 


 4) the printing looks weird for _Cilk_for, as I said earlier, the clauses 
 should
 probably be printed after the closing ) of _Cilk_for rather than after nothing
 on the previous line; also, there is no {} printed around the _Cilk_for body
 and the next line is weirdly indented
 

Ok will look into this.

 But more importantly, if I create some testcase with a generic C++
 conforming iterator (copied over from libgomp/testsuite/libgomp.c++/for-
 1.C), as in the second testcase, the *.gimple dump shows that _Cilk_for is 
 still
 around the #pragma omp parallel.
 The intent of the second testcase is that you can really eyeball all the
 ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
 really inlined.
 
   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Jakub Jelinek
On Fri, Feb 07, 2014 at 02:33:41PM +, Iyer, Balaji V wrote:
  So, the issues I see:
  1) what is iter.1, why do you have it at all, and, after all, the iterator 
  is a class
  that needs to be constructed/destructed in the general way, so creating any
  further copies of something is both costly and undesirable
  
 
 Well, to get the loop count, I need to calculate it using operator-(array.end 
 (), iter).
 
 Now, if I do that iter is already set. I need to reset iter back to the
 original one (array.begin ()) in the child function.  This is why I used a
 temporary variable called iter1.

operator- shouldn't really change iter, if it does, it is purely the user's
fault, isn't it?  It isn't operator -=, so it shouldn't really change
array.end () either.

  2) the schedule clause doesn't belong on the omp parallel, but on the
  _Cilk_for
  
 
 What if grain is a variable say x? If I have it in the _Cilk_for, then
 won't it create omp_data_i-x.  That is not correct.  It should just emit
 x. But let me look into this to make sure...

You certainly should gimplify the clause operand before the omp parallel, it
must be an integral anyway, right?  So just use get_temp_regvar?
Then simply use firstprivate on the #pragma omp parallel.  When you actually
omp expand, you'll still be able to find the original variable and look it
up on the parallel.  But, if you can't make it work, guess I could live with
the clause on the parallel.

  3) iter should be firstprivate, and there should be no explicit private var 
  with
  assignment during gimplification, just handle it like any other firstprivate
  during omp lowering
  
 
 Do you mean to say I should manually insert a firstprivate for iter and
 not the system figure out that it is shared?

Yes.  The class iterator is quite special thing, because already the C++ FE
lowers it to an integral iterator instead.  And when you make it
firstprivate, omp lowering/expansion should take care of running the copy
constructor/destructor in the parallel for you.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-31 Thread Iyer, Balaji V
Hello Everyone,
Did anyone get a chance to look at this patch (link to patches: 
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01612.html)? I tried to do as 
Jakub mentioned but it hits a road-block when it comes to iterators due to 
variable scoping issues.
This patch does not disrupt any other parts of the code and all the 
code here are wrapped inside a check for for Cilk Plus enabling. It also passes 
all the tests and does not disrupt any existing failing or passing ones for 
both 32 and 64 bit modes in my x86_64 machine. 

It is the last feature to make Cilk Plus feature-complete. Is the patch OK for 
trunk?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Wednesday, January 29, 2014 10:54 AM
 To: Jakub Jelinek
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 Hi Jakub,
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Wednesday, January 29, 2014 6:31 AM
  To: Iyer, Balaji V
  Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
 I thought about it a bit more, and the main issue here is that we
   need access to the _Cilk_for loop's components both inside the child
   function and the parent function.
 
  I guess for the C++ iterators, if in the _Cilk_for model you need to
  provide number of iterations before parallelization, it really depends
  on what the standard allows and what you want to do exactly.
 
 Yes, I need the value before the parallelization context hits. This is why in 
 my
 last patch I had the parallel around the body and omp for around the _Cilk-
 for statement.
 
 
  If you need to provide the iteration count before spawning the threads
  and the standard allows you that, then just lower it in the C++ FE
  already so that you do:
vectorint::iterator temp = array.begin ();
sizetype tempcount = (array.end () - temp); before the parallel, and then
#pragma omp parallel firstprivate(temp, tempcount)
  _Cilk_for (sizetype temp2 = 0; temp2  tempcount; temp2++)
{
  vectorint::iterator ii = temp + temp2;
  body
}
 
 This is kind of what I did (atlest tried to accomplish what you mentioned
 above). I can look into doing this, but is it possible for you to accept the 
 patch
 as-is and we will look into fixing it in the future?
 
 Thanks,
 
 Balaji V. Iyer.
 
  or similar.  The C++ FE needs to lower the C++ iterators anyway, the
  middle- end can really only work with integral or pointer iterators,
  and it depends on how exactly the Cilk+ standard defines _Cilk_for
  with iterators (what methods must be implemented on the iterators and
  what methods and in what order should be called).
 
  Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-01-29 Thread Jakub Jelinek
On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
   I thought about it a bit more, and the main issue here is that we
 need access to the _Cilk_for loop's components both inside the child
 function and the parent function.

I guess for the C++ iterators, if in the _Cilk_for model you need to provide
number of iterations before parallelization, it really depends on what the
standard allows and what you want to do exactly.
If you need to provide the iteration count before spawning the threads and
the standard allows you that, then just lower it in the C++ FE already
so that you do:
  vectorint::iterator temp = array.begin ();
  sizetype tempcount = (array.end () - temp);
before the parallel, and then
  #pragma omp parallel firstprivate(temp, tempcount)
_Cilk_for (sizetype temp2 = 0; temp2  tempcount; temp2++)
  {
vectorint::iterator ii = temp + temp2;
body
  }
or similar.  The C++ FE needs to lower the C++ iterators anyway, the
middle-end can really only work with integral or pointer iterators, and it
depends on how exactly the Cilk+ standard defines _Cilk_for with iterators
(what methods must be implemented on the iterators and what methods and in
what order should be called).

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-29 Thread Iyer, Balaji V
Hi Jakub,

 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, January 29, 2014 6:31 AM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
  I thought about it a bit more, and the main issue here is that we
  need access to the _Cilk_for loop's components both inside the child
  function and the parent function.
 
 I guess for the C++ iterators, if in the _Cilk_for model you need to provide
 number of iterations before parallelization, it really depends on what the
 standard allows and what you want to do exactly.

Yes, I need the value before the parallelization context hits. This is why in 
my last patch I had the parallel around the body and omp for around the 
_Cilk-for statement. 


 If you need to provide the iteration count before spawning the threads and
 the standard allows you that, then just lower it in the C++ FE already so that
 you do:
   vectorint::iterator temp = array.begin ();
   sizetype tempcount = (array.end () - temp); before the parallel, and then
   #pragma omp parallel firstprivate(temp, tempcount)
 _Cilk_for (sizetype temp2 = 0; temp2  tempcount; temp2++)
   {
 vectorint::iterator ii = temp + temp2;
 body
   }

This is kind of what I did (atlest tried to accomplish what you mentioned 
above). I can look into doing this, but is it possible for you to accept the 
patch as-is and we will look into fixing it in the future?

Thanks,

Balaji V. Iyer.

 or similar.  The C++ FE needs to lower the C++ iterators anyway, the middle-
 end can really only work with integral or pointer iterators, and it depends on
 how exactly the Cilk+ standard defines _Cilk_for with iterators (what
 methods must be implemented on the iterators and what methods and in
 what order should be called).
 
   Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-28 Thread Iyer, Balaji V


 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, January 27, 2014 4:36 PM
 To: Jakub Jelinek
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
  Sent: Monday, January 27, 2014 3:50 PM
  To: Iyer, Balaji V
  Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
 Did you get a chance to look at this _Cilk_for patch?
 
  IMHO it is not as much work as you are fearing, at most a few hours of
  work to get it right, and well worth doing.  So, please at least try
  it out and if you get stuck with it, explain why.
 
 Hi Jakub,
   I tried it that way in the original patch submission for C
 (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a
 dead-end when I was trying to get STL iterators working for C++. This is why I
 re-structured things this way to get them both working.
 
 Thanks,
 
 Balaji V. Iyer.
 

Hi Jakub,
I thought about it a bit more, and the main issue here is that we need 
access to the _Cilk_for loop's components both inside the child function and 
the parent function.

So, at the moment, I have modelled the _Cilk_for as something like this:

#pragma omp for  schedule (runtime: grain)
_Cilk-for (vectorint::iterator ii = array.begin (); ii != array.end (); ii++)
#pragma omp parallel 
{
body
}

From what I understand, you feel this is a bit ugly and you want this to be 
modelled something like this?

#pragma omp parallel for schedule (runtime: grain)
_Cilk_for (vectorint::iterator ii = array.begin (); ii != array.end(); ii++)
{
body
}

Am I right?

As it stands, doing it the way you suggested did not work when we have 
iterators since iterator expansion pushed inside the child function and its 
expanded variables are not accessible outside the child function by 
gimplify_omp_for. That is, the expansion is put after #pragma omp parallel for 
and that is all pulled into the child function and thus the information to 
compute the count is lost for the parent function.

There is a hack that I think may get around this. This is a bit ugly and really 
is not the way I would think of _Cilk_fors. I am OK with trying this if you 
will accept it.
 
If I do something like this:
 
#pragma omp parallel for schedule (runtime:grain) if ((array.end() - 
array.begin ())/1)
_Cilk_for (vector int::iterator ii = array.begin (); ii != array.end (); ii++)
{
body
}

The new addition is if clause where if ((end - start) / step)

Then, in the expand_parallel_task, I can extract the if (...) clause and then 
pass the expression as a parameter for the loop-count. Yes, it's bit ugly but 
if you are willing to accept it, I can try to implement this.

Please let me know.

Thanks,

Balaji V. Iyer.


[PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Iyer, Balaji V
Hi Jakub et al.,

Did you get a chance to look at this _Cilk_for patch? 

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Iyer, Balaji V
 Sent: Friday, January 24, 2014 3:34 PM
 To: Jakub Jelinek
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PATCH] _Cilk_for for C and C++
 
 
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Friday, January 24, 2014 2:42 PM
  To: Iyer, Balaji V
  Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PATCH] _Cilk_for for C and C++
 
  On Thu, Jan 23, 2014 at 04:38:53PM +, Iyer, Balaji V wrote:
 This is how I started to think of it at first, but then when I
   thought
  about it ... in _Cilk_for unlike the #pragma simd's for, the for
  statement - not the body - (e.g. _Cilk_for (int ii = 0; ii  10;
  ii++) doesn't really do anything nor does it belong in the child
  function. It is really mostly used to calculate the loop count and capture
 step-size and starting point.
  
 The child function has its own loop that will have a step size of 1
  regardless of your step size. You use the step-size to find the correct 
  spot.
  Let me give you an example:
  
   _Cilk_for (int ii = 0; ii  10; ii = ii  + 2) {
 Array [ii] = 5;
   }
  
   This is translated to the following (assume grain is something that
   the user
  input):
  
   data_ptr.start = 0;
   data_ptr.end = 10;
   data_ptr.step_size = 2;
   __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain);
  
   Child_function (void *data_ptr, int high, int low) {
 for (xx = low; xx  high; xx++)
  {
 Tmp_var = (xx * data_ptr-step_size) + data_ptr-start;
 // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we 
   would
  have something like this:
 // Tmp_var = data_ptr-end - (xx * data_ptr-step_size)
 // The for-loop above won't change.
 Array[Tmp_var] = 5;
 }
   }
 
  This isn't really much different from
  #pragma omp parallel for schedule(runtime, N) (i.e. the combined
  construct), when it is combined, we also don't emit a call to
  GOMP_parallel but to some other function to which we pass the number
  of iterations and chunk size (== grain in Cilk+ terminology), the only
  (minor) difference is that for OpenMP when you handle the whole low ...
  high range the child function doesn't exit, but calls a function to
  give it next pari of low/high and only when that function tells it
  there is no further work to do, it returns.  But, the Cilk+ case is
  clearly the same thing with just implicit telling there is no further work 
  in
 the current function.
 
  So, I'd strongly prefer if you swap the parallel with Cilk_for, just
  set the flag that the two are combined like OpenMP already has for
  tons of constructs, and during expansion you just treat it together.
 
 Hi Jakub,
   What you are suggesting here would require a significant rewrite of
 the code. This version of _Cilk_for works and it does share significant amount
 of work with OMP routines as requested by other GCC developers. Given
 the time constraints, let's try to get this version accepted so that the 
 feature
 will be available for the users and we will look into moving toward your
 suggestion when the phase 1 opens again.
 
 Thanks,
 
 Balaji V. Iyer.
 
 
 
  Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Jakub Jelinek
On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
   Did you get a chance to look at this _Cilk_for patch? 

IMHO it is not as much work as you are fearing, at most a few hours of work
to get it right, and well worth doing.  So, please at least try it out
and if you get stuck with it, explain why.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Iyer, Balaji V
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
 Sent: Monday, January 27, 2014 3:50 PM
 To: Iyer, Balaji V
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
  Did you get a chance to look at this _Cilk_for patch?
 
 IMHO it is not as much work as you are fearing, at most a few hours of work
 to get it right, and well worth doing.  So, please at least try it out and if 
 you
 get stuck with it, explain why.

Hi Jakub,
I tried it that way in the original patch submission for C 
(http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a 
dead-end when I was trying to get STL iterators working for C++. This is why I 
re-structured things this way to get them both working.

Thanks,

Balaji V. Iyer.

 
   Jakub


Re: [PATCH] _Cilk_for for C and C++

2014-01-24 Thread Jakub Jelinek
On Thu, Jan 23, 2014 at 04:38:53PM +, Iyer, Balaji V wrote:
   This is how I started to think of it at first, but then when I thought 
 about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - 
 not the body - (e.g. _Cilk_for (int ii = 0; ii  10; ii++) doesn't really 
 do anything nor does it belong in the child function. It is really mostly 
 used to calculate the loop count and capture step-size and starting point.
 
   The child function has its own loop that will have a step size of 1 
 regardless of your step size. You use the step-size to find the correct spot. 
 Let me give you an example:
 
 _Cilk_for (int ii = 0; ii  10; ii = ii  + 2)
 {
   Array [ii] = 5;
 }
 
 This is translated to the following (assume grain is something that the user 
 input):
 
 data_ptr.start = 0;
 data_ptr.end = 10;
 data_ptr.step_size = 2;
 __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain);
 
 Child_function (void *data_ptr, int high, int low)
 {
   for (xx = low; xx  high; xx++) 
{
   Tmp_var = (xx * data_ptr-step_size) + data_ptr-start;
   // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we 
 would have something like this:
   // Tmp_var = data_ptr-end - (xx * data_ptr-step_size)
   // The for-loop above won't change.  
   Array[Tmp_var] = 5;
   }
 }

This isn't really much different from
#pragma omp parallel for schedule(runtime, N)
(i.e. the combined construct), when it is combined, we also don't emit a
call to GOMP_parallel but to some other function to which we pass the
number of iterations and chunk size (== grain in Cilk+ terminology), the
only (minor) difference is that for OpenMP when you handle the whole low ...
high range the child function doesn't exit, but calls a function to give it
next pari of low/high and only when that function tells it there is no
further work to do, it returns.  But, the Cilk+ case is clearly the same
thing with just implicit telling there is no further work in the current
function.

So, I'd strongly prefer if you swap the parallel with Cilk_for, just set
the flag that the two are combined like OpenMP already has for tons of
constructs, and during expansion you just treat it together.

Jakub


RE: [PATCH] _Cilk_for for C and C++

2014-01-24 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Friday, January 24, 2014 2:42 PM
 To: Iyer, Balaji V
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 On Thu, Jan 23, 2014 at 04:38:53PM +, Iyer, Balaji V wrote:
  This is how I started to think of it at first, but then when I thought
 about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - 
 not
 the body - (e.g. _Cilk_for (int ii = 0; ii  10; ii++) doesn't really do 
 anything
 nor does it belong in the child function. It is really mostly used to 
 calculate the
 loop count and capture step-size and starting point.
 
  The child function has its own loop that will have a step size of 1
 regardless of your step size. You use the step-size to find the correct spot.
 Let me give you an example:
 
  _Cilk_for (int ii = 0; ii  10; ii = ii  + 2) {
  Array [ii] = 5;
  }
 
  This is translated to the following (assume grain is something that the user
 input):
 
  data_ptr.start = 0;
  data_ptr.end = 10;
  data_ptr.step_size = 2;
  __cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain);
 
  Child_function (void *data_ptr, int high, int low) {
  for (xx = low; xx  high; xx++)
   {
  Tmp_var = (xx * data_ptr-step_size) + data_ptr-start;
  // Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we 
  would
 have something like this:
  // Tmp_var = data_ptr-end - (xx * data_ptr-step_size)
  // The for-loop above won't change.
  Array[Tmp_var] = 5;
  }
  }
 
 This isn't really much different from
 #pragma omp parallel for schedule(runtime, N) (i.e. the combined
 construct), when it is combined, we also don't emit a call to GOMP_parallel
 but to some other function to which we pass the number of iterations and
 chunk size (== grain in Cilk+ terminology), the only (minor) difference is 
 that
 for OpenMP when you handle the whole low ...
 high range the child function doesn't exit, but calls a function to give it 
 next
 pari of low/high and only when that function tells it there is no further work
 to do, it returns.  But, the Cilk+ case is clearly the same thing with just 
 implicit
 telling there is no further work in the current function.
 
 So, I'd strongly prefer if you swap the parallel with Cilk_for, just set the 
 flag
 that the two are combined like OpenMP already has for tons of constructs,
 and during expansion you just treat it together.

Hi Jakub,
What you are suggesting here would require a significant rewrite of the 
code. This version of _Cilk_for works and it does share significant amount of 
work with OMP routines as requested by other GCC developers. Given the time 
constraints, let's try to get this version accepted so that the feature will be 
available for the users and we will look into moving toward your suggestion 
when the phase 1 opens again.

Thanks,

Balaji V. Iyer.


 
   Jakub


Re: [PATCH] _Cilk_for for C and C++

2014-01-23 Thread Jakub Jelinek
On Sun, Jan 19, 2014 at 04:50:39AM +, Iyer, Balaji V wrote:
 I have answered your questions below. In addition to your changes, I have
 also fixed the issues Aldy pointed out and have answered his questions in
 that thread.  With this email I have attached two patches and 2
 change-logs (for C and C++).  I have also rebased these patches to the
 trunk revision (r206756)

Haven't looked at the patch yet, just applied and looked what it generates:

Say in cilk_fors.c -O2 -fcilkplus -std=c99 
-fdump-tree-{original,gimple,omplower,ompexp}
I'm seeing in *.original dump:
 Unknown tree: cilk_for
which suggests that tree-pretty-print.c doesn't handle CILK_FOR.

Much more important is what is seen in the *.gimple dump though:
   schedule(runtime,0) private(ii)
  _Cilk_for (ii = 0; ii = 9; ii = ii + 1)
{
  #pragma omp parallel shared(__cilk_incr.0) shared(__cilk_cond.2) 
shared(__cilk_init.1) shared(ii) shared(Array)
{
  Array[ii] = 1133;
}
}
Why do you put the parallel inside of _Cilk_for rather than the other way
around?  That looks just wrong.  That would represent runtime scheduling of
work across the parallel regions at the _Cilk_for, and then in each
iteration running the body in several threads concurrently.
You want the parallel around the _Cilk_for, and
gimple_omp_parallel_set_combined_p (parallel_stmt, true) so that you can
then handle it specially during omp lowering/expansion.

Also, the printing of _Cilk_for is weird, the clauses (with space before)
look really weird above the _Cilk_for when there is no #pragma or something
similar.  Perhaps print the clauses after _Cilk_for?
  _Cilk_for (ii = 0; ii = 9; ii = ii + 1) schedule(runtime,0) private(ii)
?

Jakub


RE: [PATCH] _Cilk_for for C and C++

2014-01-23 Thread Iyer, Balaji V
Hi Jakub,

 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Thursday, January 23, 2014 5:13 AM
 To: Iyer, Balaji V
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 On Sun, Jan 19, 2014 at 04:50:39AM +, Iyer, Balaji V wrote:
  I have answered your questions below. In addition to your changes, I
  have also fixed the issues Aldy pointed out and have answered his
  questions in that thread.  With this email I have attached two patches
  and 2 change-logs (for C and C++).  I have also rebased these patches
  to the trunk revision (r206756)
 
 Haven't looked at the patch yet, just applied and looked what it generates:
 
 Say in cilk_fors.c -O2 -fcilkplus -std=c99 -fdump-tree-
 {original,gimple,omplower,ompexp}
 I'm seeing in *.original dump:
  Unknown tree: cilk_for
 which suggests that tree-pretty-print.c doesn't handle CILK_FOR.
 

OK. I will work on this and send you a patch soon.

 Much more important is what is seen in the *.gimple dump though:
schedule(runtime,0) private(ii)
   _Cilk_for (ii = 0; ii = 9; ii = ii + 1)
 {
   #pragma omp parallel shared(__cilk_incr.0) shared(__cilk_cond.2)
 shared(__cilk_init.1) shared(ii) shared(Array)
 {
   Array[ii] = 1133;
 }
 }
 Why do you put the parallel inside of _Cilk_for rather than the other way
 around?  That looks just wrong.  That would represent runtime scheduling of
 work across the parallel regions at the _Cilk_for, and then in each iteration
 running the body in several threads concurrently.
 You want the parallel around the _Cilk_for, and
 gimple_omp_parallel_set_combined_p (parallel_stmt, true) so that you can
 then handle it specially during omp lowering/expansion.

This is how I started to think of it at first, but then when I thought 
about it ... in _Cilk_for unlike the #pragma simd's for, the for statement - 
not the body - (e.g. _Cilk_for (int ii = 0; ii  10; ii++) doesn't really do 
anything nor does it belong in the child function. It is really mostly used to 
calculate the loop count and capture step-size and starting point.

The child function has its own loop that will have a step size of 1 
regardless of your step size. You use the step-size to find the correct spot. 
Let me give you an example:

_Cilk_for (int ii = 0; ii  10; ii = ii  + 2)
{
Array [ii] = 5;
}

This is translated to the following (assume grain is something that the user 
input):

data_ptr.start = 0;
data_ptr.end = 10;
data_ptr.step_size = 2;
__cilkrts_cilk_for_32 (child_function, data_ptr, (10-0)/2, grain);

Child_function (void *data_ptr, int high, int low)
{
for (xx = low; xx  high; xx++) 
 {
Tmp_var = (xx * data_ptr-step_size) + data_ptr-start;
// Note: if the _Cilk_for was (ii = 9; ii = 0; ii -= 2), we 
would have something like this:
// Tmp_var = data_ptr-end - (xx * data_ptr-step_size)
// The for-loop above won't change.  
Array[Tmp_var] = 5;
}
}

High and low are passed in by the runtime and thus their range can be any 
number (in this case between 1 and 5)

Now, if we model this like #pragma omp parallel for then all _Cilk_for 
statement will also be pulled into the child function. This can be circumvented 
(althrough I feel it is a bit convoluted) using the region-data_arg pointers 
to pass values back and forth and doing other adjustments in expand_omp_for, 
and it will work for C, but it gets problematic for STL. This is because during 
the gimplification, the vector.start() and vector.end () are replaced with the 
start and end integers and the translation is put in the child function and 
that messes things up in omp-low.c and the calculation for count in the parent 
function. Another thing also gets messed up for C++ which I can't recall off 
the top of my head.

On high-level, you can't think of _Cilk_for in terms of Open MP's for. These 
both are orthogonal technologies that produce parallel code with different 
starting points and assumptions. The reason why I modeled this way in the 
compiler is so that I can use OMP's  compiler-routines. Some things such as 
creating a child function, inserting *a* call to the library function are same 
and thus the routines to do those can be shared but the internals are very 
different and so this is why it  is modelled this way in the compiler.

 
 Also, the printing of _Cilk_for is weird, the clauses (with space before) look
 really weird above the _Cilk_for when there is no #pragma or something
 similar.  Perhaps print the clauses after _Cilk_for?
   _Cilk_for (ii = 0; ii = 9; ii = ii + 1) schedule(runtime,0) private(ii) ?
 

OK. I will work on this and send you a patch.

   Jakub


RE: [PATCH] _Cilk_for for C and C++

2014-01-18 Thread Iyer, Balaji V
Hi Aldy,
I have answered your questions below. But, I am attaching the patch 
with the response to Jason Merrill's email. I am not attaching them here to 
remove unnecessary duplication. I hope that's OK with you. If you would like it 
otherwise please let me know and I can send them to you.

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Aldy Hernandez [mailto:al...@redhat.com]
 Sent: Thursday, January 16, 2014 4:19 PM
 To: Iyer, Balaji V; Jakub Jelinek
 Cc: Jason Merrill; 'Jeff Law'; 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 Here are a few things.
 
  +  if (g_expr.value  TREE_CODE (g_expr.value) ==
 C_MAYBE_CONST_EXPR)
  +   {
  + error_at (input_location, cannot convert grain to long integer.\n);
  + c_parser_skip_to_pragma_eol (parser);
  +   }
 
 Remove final period.  Also, where's the testcase?  Also, there seems to be
 spurious white space after the }.
 

Was not necessary. Fixed.

 Is it required that it be a long integer?  Because I see no further checks for
 this.
 
Yes, I believe the language spec says the grainsize should be convertible to a 
long int.

  + c_token *token = c_parser_peek_token (parser);
  + if (token  token-type == CPP_KEYWORD
  +  token-keyword == RID_CILK_FOR)
 
 It doesn't look like c_parser_peek_token() ever returns NULL, so no need to
 check for token != 0.

Fixed.

 
  + tree grain = convert_to_integer (long_integer_type_node,
  +  g_expr.value);
  + if (grain  grain != error_mark_node)
  +   c_parser_cilk_simd (parser, grain);
 
 No need to check grain != 0 here either.
 

Fixed.

  == a.c.003t.original ==
 
  ;; Function main (null)
  ;; enabled by -tree-original
 
 
  {
int i;
 
  int i;
 Unknown tree: cilk_for
#pragma omp parallel
  {
{
 
 Found with -fdump-tree-all.  You should handle the cilk_for tree code in the
 pretty printers, and add corresponding test(s).
 

I didn't fix this yet. I didn't understand how to add tests to check 
pretty-printers...


 
   static void
  -c_parser_cilk_simd (c_parser *parser)
  +c_parser_cilk_simd (c_parser *parser, tree grain)
 
 No documentation for grain.
 


Fixed.

  +  tree clauses = NULL_TREE;
  +
  +  if (!is_cilk_for)
  +clauses = c_parser_cilk_all_clauses (parser);
  +  else
  +clauses = grain;
 
 First set of clauses=NULL_TREE is useless.
 

OK. Just my common practice of initialize all the variables when it is defined. 
I have removed it.

   static tree
   c_parser_omp_for_loop (location_t loc, c_parser *parser, enum
 tree_code code,
  -  tree clauses, tree *cclauses)
  +  tree clauses_or_grain, tree *cclauses)
 
 Don't overload clauses and grainsize into one argument.  Add another
 argument.  Also, document said argument.
 

OK. I have added a new parameter called grain and renamed the clauses_or_grain 
to clauses.

  +/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
  +
  +static tree
  +cilk_declare_looper (const char *name, tree type, enum built_in_function
 code)
  +{
 
 I think you should document that it's creating a suitable built-in, not
 just creating a FUNCTION_DECL.  Also, plesae document argument `code'.
 And call this function something more meaningful, like
 cilkrts_decalre_builtin or cilk_declare_for_builtin, but definitely
 not looper :).
 

Renamed to declare_cilk_for_builtin.

  @@ -1192,6 +1199,9 @@ dump_gimple_omp_for (pretty_printer *buffer,
 gimple gs, int spc, int flags)
  case GE_EXPR:
pp_greater_equal (buffer);
break;
  +   case NE_EXPR:
  + pp_string (buffer, !=);
  + break;
 
 Thank you :).  That was probably my oversight on the pragma simd work.
 

Your welcome.

  @@ -6603,6 +6614,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq
 *pre_p)
   }
 
 for_body = NULL;
  +  if (flag_enable_cilkplus  TREE_CODE (for_stmt) == CILK_FOR)
  +{
  +  tree it = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0);
  +  gimplify_and_add (it, for_pre_body);
  +}
 
 And what Jason said for all the special casing for CILK_FOR in this
 function...
 

This removed as mentioned in email to Jason.

  +static inline void
  +gimple_cilk_for_set_grain (tree grain, gimple gs)
  +{
  +  const gimple_statement_omp_for *omp_for_stmt =
  +as_a gimple_statement_omp_for (gs);
  +  omp_for_stmt-iter[0].grain = grain;
  +}
 
 Can we leave the grainsize in the clause, or does it have to reside in
 an auxiliary data structure?  It seems weird as is, since you're only
 setting grain for the first element.  I think Jason mentioned something
 similar.
 

Got rid of the grainsize functions and just kept it as a clause in _Cilk_for 
(as suggested by Jason).

  +/* A structure with necessary elements from _Cilk_for statement.  This
  +   struct. node is passed in to WALK_STMT_INFO-INFO.  */
  +typedef struct

Re: [PATCH] _Cilk_for for C and C++

2014-01-17 Thread Marek Polacek
On Thu, Jan 16, 2014 at 01:18:59PM -0800, Aldy Hernandez wrote:
 I'm not a C++ expert, but my understanding was that in C++ you don't
 need a typedef to use the following structure by name
 (cilk_for_information).  So you can just declare struct
 cilk_for_information {...} and instantiate it with just
 cilk_for_information some_instance.  If that's the case, get rid
 of typedef.

Yes.  That's what create_implicit_typedef does.

Marek


Re: [PATCH] _Cilk_for for C and C++

2014-01-16 Thread Jason Merrill

On 01/08/2014 02:46 PM, Iyer, Balaji V wrote:

+  /* Grain value, only used by _Cilk_for.  */
+  tree grain;


Why can't the grain stay as a clause for the gimple form of the loop?


+  if (flag_enable_cilkplus  TREE_CODE (for_stmt) == CILK_FOR)
+{
+  tree it = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0);
+  gimplify_and_add (it, for_pre_body);
+}


Why doesn't the normal handling of OMP_FOR_INIT work for Cilk?  All the 
special cases for CILK_FOR need comments explaining why they are needed.


Also, this seems like you're assigning to the control variable outside 
of the loop, which doesn't makes sense because we initialize it in each 
of the invocations of the child function.  Right?



+  /* Original initial, final and increment values are necessary to compute
+the loop-count.  Otherwise, they are stored in variables and their
+context could be changed, potentially making it impossible to compute
+them correctly.  */


I don't understand.  Surely all you care about is the value, and 
gimplification shouldn't affect that.



+ /* If VAR is the induction variable of the outer _Cilk_for, then
+it needs to be passed as a value not pointer since it
+would not be overwritten by the body.  */


Here it looks like you're overriding the normal logic because we know 
that it's safe to assume the induction variable won't be changed by the 
body of the loop.  But why is the induction variable shared in the first 
place?  If it isn't going to change, it can be private.



+   /* We cannot use the tsubst_omp_clauses since it will try to
+  do checking such as whether a certain clause can be used
+  with a certain for-loop.  We are just use schedule clause here
+  as a holder to hold the grain value.  */


I don't see the checking you mention.  Can't we fix it to do the right 
thing?



+  if (code == CILK_FOR)
+{
+  top_level_body = push_stmt_list ();
+  top_body = begin_omp_parallel ();
+}


I wouldn't expect the front end to care that Cilk for is implemented 
using a parallel call; can't we bring that in at lowering time?


Jason



Re: [PATCH] _Cilk_for for C and C++

2014-01-16 Thread Jakub Jelinek
On Thu, Jan 16, 2014 at 12:29:44PM -0500, Jason Merrill wrote:
 +  if (code == CILK_FOR)
 +{
 +  top_level_body = push_stmt_list ();
 +  top_body = begin_omp_parallel ();
 +}
 
 I wouldn't expect the front end to care that Cilk for is implemented
 using a parallel call; can't we bring that in at lowering time?

The gimplifier already cares, so if it shouldn't be added early in the FE,
it must be added during genericization.
Unless we treat CILK_FOR as implicit parallel in the gimplifier, but I'd say
that it is better to have it expressed as parallel with cilk_for nested in
it and the combined flag set, so that omp-low.c can then emit it together.

Jakub


Re: [PATCH] _Cilk_for for C and C++

2014-01-16 Thread Aldy Hernandez

Here are a few things.


+  if (g_expr.value  TREE_CODE (g_expr.value) == C_MAYBE_CONST_EXPR)
+   {
+ error_at (input_location, cannot convert grain to long integer.\n);
+ c_parser_skip_to_pragma_eol (parser);
+   }


Remove final period.  Also, where's the testcase?  Also, there seems to 
be spurious white space after the }.


Is it required that it be a long integer?  Because I see no further 
checks for this.



+ c_token *token = c_parser_peek_token (parser);
+ if (token  token-type == CPP_KEYWORD
+  token-keyword == RID_CILK_FOR)


It doesn't look like c_parser_peek_token() ever returns NULL, so no need 
to check for token != 0.



+ tree grain = convert_to_integer (long_integer_type_node,
+  g_expr.value);
+ if (grain  grain != error_mark_node)
+   c_parser_cilk_simd (parser, grain);


No need to check grain != 0 here either.


== a.c.003t.original ==

;; Function main (null)
;; enabled by -tree-original


{
  int i;

int i;
   Unknown tree: cilk_for
  #pragma omp parallel
{
  {


Found with -fdump-tree-all.  You should handle the cilk_for tree code in 
the pretty printers, and add corresponding test(s).




 static void
-c_parser_cilk_simd (c_parser *parser)
+c_parser_cilk_simd (c_parser *parser, tree grain)


No documentation for grain.


+  tree clauses = NULL_TREE;
+
+  if (!is_cilk_for)
+clauses = c_parser_cilk_all_clauses (parser);
+  else
+clauses = grain;


First set of clauses=NULL_TREE is useless.


 static tree
 c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
-  tree clauses, tree *cclauses)
+  tree clauses_or_grain, tree *cclauses)


Don't overload clauses and grainsize into one argument.  Add another 
argument.  Also, document said argument.



+/* Returns a FUNCTION_DECL of type TYPE whose name is *NAME.  */
+
+static tree
+cilk_declare_looper (const char *name, tree type, enum built_in_function code)
+{


I think you should document that it's creating a suitable built-in, not 
just creating a FUNCTION_DECL.  Also, plesae document argument `code'. 
And call this function something more meaningful, like 
cilkrts_decalre_builtin or cilk_declare_for_builtin, but definitely 
not looper :).



@@ -1192,6 +1199,9 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, 
int spc, int flags)
case GE_EXPR:
  pp_greater_equal (buffer);
  break;
+   case NE_EXPR:
+ pp_string (buffer, !=);
+ break;


Thank you :).  That was probably my oversight on the pragma simd work.


@@ -6603,6 +6614,11 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 }

   for_body = NULL;
+  if (flag_enable_cilkplus  TREE_CODE (for_stmt) == CILK_FOR)
+{
+  tree it = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), 0);
+  gimplify_and_add (it, for_pre_body);
+}


And what Jason said for all the special casing for CILK_FOR in this 
function...



+static inline void
+gimple_cilk_for_set_grain (tree grain, gimple gs)
+{
+  const gimple_statement_omp_for *omp_for_stmt =
+as_a gimple_statement_omp_for (gs);
+  omp_for_stmt-iter[0].grain = grain;
+}


Can we leave the grainsize in the clause, or does it have to reside in 
an auxiliary data structure?  It seems weird as is, since you're only 
setting grain for the first element.  I think Jason mentioned something 
similar.



+/* A structure with necessary elements from _Cilk_for statement.  This
+   struct. node is passed in to WALK_STMT_INFO-INFO.  */
+typedef struct cilk_for_information {


{ should be in a separate line.

for a Cilk_for statement, not from Cilk_for.  No abbreviation on 
struct.  Either remove the period, or spell it out.  Also s/in to/to/.


I'm not a C++ expert, but my understanding was that in C++ you don't 
need a typedef to use the following structure by name 
(cilk_for_information).  So you can just declare struct 
cilk_for_information {...} and instantiate it with just 
cilk_for_information some_instance.  If that's the case, get rid of 
typedef.



+  if (flag_enable_cilkplus


BTW, weren't you going to change this to flag_cilkplus or something in 
some past follow-up?



  fd-sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
  fd-chunk_size = NULL_TREE;
+  if (flag_enable_cilkplus
+   gimple_omp_for_kind (fd-for_stmt) ==  GF_OMP_FOR_KIND_CILKFOR)
+fd-sched_kind = OMP_CLAUSE_SCHEDULE_CILKFOR;


I believe most of the flag_enable_cilkplus checks in omp-low.c can be 
removed, especially the ones related to syntax.  You shouldn't be 
getting any cilk constructs this late if cilkplus was not enabled.  For 
that matter, we don't check flag_openmp in this file throughout, we only 
check for it at the gates.



  bool is_cilk_for = (flag_enable_cilkplus  outer_ctx
   is_cilk_for_stmt (outer_ctx-stmt, ind_var));


Although here you 

Re: [PATCH] _Cilk_for for C and C++

2014-01-08 Thread Jakub Jelinek
On Tue, Jan 07, 2014 at 10:11:59PM +, Iyer, Balaji V wrote:
   I used a similar existing one (safelen). Attached, please find 2
 fixed patches for C and C++ along with their changelogs.

But safelen is something completely different, while if I skim
the _Cilk_for docs, the grain is really a chunk size, where the runtime
library performs the scheduling of grain sized chunks, so using
OMP_CLAUSE_SCHEDULE clause with
OMP_CLAUSE_SCHEDULE_KIND (c) = OMP_CLAUSE_SCHEDULE_RUNTIME;
OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (c) = grain_expr;
sounds like what should be used.  OMP_CLAUSE_SAFELEN says what is the
minimal vectorization factor the compiler can assume is safe for
a simd loop.

Jakub


Re: [PATCH] _Cilk_for for C and C++

2014-01-07 Thread Jason Merrill

On 12/17/2013 07:21 PM, Iyer, Balaji V wrote:

The reason why I store it in OMP_FOR_CLAUSE is because OMP clauses cannot occur 
in _Cilk_for. So adding a new clause seem to be an overkill IMHO. I need a 
place to store the grain value and so I chose this spot.


But code expects OMP_FOR_CLAUSES to have a certain form, and you are 
violating that so that now code needs to check whether we're dealing 
with a for loop in order to know to parse OMP_FOR_CLAUSES.  Doing it 
your way requires lots of little special cases.  Please represent it as 
a clause.


Jason



RE: [PATCH] _Cilk_for for C and C++

2014-01-07 Thread Iyer, Balaji V


 -Original Message-
 From: Jason Merrill [mailto:ja...@redhat.com]
 Sent: Tuesday, January 7, 2014 3:41 PM
 To: Iyer, Balaji V; 'Jeff Law'; 'Aldy Hernandez'
 Cc: 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'; 'Jakub Jelinek'
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 On 12/17/2013 07:21 PM, Iyer, Balaji V wrote:
  The reason why I store it in OMP_FOR_CLAUSE is because OMP clauses
 cannot occur in _Cilk_for. So adding a new clause seem to be an overkill
 IMHO. I need a place to store the grain value and so I chose this spot.
 
 But code expects OMP_FOR_CLAUSES to have a certain form, and you are
 violating that so that now code needs to check whether we're dealing with a
 for loop in order to know to parse OMP_FOR_CLAUSES.  Doing it your way
 requires lots of little special cases.  Please represent it as a clause.

Hi Jason,
In gimplify_omp_for, I remove the information in OMP_FOR_CLAUSES () and 
then replace it with a NULL_TREE. Till that point, nothing steps on it (except 
in pt.c and that I am handling it).
Then the grain value is stored in gimple tree for omp_for.

Thanks,

Balaji V. Iyer.

 
 Jason



Re: [PATCH] _Cilk_for for C and C++

2014-01-07 Thread Jakub Jelinek
On Tue, Jan 07, 2014 at 09:24:21PM +, Iyer, Balaji V wrote:
  -Original Message-
  From: Jason Merrill [mailto:ja...@redhat.com]
  Sent: Tuesday, January 7, 2014 3:41 PM
  To: Iyer, Balaji V; 'Jeff Law'; 'Aldy Hernandez'
  Cc: 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'; 'Jakub Jelinek'
  Subject: Re: [PATCH] _Cilk_for for C and C++
  
  On 12/17/2013 07:21 PM, Iyer, Balaji V wrote:
   The reason why I store it in OMP_FOR_CLAUSE is because OMP clauses
  cannot occur in _Cilk_for. So adding a new clause seem to be an overkill
  IMHO. I need a place to store the grain value and so I chose this spot.
  
  But code expects OMP_FOR_CLAUSES to have a certain form, and you are
  violating that so that now code needs to check whether we're dealing with a
  for loop in order to know to parse OMP_FOR_CLAUSES.  Doing it your way
  requires lots of little special cases.  Please represent it as a clause.
 
 Hi Jason,
   In gimplify_omp_for, I remove the information in OMP_FOR_CLAUSES ()
   and then replace it with a NULL_TREE.  Till that point, nothing
   steps on it (except in pt.c and that I am handling it).  Then the
   grain value is stored in gimple tree for omp_for.

So, you are abusing OMP_FOR_CLAUSES for shorter time, still, I agree with
Jason that you shouldn't do that.

If you don't want to add a new clause, just use a similar existing one,
if grain is something like scheduling chunk size, just with a different
name for it, then using OMP_CLAUSE_SCHEDULE with OMP_CLAUSE_SCHEDULE_EXPR
being the grain expression would be certainly cleaner.
But even adding a new artificial clause isn't that hard.

Jakub


Re: [PATCH] _Cilk_for for C and C++

2013-12-16 Thread Jason Merrill

On 12/15/2013 07:40 PM, Iyer, Balaji V wrote:

-  tree clauses, tree *cclauses)
+  tree clauses_or_grain, tree *cclauses)


Instead of this, please make the grainsize a new type of clause.


-  return (gimple_omp_subcode (g)  GF_OMP_FOR_COMBINED) != 0;
+  return (gimple_omp_for_kind (g) == GF_OMP_FOR_COMBINED);


I don't really know this code, but this change seems unlikely to be 
correct.  Can you explain it?



+  tree data_name = get_identifier (.omp_data_i);
+  if (is_cilk_for)
+data_name = get_identifier (.cilk_for_data_i);


Why does the name of an artificial parameter matter?


 }
+/* A subroutine of expand_omp_for.  Generate code for _Cilk_for loop.
+   Given parameters:


Need a blank line after the }.

Jason



RE: [PATCH] _Cilk_for for C and C++

2013-12-16 Thread Iyer, Balaji V
 
  -  return (gimple_omp_subcode (g)  GF_OMP_FOR_COMBINED) != 0;
  +  return (gimple_omp_for_kind (g) == GF_OMP_FOR_COMBINED);
 
 I don't really know this code, but this change seems unlikely to be correct.
 Can you explain it?

I really need help on this. I need a new enum type (I call this: 
GF_OMP_FOR_KIND_CILKFOR) that I need to mark _Cilk_for so that I can 
differentiate it. Can someone please help me pick a new one or explain to me 
how the enum gf_mask is structured? I am not understanding the header comment  
fully...

Thanks,

Balaji V. Iyer.


RE: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Iyer, Balaji V


 -Original Message-
 From: Jeff Law [mailto:l...@redhat.com]
 Sent: Tuesday, December 3, 2013 1:30 AM
 To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez
 Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jakub Jelinek
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 On 11/27/13 17:52, Jason Merrill wrote:
  On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:
  I completely agree with you that there are certain parts of Cilk
  Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled
  functions (formerly called elemental functions). But, the Cilk
  keywords is almost completely orthogonal to OpenMP. They are
  semantically different  and one cannot be transformed to another.
  Cilk uses automatically load-balanced work-stealing using the Cilk
  runtime, whereas OMP uses work sharing via OMP runtime. There are a
  number of other semantic differences but this is the core-issue.
  #pragma simd and #pragma omp have converged in several places but the
  Cilk part has always been different from OpenMP.
 
  Yes, Cilk for loops will use the Cilk runtime and OMP for loops will
  use the OMP runtime, but that doesn't mean they can't share a lot of
  the middle end code along the way.
 
  We already have several different varieties of parallel/simd loops all
  represented by GIMPLE_OMP_FOR, and I think this could be another
  GP_OMP_FOR_KIND_.
 Right.  It's not a question of what runtime they call back into, but that both
 share a common overall structure.
 
 Conceptually I look at a for loop as having 4 main components.
 
 Initializer, test condition, increment and the body.
 
 I'd like to hope things like the syntatic  semantic analysis of the first 
 three
 would be largely the same.  Most of the Cilk specific bits would be in the
 handling of the body -- but there may be some significant code sharing that
 can happen there too.
 
 
 
  ...
  As you can tell, this is not how openmp handles a #pragma omp for loop.
 
  It's different in detail, but #pragma omp parallel for works very
  similarly: it creates a separate function for the body of the loop and
  then passes that to GOMP_parallel along with any shared data.
 My thoughts exactly.

I understand you both now. Let me look into the OMP routines and see what it is 
doing and see how I can port it to _Cilk_for. 

Thanks,

Balaji V. Iyer.

 jeff


Re: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 01:25:57PM +, Iyer, Balaji V wrote:
   As you can tell, this is not how openmp handles a #pragma omp for loop.
  
   It's different in detail, but #pragma omp parallel for works very
   similarly: it creates a separate function for the body of the loop and
   then passes that to GOMP_parallel along with any shared data.
  My thoughts exactly.
 
 I understand you both now. Let me look into the OMP routines and see what
 it is doing and see how I can port it to _Cilk_for.

Yeah.  The work is actually multi-stage, first during gimplification
the code does determine what variables are used in the #pragma omp parallel
(etc., in your case _Cilk_for) region, and whether they should be shared,
or privatized (and in that case in what way, normal private, firstprivate,
lastprivate, firstprivate+lastprivate, reduction, ...).  Then there is
omplower pass (already enabled for Cilk+ due to #pragma simd) that e.g.
lowers accesses to shared variables, creates new VAR_DECLs for the
privatized vars etc. and then ompexp pass that will create the outlined body
of the function and create call to the runtime library.
I have no idea what privatization behavior _Cilk_for wants, I'd expect that
at least the IV must be privatized, otherwise it would be racy, but about
other vars?

Jakub


RE: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Tuesday, December 3, 2013 8:40 AM
 To: Iyer, Balaji V
 Cc: Jeff Law; Jason Merrill; Aldy Hernandez; gcc-patches@gcc.gnu.org;
 r...@redhat.com
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 On Tue, Dec 03, 2013 at 01:25:57PM +, Iyer, Balaji V wrote:
As you can tell, this is not how openmp handles a #pragma omp for
 loop.
   
It's different in detail, but #pragma omp parallel for works very
similarly: it creates a separate function for the body of the loop
and then passes that to GOMP_parallel along with any shared data.
   My thoughts exactly.
 
  I understand you both now. Let me look into the OMP routines and see
  what it is doing and see how I can port it to _Cilk_for.
 
 Yeah.  The work is actually multi-stage, first during gimplification the code
 does determine what variables are used in the #pragma omp parallel (etc., in
 your case _Cilk_for) region, and whether they should be shared, or
 privatized (and in that case in what way, normal private, firstprivate,
 lastprivate, firstprivate+lastprivate, reduction, ...).  Then there is 
 omplower
 pass (already enabled for Cilk+ due to #pragma simd) that e.g.
 lowers accesses to shared variables, creates new VAR_DECLs for the
 privatized vars etc. and then ompexp pass that will create the outlined body
 of the function and create call to the runtime library.
 I have no idea what privatization behavior _Cilk_for wants, I'd expect that at
 least the IV must be privatized, otherwise it would be racy, but about other
 vars?
 

In Cilk_for you don't need to privatize any variables. I need to pass in the 
loop_count, the grain (if the user specifies one), the nested function and its 
context to a Cilk specific function:__cilkrts_cilk_for_64 (or 32). The nested 
function has the body of the _Cilk_for and it has 3 parameter, context, start 
and end, where the start and end are passed in by the runtime which will tell 
what parts of the loop to execute. This thread has an example: 
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html

Thanks,

Balaji V. Iyer.

   Jakub


Re: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 02:01:17PM +, Iyer, Balaji V wrote:
 In Cilk_for you don't need to privatize any variables. I need to pass in
 the loop_count, the grain (if the user specifies one), the nested function
 and its context to a Cilk specific function:__cilkrts_cilk_for_64 (or 32). 
 The nested function has the body of the _Cilk_for and it has 3 parameter,
 context, start and end, where the start and end are passed in by the
 runtime which will tell what parts of the loop to execute.  This thread
 has an example: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html

So Cilk+ only allows say:
_Cilk_for (int ii = 5; ii  15; ii++)
{
body
}
and not
int ii;
_Cilk_for (ii = 5; ii  15; ii++)
{
body
}
?  Other variables can be all shared, that is after all the default
for #pragma omp parallel for, except for the IVs and a couple of other
exceptions (e.g. readonly vars etc.), if somebody wants private vars,
they can be surely declared inside of the body somwhere.

Jakub


Re: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Jeff Law

On 12/03/13 06:25, Iyer, Balaji V wrote:



I understand you both now. Let me look into the OMP routines and see what it is 
doing and see how I can port it to _Cilk_for.
Thanks.  I know it's a bit of a pain, but part what's driving the desire 
to share is to reduce the long term maintenance cost for everyone.


jeff


Re: [PATCH] _Cilk_for for C and C++

2013-12-02 Thread Jeff Law

On 11/27/13 17:52, Jason Merrill wrote:

On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:

I completely agree with you that there are certain parts of Cilk
Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled
functions (formerly called elemental functions). But, the Cilk
keywords is almost completely orthogonal to OpenMP. They are
semantically different  and one cannot be transformed to another. Cilk
uses automatically load-balanced work-stealing using the Cilk runtime,
whereas OMP uses work sharing via OMP runtime. There are a number of
other semantic differences but this is the core-issue. #pragma simd
and #pragma omp have converged in several places but the Cilk part has
always been different from OpenMP.


Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use
the OMP runtime, but that doesn't mean they can't share a lot of the
middle end code along the way.

We already have several different varieties of parallel/simd loops all
represented by GIMPLE_OMP_FOR, and I think this could be another
GP_OMP_FOR_KIND_.
Right.  It's not a question of what runtime they call back into, but 
that both share a common overall structure.


Conceptually I look at a for loop as having 4 main components.

Initializer, test condition, increment and the body.

I'd like to hope things like the syntatic  semantic analysis of the 
first three would be largely the same.  Most of the Cilk specific bits 
would be in the handling of the body -- but there may be some 
significant code sharing that can happen there too.





...

As you can tell, this is not how openmp handles a #pragma omp for loop.


It's different in detail, but #pragma omp parallel for works very
similarly: it creates a separate function for the body of the loop and
then passes that to GOMP_parallel along with any shared data.

My thoughts exactly.
jeff


Re: [PING]RE: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jason Merrill

On 11/26/2013 12:23 PM, Iyer, Balaji V wrote:

Did you get a chance to look at my _Cilk_for patch for C?


BTW, I think pinging less than 24 hours after you send the patch is a 
bit excessive.  :)


Jason



Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jason Merrill

On 11/25/2013 11:03 PM, Iyer, Balaji V wrote:

On a broad note, I think there's a lot of OpenMP code you could be 
reusing here rather than writing it all again.  And that way Cilk code 
will benefit from improvements to OpenMP handling, and vice versa.  It 
probably makes sense to turn Cilk_for into an OMP_FOR loop, and then 
gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and 
handle everything at the tree level.  But I don't know the OMP code well 
enough to suggest exactly how that would work.


Finer-grained comments:


+  tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1,
+  NULL_TREE, NULL, tf_none);
+  if (exp == error_mark_node)
+exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none);
+  if (exp  exp != error_mark_node)
+return exp;


I thought you were changing this?


+/* Handler for iterator to compute the loop variable.  ADD_OP indicates
+   whether we need a '+' or '-' operation. LOW indicates the starting point
+   and LOOP_VAR is the induction variable.  This functin returns an
+   INIT_EXPR.  */


This comment still doesn't document VAR2.

function


+  tree exp = build_new_op (loc, add_op, 0, low, loop_var, NULL_TREE, 0,
+  tf_none);
+  gcc_assert (exp != error_mark_node);
+  exp = cp_build_modify_expr (var2, INIT_EXPR, exp, tf_warning_or_error);


Looking at online Cilk documentation I see:


The increment expression must add to or subtract from the control variable 
using one of the following supported operations:
+=
-=
++ (prefix or postfix)
-- (prefix or postfix)


From this, I think people would expect the increment to use a 
user-defined operator+=/-=/++/--, but your code above uses operator+/- 
instead.



+   -fcilkplus must be enabled t use %_Cilk_for%);


to


+cp_parser_cilk_for (cp_parser *parser, tree grain)


Please reuse cp_parser_omp_for, like Aldy did for #pragma simd 
(cp_parser_cilk_simd) rather than write yet another for-statement 
parser.  This should reduce the patch size quite a bit.



+case PRAGMA_CILK_GRAINSIZE:
+  if (context == pragma_external)
+   {
+ error_at (pragma_tok-location,
+   %#pragma cilk grainsize% may only be be used inside a 
+   function);
+ break;
+   }
+
+  /* Ignore the pragma if Cilk Plus is not enabled.  */
+  if (flag_enable_cilkplus)
+   {
+ cp_parser_cilk_grainsize (parser, pragma_tok);
+ return true;
+   }
 default:


Do you mean to fall through to the default case if Cilk+ is not enabled?


+   tmp = CILK_FOR_COND (t);
+   if (COMPARISON_CLASS_P (tmp))
+ {
+   tree op0 = RECUR (TREE_OPERAND (tmp, 0));
+   tree op1 = RECUR (TREE_OPERAND (tmp, 1));
+   tmp = build2 (TREE_CODE (tmp), boolean_type_node, op0, op1);
+ }
+   CILK_FOR_COND (stmt) = tmp;


Why not just recur into CILK_FOR_COND?


+   tmp = CILK_FOR_EXPR (t);
+   if (TREE_CODE (tmp) == MODIFY_EXPR)
+ {
+   tree lhs = TREE_OPERAND (tmp, 0);
+   tree rhs = TREE_OPERAND (tmp, 1);
+   lhs = RECUR (lhs);
+   rhs = build2 (TREE_CODE (rhs), TREE_TYPE (lhs),
+ RECUR (TREE_OPERAND (rhs, 0)),
+ RECUR (TREE_OPERAND (rhs, 1)));
+   tmp = build2 (MODIFY_EXPR, void_type_node, lhs, rhs);
+ }
+   else
+ tmp = build2 (TREE_CODE (tmp), void_type_node,
+   RECUR (TREE_OPERAND (tmp, 0)),
+   RECUR (TREE_OPERAND (tmp, 1)));
+   finish_for_expr (tmp, stmt);


And CILK_FOR_EXPR?

Jason



Re: [PING]RE: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jason Merrill

On 11/27/2013 12:06 PM, Jason Merrill wrote:

On 11/26/2013 12:23 PM, Iyer, Balaji V wrote:

Did you get a chance to look at my _Cilk_for patch for C?


BTW, I think pinging less than 24 hours after you send the patch is a
bit excessive.  :)


Ah, I see, you were pinging the non-C++ parts.

Jason




Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jason Merrill

On 11/15/2013 02:23 PM, Iyer, Balaji V wrote:

One small thing that I have not done that Jakub and several other have asked me 
before is that, there are no tests in c-c++-common for _Cilk_for. The reason being 
that the syntax between C and C++ implementations are different. In C++, the 
induction variable must be defined in the initializer (e.g. it should start wth 
_Cilk_for (int ii = 0)). In C, this is not allowed (e.g. it should start as 
_Cilk_for (ii = 0; ii  10; ii++)).


That can be handled with #ifdef __cplusplus.

Jason



Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jakub Jelinek
On Wed, Nov 27, 2013 at 12:48:11PM -0500, Jason Merrill wrote:
 On 11/15/2013 02:23 PM, Iyer, Balaji V wrote:
 One small thing that I have not done that Jakub and several other have asked 
 me before is that, there are no tests in c-c++-common for _Cilk_for. The 
 reason being that the syntax between C and C++ implementations are 
 different. In C++, the induction variable must be defined in the initializer 
 (e.g. it should start wth _Cilk_for (int ii = 0)). In C, this is not 
 allowed (e.g. it should start as _Cilk_for (ii = 0; ii  10; ii++)).
 
 That can be handled with #ifdef __cplusplus.

It isn't allowed even in C99?  For OpenMP,
int a[30];

void
foo ()
{
  #pragma omp for
  for (int i = 0; i  30; i++)
a[i] = i;
}
is valid for C99.  So, perhaps you just want
/* { dg-additional-options -std=c99 { target c } } */
in the c-c++-common tests?

Jakub


Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Aldy Hernandez

On 11/27/13 10:54, Jakub Jelinek wrote:

On Wed, Nov 27, 2013 at 12:48:11PM -0500, Jason Merrill wrote:

On 11/15/2013 02:23 PM, Iyer, Balaji V wrote:

One small thing that I have not done that Jakub and several other have asked me 
before is that, there are no tests in c-c++-common for _Cilk_for. The reason being 
that the syntax between C and C++ implementations are different. In C++, the 
induction variable must be defined in the initializer (e.g. it should start wth 
_Cilk_for (int ii = 0)). In C, this is not allowed (e.g. it should start as 
_Cilk_for (ii = 0; ii  10; ii++)).


That can be handled with #ifdef __cplusplus.


It isn't allowed even in C99?  For OpenMP,
int a[30];

void
foo ()
{
   #pragma omp for
   for (int i = 0; i  30; i++)
 a[i] = i;
}
is valid for C99.  So, perhaps you just want
/* { dg-additional-options -std=c99 { target c } } */
in the c-c++-common tests?

Jakub



Yup, that's what I did for the Cilk Plus pragma simd tests.


Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jeff Law

On 11/27/13 10:06, Jason Merrill wrote:

On 11/25/2013 11:03 PM, Iyer, Balaji V wrote:

On a broad note, I think there's a lot of OpenMP code you could be
reusing here rather than writing it all again.  And that way Cilk code
will benefit from improvements to OpenMP handling, and vice versa.  It
probably makes sense to turn Cilk_for into an OMP_FOR loop, and then
gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and
handle everything at the tree level.  But I don't know the OMP code well
enough to suggest exactly how that would work.
That's certainly the direction I'd like to see this work go as well.  To 
the fullest extent possible Cilk+ should be layering on top of the 
OpenMP 4 work -- ie, Cilk+ should really be dealing with parsing issues, 
then handoff to OpenMP for the real work.


Jeff


RE: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Iyer, Balaji V


 -Original Message-
 From: Jeff Law [mailto:l...@redhat.com]
 Sent: Wednesday, November 27, 2013 2:43 PM
 To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez
 Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jakub Jelinek
 Subject: Re: [PATCH] _Cilk_for for C and C++
 
 On 11/27/13 10:06, Jason Merrill wrote:
  On 11/25/2013 11:03 PM, Iyer, Balaji V wrote:
 
  On a broad note, I think there's a lot of OpenMP code you could be
  reusing here rather than writing it all again.  And that way Cilk code
  will benefit from improvements to OpenMP handling, and vice versa.  It
  probably makes sense to turn Cilk_for into an OMP_FOR loop, and then
  gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and
  handle everything at the tree level.  But I don't know the OMP code
  well enough to suggest exactly how that would work.
 That's certainly the direction I'd like to see this work go as well.  To the 
 fullest
 extent possible Cilk+ should be layering on top of the OpenMP 4 work -- ie,
 Cilk+ should really be dealing with parsing issues, then handoff to OpenMP
 for the real work.

Hello Jeff and Jason,
I completely agree with you that there are certain parts of Cilk Plus 
that is similar to OMP4, namely #pragma simd and SIMD-enabled functions 
(formerly called elemental functions). But, the Cilk keywords is almost 
completely orthogonal to OpenMP. They are semantically different  and one 
cannot be transformed to another. Cilk uses automatically load-balanced 
work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP 
runtime. There are a number of other semantic differences but this is the 
core-issue. #pragma simd and #pragma omp have converged in several places but 
the Cilk part has always been different from OpenMP.

I have thought about sharing routines with OpenMP and have done it in 
several parts of Cilk plus.  It is not possible to share any middle end work 
between Cilk keywords and OpenMP because they are fundamentally different. I 
have shared some parsing parts with omp  in C.

Since we are talking about _Cilk_for loops, maybe an example of how a 
compiler is supposed break down a _Cilk_for loop will help. Please see the 
example below. It is a simple main routine with one _Cilk_for in it and it 
returns a local variable X that may or may not be read/written in the body:


int main (void)
{
int X = 0;
_Cilk_for (int ii = 5; ii  15; ii++)
{
body
}
return X;
}

This program is converted to the following:

/* Low and high fields are passed in by the runtime using the user defined 
grainsize or the rumtime 
 computed one. Data field is ignored in GCC, please see below.  */

cilk_for_helper_function  (void *data, int low, int high) {
for (ii = low; ii  high; ii++)
body;
}

int main (void)
{
int X = 0;
/* This function is actually a call the the runtime whose 
implementation is in
libcilkrts/runtime/cilk-abi-cilk-for.c.  */
__cilkrts_cilk_for_64   (__cilk_for_001,   /* Nested/Lambda function */
  __cilk_for_001,   /* Data used by the lambda 
function, the runtime

  does not worry about it.  It is an interface to 

  pass the information to the lambda function. In   


  GCC we create a nested function so it is 

   ignored.  */
10/* loop_count 
(15-5) */, 
0  /* grain 
value from the 

 #pragma grainsize pragma */   );


/* Note: if the trip-count is 32 bit then __cilkrts_cilk_for_64 is 
replaced by   __cilkrts_cilk_for_32  */

return X;
}


As you can tell, this is not how openmp handles a #pragma omp for loop.

Thanks,

Balaji V. Iyer.

 
 Jeff


Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jeff Law

On 11/18/13 14:50, Iyer, Balaji V wrote:


Attached, please find a refreshed patches (one for C and 1 for C++).  The trunk 
was diffed after Aldy's check in of pragma simd was in. So, now this patch is 
only dependent on _Cilk_spawn and _Cilk_sync (mostly for execution of tests). They are 
tested on x86_64 and works successfully.

Here are the fixed Changelog entries (C related changelogs are given first then 
C++):
I'm just getting started on the C stuff.  This likely will be a partial 
review.


First, it looks like you've got a ton of unrelated cleanup work going on 
in c-family/cilk.c.  Function/parameter renaming and the like.  If this 
stuff is important, can it go forward independently?  It's certainly 
hard to find Cilk_for stuff in there with all the unrelated changes 
going on.  Some of the stuff seems to have moved into cilk.h which is 
probably good too, but probably should be separated from the Cilk_for 
support patch as well.



The semantics of cilk_check_loop_difference_type seem a bit odd.  Can 
you explain a bit more how those semantics were choosen?   And given the 
implementation, isn't it impossible for it to return NULL_TREE, meaning 
that check in c_extract_cilk_for_fields is pointless?


Isn't cilk_simplify_tree just used in the validation code?  Can it be 
moved into that file and privatized?  It probably needs a better name 
too.  I have to also admit, it looks pretty bogus to just start 
stripping things like that.  Oh yea, declaring and using tree_ssa_... is 
not OK there, thus it's not ok to use STRIP_USELESS_TYPE_CONVERSION.


In general, could you do a pass over those functions with external 
linkage and if they're only used in one file, move them to that file and 
privatize them?


So Cilk_for has a type?  That's the impression I get by looking at 
cilk_loop_convert.  In reality, I think it's just poorly named.  I've 
got to believe there's something, somewhere that will do equivalent 
conversions for you :-)


There's a mismatch between direction tracking between 
cilk_calc_forward_div_op and cilk_compute_incr_direction.  THe former 
handles -1, the latter never returns it.  I'd probably prefer to see the 
direction information represented as an ENUM anyway.

.
Presumably you're using nested functions to encapsulate the body for the 
Cilk_for loop?  Does the nested function have access to the parent's 
context?


If the verification routines have to stay, then isn't there a lot of 
commonality between (for example) cilk_set_incr_info and 
c_check_cilk_loop_incr?  Anything worth refactoring there?  Probably not 
I guess


I must have missed something -- when can we get a CONTINUE_STMT outside 
a loop body?  See cilk_resolve_continue_stmts.



In c-common.h, you added several prototypes for things in c-cilkplus.c. 
 We're trying to avoid doing that, instead favoring putting them into 
c-cilkplus.h and including that where needed.


Similarly in c-tree.h



Is there any significant sharable code between parsing a normal C for 
statement a a Cilk_for that we could factor out?  What about the 
validation code?  Can any of that be shared with OpenMP?  What about the 
gimplification?


Are the syntatical differences so significant that we can't share any of 
that code?


Any particular reason why the runtime uses different names for the 32bit 
and 64bit Clik_for support?  ISTM you wouldn't ever be binding the 32bit 
and 64bit runtimes into a single application, so you could have just 
called it __cilkrts_cilk_for and be done with it.  I guess it's too late 
to change that now.  Or does the 32/64 split refer to the # bits 
necessary to hold the iteration counter?


vertical whitespace removed after the cilk_arrow function, causing it to 
end up immediately adjacent to the comment for the next function.  I 
suspect this was unintentional.



You define CILK_FOR_STMT and new gimplification routines to handle it. 
Is there any way you can funnel all this through the OpenMP 4 support?


It looks like you've got unrelated cleanups going on in c-family/cilk.c. 
 Function renaming and such.  Is that strictly related to Cilk_for 
support?  Much of it looks independent of Cilk_for support.  Seems to me 
like that should break out and go forward independently.


I'm going to have to look at this some more, but I wanted to give you as 
much feedback as I could before I disappear for the holidays.



jeff


Re: [PATCH] _Cilk_for for C and C++

2013-11-27 Thread Jason Merrill

On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:

I completely agree with you that there are certain parts of Cilk Plus 
that is similar to OMP4, namely #pragma simd and SIMD-enabled functions 
(formerly called elemental functions). But, the Cilk keywords is almost 
completely orthogonal to OpenMP. They are semantically different  and one 
cannot be transformed to another. Cilk uses automatically load-balanced 
work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP 
runtime. There are a number of other semantic differences but this is the 
core-issue. #pragma simd and #pragma omp have converged in several places but 
the Cilk part has always been different from OpenMP.


Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use 
the OMP runtime, but that doesn't mean they can't share a lot of the 
middle end code along the way.


We already have several different varieties of parallel/simd loops all 
represented by GIMPLE_OMP_FOR, and I think this could be another 
GP_OMP_FOR_KIND_.


...

As you can tell, this is not how openmp handles a #pragma omp for loop.


It's different in detail, but #pragma omp parallel for works very 
similarly: it creates a separate function for the body of the loop and 
then passes that to GOMP_parallel along with any shared data.


Jason



[PING]RE: [PATCH] _Cilk_for for C and C++

2013-11-26 Thread Iyer, Balaji V
Hi Jeff et al.,
Did you get a chance to look at my _Cilk_for patch for C?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, November 18, 2013 4:51 PM
 To: Aldy Hernandez
 Cc: gcc-patches@gcc.gnu.org; Jeff Law; Jason Merrill (ja...@redhat.com);
 r...@redhat.com
 Subject: RE: [PATCH] _Cilk_for for C and C++
 
 Hello Everyone,
  Please see my comment below:
 
  -Original Message-
  From: Aldy Hernandez [mailto:al...@redhat.com]
  Sent: Friday, November 15, 2013 4:51 PM
  To: Iyer, Balaji V
  Cc: gcc-patches@gcc.gnu.org; Jeff Law; Jason Merrill
  (ja...@redhat.com); r...@redhat.com
  Subject: Re: [PATCH] _Cilk_for for C and C++
 
  On 11/15/13 12:23, Iyer, Balaji V wrote:
 
   This patch is dependent on the following patches:
  
   #pragma simd work (they both share the same parser routines)
 
  I have just committed this to trunk, so it shouldn't be a blocker.
 
  Also, in the past 2 days the #pragma simd parsing has been merged with
  the OpenMP parsing routines, so please adjust your patch accordingly.
 
   Attached, please find a refreshed patches (one for C and 1 for C++).
 The trunk was diffed after Aldy's check in of pragma simd was in. So, now
 this patch is only dependent on _Cilk_spawn and _Cilk_sync (mostly for
 execution of tests). They are tested on x86_64 and works successfully.
 
 Here are the fixed Changelog entries (C related changelogs are given first
 then C++):
 
 C- Related Changes
 ==
 ===
 gcc/ChangeLog.
 2013-11-18  Balaji V. Iyer  balaji.v.i...@intel.com
 
   * cilk-builtins.def: Added 2 builtin functions: __cilkrts_cilk_for_64
   and __cilkrts_cilk_for_32.
   * cilk-common.c (cilk_declare_looper): New function.
   (cilk_init_builtins): Added two calls to cilk_declare_looper.
   * cilk.h (enum cilk_tree_index): Added two enums:
 CILK_TI_F_LOOP_32
   and CILK_TI_F_LOOP_64.
   (enum add_variable_type): Moved here from c-family/cilk.c
   (enum cilk_block_type): Likewise.
   (struct wrapper_data): Likewise.
   (struct cilk_for_desc): New struct.
   (cilk_for_32_fndecl): New #define.
   (cilk_for_64_fndecl): Likewise.
   * tree.h (CILK_FOR_INIT): Likewise.
   (CILK_FOR_COND): Likewise.
   (CILK_FOR_EXPR): Likewise.
   (CILK_FOR_BODY): Likewise.
   (CILK_FOR_SCOPE): Likewise.
   (CILK_FOR_GRAIN): Likewise.
   (CILK_FOR_VAR): Likewise.
   * gimplify.c (gimplify_expr): Added CILK_FOR_STMT case.
   * tree-pretty-print.c (dump_generic_node): Likewise.
   * langhooks-def.h (LANG_HOOKS_CILKPLUS_GIMPLIFY_CILK_FOR):
 New
   #define.
   (LANG_HOOKS_CILKPLUS): Added
 LANG_HOOKS_CILKPLUS_GIMPLIFY_FOR field.
   * langhooks.h (struct lang_hooks_for_cilkplus): Added a new field
   gimplify_cilk_for.
   * tree.def: Added a new tree CILK_FOR_STMT.
 
 gcc/c-family/ChangeLog.
 2013-11-18  Balaji V. Iyer  balaji.v.i...@intel.com
 
   * c-cilkplus.c (c_check_cilk_loop_incr): New function.
   (c_validate_cilk_plus_loop): Likewise.
   (c_check_cilk_loop): Likewise.
   (c_finish_cilk_for_loops): Likewise.
   (cp_finish_cilk_for_loops): Likewise.
   * c-common.c (c_common_resword): Added _Cilk_for keyword.
   * c-common.h (enum rid): Added RID_CILK_FOR.
   (cp_finish_cilk_for_loop): New prototype.
   (c_finish_cilk_for_loop): Likewise.
   (c_validate_cilk_loop): Likewise.
   (c_check_cilk_loop): Likewise.
   (cilk_init_fd): Likewise.
   (cilk_extract_free_variables): Likewise.
   (cilk_create_cilk_helper_decl): Likewise.
   (cilk_call_graph_add_fn): Likewise.
   (cilk_outline_body): Likewise.
   (cilk_check_loop_difference_type): Likewise.
   (declare_cilk_for_parms): Likewise.
   (declare_cilk_for_vars): Likewise.
   (cilk_loop_convert): Likewise.
   (cilk_divide_count): Likewise.
   (cilk_calc_forward_div_op): Likewise.
   (cilk_compute_loop_count): Likewise.
   (insert_cilk_for_nested_fn): Likewise.
   (cilk_compute_loop_var): Likewise.
   (cilk_set_inclusive_and_direction): Likewise.
   (cilk_set_iter_difftype): Likewise.
   (cilk_set_incr_info): Likewise.
   (cilk_set_init_info): Likewise.
   (clk_simplify_tree): Likewise.
   (cilk_find_code_from_call): Likewise.
   (cilk_tree_operand_noconv): Likewise.
   (cilk_resolve_continue_stmts): Likewise.
   * c-pragma.c (init_pragma): Added pragma grainsize.
   * c-pragma.h (enum pragma_kind): Added
 PRAGMA_CILK_GRAINSIZE.
   * cilk.c (enum add_variable_type): Moved to ../cilk.h.
   (enum cilk_block_type): Likewise.
   (struct wrapper_data): Likewise.
   (cilk_call_graph_add_fn): New function.
   (cilk_create_cilk_helper_decl): Likewise.
   (cilk_outline): Renamed to cilk_outline_body.  Also added a
 parameter
   to hold throw flag for C

Re: [PATCH] _Cilk_for for C and C++

2013-11-22 Thread Jason Merrill

On 11/18/2013 04:50 PM, Iyer, Balaji V wrote:

+  int flags = LOOKUP_PROTECT | LOOKUP_ONLYCONVERTING;


Why not LOOKUP_NORMAL? LOOKUP_ONLYCONVERTING isn't relevant in this context.


+  tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1,
+  NULL_TREE, NULL, 0);


Use tf_none instead of 0.


+  if (exp == error_mark_node)
+exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none);
+  if (exp  exp != error_mark_node)
+return exp;


This doesn't make sense to me.  build_x_modify_expr takes codes like 
PLUS_EXPR and then does an assignment afterward; we don't want to 
quietly do += just because there's some error with the evaluation of the 
+ operation.  What is this code trying to do?



+/* Handler for iterator to compute the loop variable.  ADD_OP indicates
+   whether we need a '+' or '-' operation. LOW indicates the starting point
+   and LOOP_VAR is the induction variable.  Returns an expression (or a
+   STATEMENT_LIST of expressions).  If it is unable to find the appropriate
+   iteration, then it returns an error mark node and its parent will set
+   the loop as invalid.  */


This doesn't explain what VAR2 is.  And it seems like you're also using 
LOW as the increment?



+  tree new_stmt = build_x_modify_expr (loc, new_var, INIT_EXPR,
+  build_zero_cst (TREE_TYPE (new_var)),
+  tf_warning_or_error);
+  if (new_stmt == error_mark_node)
+   return error_mark_node;
+  append_to_statement_list (new_stmt, exp);
+  new_stmt = build_x_modify_expr (loc, new_var, NOP_EXPR, low,
+ tf_warning_or_error);


Why assign 0 if you're going to immediately assign low afterwards?


+  /* We have to manually create this loop for two reasons:
+ a. We need to have access to continue and start label since we need
+to resolve continue and breaks by hand.


Why do you need to resolve them by hand?  It looks like break isn't even 
allowed.



+ b. C++ doesn't provide a c_finish_loop function like C does.  */


Why is that important?


   sk_for,   /* The scope of the variable declared in a
for-init-statement.  */
+  sk_cilk_for,   /* The scope of the variable declared in _Cilk_for init
+   statement.  */


How is this different from a normal for-init-statement?  Nothing seems 
to use it.


Jason



Re: [PATCH] _Cilk_for for C and C++

2013-11-19 Thread Aldy Hernandez



One small thing that I have not done that Jakub and several other
have asked me before is that, there are no tests in c-c++-common for
_Cilk_for. The reason being that the syntax between C and C++
implementations are different. In C++, the induction variable must be
defined in the initializer (e.g. it should start wth _Cilk_for (int
ii = 0)). In C, this is not allowed (e.g. it should start as
_Cilk_for (ii = 0; ii  10; ii++)).


For pragma simd what I did was put the tests in c-c++-common and pass 
-std=c99 to the C tests.  That should allow declaration at initialization.


Can you do this?

Aldy


Re: [PATCH] _Cilk_for for C and C++

2013-11-15 Thread Aldy Hernandez

On 11/15/13 12:23, Iyer, Balaji V wrote:


This patch is dependent on the following patches:

#pragma simd work (they both share the same parser routines)


I have just committed this to trunk, so it shouldn't be a blocker.

Also, in the past 2 days the #pragma simd parsing has been merged with 
the OpenMP parsing routines, so please adjust your patch accordingly.


Thanks.
Aldy