Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Elken, Tom
BTW,
Mike posted an alternate patch to Bug 2410 that removes the hard-coded values
for _all_ HCAs by using ibv_query_device() to query the HCA.
Thankfully, Ido used that alternate patch.

-Tom
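
For readers following the thread, here is a minimal sketch of the "query the device for every HCA" approach described above. It is not the patch attached to bz 2410, which is not quoted in this thread. To keep it buildable without libibverbs or hardware, the device query is passed in as a callback that follows the ibv_query_device() convention of returning 0 on success; the names sketch_device_attr, sketch_query_fn, sketch_set_out_reads and sketch_query_fixed16 are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct ibv_device_attr
 * that matters here (max_qp_rd_atom). */
struct sketch_device_attr {
    int max_qp_rd_atom;
};

/* Hypothetical query callback. It mirrors the ibv_query_device()
 * convention: return 0 on success, non-zero on failure. */
typedef int (*sketch_query_fn)(void *context, struct sketch_device_attr *attr);

/* Pick the number of outstanding reads using only the value the device
 * reports, with no per-HCA table. A failed query leaves the limit at 0. */
static int sketch_set_out_reads(void *context, sketch_query_fn query,
                                int num_user_reads)
{
    struct sketch_device_attr attr = { 0 };
    int max_reads = 0;

    assert(query != NULL);
    if (query(context, &attr) == 0)
        max_reads = attr.max_qp_rd_atom;

    /* Same clamp as the quoted perftest code: too large or unset
     * requests both fall back to the device limit. */
    if (num_user_reads > max_reads || num_user_reads <= 0)
        num_user_reads = max_reads;

    return num_user_reads;
}

/* Example backend for trying the sketch without an HCA:
 * always reports a limit of 16. */
static int sketch_query_fixed16(void *context, struct sketch_device_attr *attr)
{
    (void)context;
    attr->max_qp_rd_atom = 16;
    return 0;
}
```

With the example backend, a request of 32 is reduced to 16, a request of 0 becomes 16, and a request of 4 is returned unchanged. In real code the callback would be replaced by a direct call to ibv_query_device() on the opened context.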

> -Original Message-
> From: Marciniszyn, Mike
> Sent: Monday, January 14, 2013 9:58 AM
> To: Woodruff, Robert J; Ido Shamai
> Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> Tziporet Koren; rsda...@soft-forge.com
> Subject: RE: Interop test failure using OFED-3.5 RC4
> 
> The new package has been posted, and I verified that the qib <-> qib issue is
> gone with the new tarball. Ido has RESOLVED bz 2410 as well.
> 
> Interop could be done with the new perftest/rc4 or just wait for the next RC.
> 
> Mike
> 
> > -Original Message-
> > From: Woodruff, Robert J
> > Sent: Monday, January 14, 2013 12:52 PM
> > To: Ido Shamai; Marciniszyn, Mike
> > Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> > Tziporet Koren
> > Subject: RE: Interop test failure using OFED-3.5 RC4
> >
> > Were you able to get the new package posted yet ?
> >
> > We need this ASAP so we can do another OFED-3.5 RC.
> >
> > Woody
> >
> >
> > -Original Message-
> > From: Ido Shamai [mailto:i...@dev.mellanox.co.il]
> > Sent: Friday, January 11, 2013 12:32 PM
> > To: Marciniszyn, Mike
> > Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean;
> > Mascarenhas, Edward
> > Subject: Re: Interop test failure using OFED-3.5 RC4
> >
> > On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > > I've opened OFED bz 2410 for this issue.
> > >
> > > Mike
> >
> > Great thanks.
> > I will apply the patch and release a new version to OFED website tomorrow
> > morning.
> >
> > Ido
> >
> > >> -Original Message-
> > >> From: Woodruff, Robert J
> > >> Sent: Friday, January 11, 2013 1:30 PM
> > >> To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido
> > >> Shamai
> > >> Subject: RE: Interop test failure using OFED-3.5 RC4
> > >>
> > >>
> > >> Adding Shamai from Mellanox to this thread.
> > >>
> > >> Woody
> > >>
> > >> -Original Message-
> > >> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
> > >> boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> > >> Sent: Friday, January 11, 2013 7:51 AM
> > >> To: Elken, Tom; ewg@lists.openfabrics.org
> > >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> > >>
> > >> This is definitely a perftest bug.
> > >>
> > >> This is a significant re-write of these utilities and this bug is a
> > >> regression in the routine ctx_set_out_reads().
> > >>
> > >> In 1.4 the code is this:
> > >>
> > >> /**
> > >>  *
> > >>  **/
> > >> static int ctx_set_out_reads(struct ibv_context *context,int
> > >> num_user_reads) {
> > >>
> > >>
> > >>  int max_reads;
> > >>
> > >>  max_reads = (is_dev_hermon(context) == HERMON) ?
> > >> MAX_OUT_READ_HERMON : MAX_OUT_READ;<---
> > >>
> > >>  if (num_user_reads > max_reads) {
> > >>  fprintf(stderr," Number of outstanding reads is
> > >> above max = %d\n",max_reads);
> > >>  fprintf(stderr," Changing to that max value\n");
> > >>  num_user_reads = max_reads;
> > >>  }
> > >>  else if (num_user_reads <= 0) {
> > >>  num_user_reads = max_reads;
> > >>  }
> > >>
> > >>  return num_user_reads;
> > >> }
> > >>
> > >> The new 2.0 code is:
> > >>
> > >> /**
> > >>  *
> > >>  **/
> > 

Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Woodruff, Robert J
Does anyone know of any other show stopper bugs that are yet to be resolved ?

If not, we can do an RC5 for final testing.

-Original Message-
From: Marciniszyn, Mike 
Sent: Monday, January 14, 2013 9:58 AM
To: Woodruff, Robert J; Ido Shamai
Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward; 
Tziporet Koren; rsda...@soft-forge.com
Subject: RE: Interop test failure using OFED-3.5 RC4

The new package has been posted, and I verified that the qib <-> qib issue is 
gone with the new tarball. Ido has RESOLVED bz 2410 as well.

Interop could be done with the new perftest/rc4 or just wait for the next RC.

Mike

> -Original Message-
> From: Woodruff, Robert J
> Sent: Monday, January 14, 2013 12:52 PM
> To: Ido Shamai; Marciniszyn, Mike
> Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> Tziporet Koren
> Subject: RE: Interop test failure using OFED-3.5 RC4
> 
> Were you able to get the new package posted yet ?
> 
> We need this ASAP so we can do another OFED-3.5 RC.
> 
> Woody
> 
> 
> -Original Message-
> From: Ido Shamai [mailto:i...@dev.mellanox.co.il]
> Sent: Friday, January 11, 2013 12:32 PM
> To: Marciniszyn, Mike
> Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean;
> Mascarenhas, Edward
> Subject: Re: Interop test failure using OFED-3.5 RC4
> 
> On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > I've opened OFED bz 2410 for this issue.
> >
> > Mike
> 
> Great thanks.
> I will apply the patch and release a new version to OFED website tomorrow
> morning.
> 
> Ido
> 
> >> -Original Message-
> >> From: Woodruff, Robert J
> >> Sent: Friday, January 11, 2013 1:30 PM
> >> To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido
> >> Shamai
> >> Subject: RE: Interop test failure using OFED-3.5 RC4
> >>
> >>
> >> Adding Shamai from Mellanox to this thread.
> >>
> >> Woody
> >>
> >> -Original Message-----
> >> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
> >> boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> >> Sent: Friday, January 11, 2013 7:51 AM
> >> To: Elken, Tom; ewg@lists.openfabrics.org
> >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> >>
> >> This is definitely a perftest bug.
> >>
> >> This is a significant re-write of these utilities and this bug is a
> >> regression in the routine ctx_set_out_reads().
> >>
> >> In 1.4 the code is this:
> >>
> >> /**
> >>  *
> >>  **/
> >> static int ctx_set_out_reads(struct ibv_context *context,int
> >> num_user_reads) {
> >>
> >>
> >>  int max_reads;
> >>
> >>  max_reads = (is_dev_hermon(context) == HERMON) ?
> >> MAX_OUT_READ_HERMON : MAX_OUT_READ;<---
> >>
> >>  if (num_user_reads > max_reads) {
> >>  fprintf(stderr," Number of outstanding reads is
> >> above max = %d\n",max_reads);
> >>  fprintf(stderr," Changing to that max value\n");
> >>  num_user_reads = max_reads;
> >>  }
> >>  else if (num_user_reads <= 0) {
> >>  num_user_reads = max_reads;
> >>  }
> >>
> >>  return num_user_reads;
> >> }
> >>
> >> The new 2.0 code is:
> >>
> >> /**
> >>  *
> >>  **/
> >> static int ctx_set_out_reads(struct ibv_context *context,int
> >> num_user_reads) {
> >>
> >>
> >>  int max_reads;
> >>
> >>  Device ib_fdev = ib_dev_name(context);
> >>
> >>  switch (ib_fdev) {
> >>  case CONNECTIB : ;
> >>  case CONNECTX3 : ;
> >>  case CONNECTX2 : ;
> >>  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> >>  case LEGACY : max_reads = MAX_OUT_READ; break;

Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Marciniszyn, Mike
The new package has been posted, and I verified that the qib <-> qib issue is 
gone with the new tarball. Ido has RESOLVED bz 2410 as well.

Interop could be done with the new perftest/rc4 or just wait for the next RC.

Mike

> -Original Message-
> From: Woodruff, Robert J
> Sent: Monday, January 14, 2013 12:52 PM
> To: Ido Shamai; Marciniszyn, Mike
> Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> Tziporet Koren
> Subject: RE: Interop test failure using OFED-3.5 RC4
> 
> Were you able to get the new package posted yet ?
> 
> We need this ASAP so we can do another OFED-3.5 RC.
> 
> Woody
> 
> 
> -Original Message-
> From: Ido Shamai [mailto:i...@dev.mellanox.co.il]
> Sent: Friday, January 11, 2013 12:32 PM
> To: Marciniszyn, Mike
> Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean;
> Mascarenhas, Edward
> Subject: Re: Interop test failure using OFED-3.5 RC4
> 
> On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > I've opened OFED bz 2410 for this issue.
> >
> > Mike
> 
> Great thanks.
> I will apply the patch and release a new version to OFED website tomorrow
> morning.
> 
> Ido
> 
> >> -Original Message-
> >> From: Woodruff, Robert J
> >> Sent: Friday, January 11, 2013 1:30 PM
> >> To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido
> >> Shamai
> >> Subject: RE: Interop test failure using OFED-3.5 RC4
> >>
> >>
> >> Adding Shamai from Mellanox to this thread.
> >>
> >> Woody
> >>
> >> -Original Message-----
> >> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
> >> boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> >> Sent: Friday, January 11, 2013 7:51 AM
> >> To: Elken, Tom; ewg@lists.openfabrics.org
> >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> >>
> >> This is definitely a perftest bug.
> >>
> >> This is a significant re-write of these utilities and this bug is a
> >> regression in the routine ctx_set_out_reads().
> >>
> >> In 1.4 the code is this:
> >>
> >> /**
> >>  *
> >>  **/
> >> static int ctx_set_out_reads(struct ibv_context *context,int
> >> num_user_reads) {
> >>
> >>
> >>  int max_reads;
> >>
> >>  max_reads = (is_dev_hermon(context) == HERMON) ?
> >> MAX_OUT_READ_HERMON : MAX_OUT_READ;<---
> >>
> >>  if (num_user_reads > max_reads) {
> >>  fprintf(stderr," Number of outstanding reads is
> >> above max = %d\n",max_reads);
> >>  fprintf(stderr," Changing to that max value\n");
> >>  num_user_reads = max_reads;
> >>  }
> >>  else if (num_user_reads <= 0) {
> >>  num_user_reads = max_reads;
> >>  }
> >>
> >>  return num_user_reads;
> >> }
> >>
> >> The new 2.0 code is:
> >>
> >> /**
> >>  *
> >>  **/
> >> static int ctx_set_out_reads(struct ibv_context *context,int
> >> num_user_reads) {
> >>
> >>
> >>  int max_reads;
> >>
> >>  Device ib_fdev = ib_dev_name(context);
> >>
> >>  switch (ib_fdev) {
> >>  case CONNECTIB : ;
> >>  case CONNECTX3 : ;
> >>  case CONNECTX2 : ;
> >>  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> >>  case LEGACY : max_reads = MAX_OUT_READ; break;
> >>  default : max_reads = 0; <
> >>  }
> >>
> >>  if (num_user_reads > max_reads) {
> >>  printf(RESULT_LINE);
> >>  fprintf(stderr," Number of outstanding reads is
> >> above max = %d\n",max_reads);
> >>  fprintf(stderr," Changing to that max value\n");
> >>  num_user_reads

Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Woodruff, Robert J
Were you able to get the new package posted yet ?

We need this ASAP so we can do another OFED-3.5 RC.

Woody


-Original Message-
From: Ido Shamai [mailto:i...@dev.mellanox.co.il] 
Sent: Friday, January 11, 2013 12:32 PM
To: Marciniszyn, Mike
Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; 
Mascarenhas, Edward
Subject: Re: Interop test failure using OFED-3.5 RC4

On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> I've opened OFED bz 2410 for this issue.
>
> Mike

Great thanks.
I will apply the patch and release a new version to OFED website 
tomorrow morning.

Ido

>> -Original Message-
>> From: Woodruff, Robert J
>> Sent: Friday, January 11, 2013 1:30 PM
>> To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido Shamai
>> Subject: RE: Interop test failure using OFED-3.5 RC4
>>
>>
>> Adding Shamai from Mellanox to this thread.
>>
>> Woody
>>
>> -Original Message-
>> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
>> boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
>> Sent: Friday, January 11, 2013 7:51 AM
>> To: Elken, Tom; ewg@lists.openfabrics.org
>> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
>>
>> This is definitely a perftest bug.
>>
>> This is a significant re-write of these utilities and this bug is a 
>> regression in the
>> routine ctx_set_out_reads().
>>
>> In 1.4 the code is this:
>> /**
>>  *
>>  **/
>> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) 
>> {
>>
>>
>>  int max_reads;
>>
>>  max_reads = (is_dev_hermon(context) == HERMON) ?
>> MAX_OUT_READ_HERMON : MAX_OUT_READ;<---
>>
>>  if (num_user_reads > max_reads) {
>>  fprintf(stderr," Number of outstanding reads is above max =
>> %d\n",max_reads);
>>  fprintf(stderr," Changing to that max value\n");
>>  num_user_reads = max_reads;
>>  }
>>  else if (num_user_reads <= 0) {
>>  num_user_reads = max_reads;
>>  }
>>
>>  return num_user_reads;
>> }
>>
>> The new 2.0 code is:
>> /**
>>  *
>>  **/
>> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) 
>> {
>>
>>
>>  int max_reads;
>>
>>  Device ib_fdev = ib_dev_name(context);
>>
>>  switch (ib_fdev) {
>>  case CONNECTIB : ;
>>  case CONNECTX3 : ;
>>  case CONNECTX2 : ;
>>  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
>>  case LEGACY : max_reads = MAX_OUT_READ; break;
>>  default : max_reads = 0; <
>>  }
>>
>>  if (num_user_reads > max_reads) {
>>  printf(RESULT_LINE);
>>  fprintf(stderr," Number of outstanding reads is above max =
>> %d\n",max_reads);
>>  fprintf(stderr," Changing to that max value\n");
>>  num_user_reads = max_reads;
>>  }
>>  else if (num_user_reads <= 0) {
>>  num_user_reads = max_reads;
>>  }
>>
>>  return num_user_reads;
>> }
>>
>> The old code will return MAX_OUT_READ, while the new code for any other
>> HCAs (qib and probably others), will return 0.
>>
>> I have a patch that works, while preserving the desired hardcoded values for
>> "known/legacy" devices:
>> +
>> +/**
>> + *
>> + **/
>> +static int device_max_reads(struct ibv_context *context) {
>> +   struct ibv_device_attr attr;
>> +   int ret = 0;
>> +
>> +   if (!ibv_query_device(context,&attr)) {
>> +   ret = attr.max_qp_rd_atom;
>> +   }
>> +   return ret;
>> +}
>> +
>>
>> /

Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-12 Thread Marciniszyn, Mike
I'm curious why the device query value cannot be used in all cases?

Mike

> -Original Message-
> From: Ido Shamai [mailto:i...@dev.mellanox.co.il]
> Sent: Friday, January 11, 2013 3:32 PM
> To: Marciniszyn, Mike
> Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean;
> Mascarenhas, Edward
> Subject: Re: Interop test failure using OFED-3.5 RC4
> 
> On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > I've opened OFED bz 2410 for this issue.
> >
> > Mike
> 
> Great thanks.
> I will apply the patch and release a new version to OFED website tomorrow
> morning.
> 
> Ido
> 
> >> -Original Message-
> >> From: Woodruff, Robert J
> >> Sent: Friday, January 11, 2013 1:30 PM
> >> To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido
> >> Shamai
> >> Subject: RE: Interop test failure using OFED-3.5 RC4
> >>
> >>
> >> Adding Shamai from Mellanox to this thread.
> >>
> >> Woody
> >>
> >> -Original Message-
> >> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
> >> boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> >> Sent: Friday, January 11, 2013 7:51 AM
> >> To: Elken, Tom; ewg@lists.openfabrics.org
> >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> >>
> >> This is definitely a perftest bug.
> >>
> >> This is a significant re-write of these utilities and this bug is a
> >> regression in the routine ctx_set_out_reads().
> >>
> >> In 1.4 the code is this:
> >>
> >> /**
> >>  *
> >>  **/
> >> static int ctx_set_out_reads(struct ibv_context *context,int
> >> num_user_reads) {
> >>
> >>
> >>  int max_reads;
> >>
> >>  max_reads = (is_dev_hermon(context) == HERMON) ?
> >> MAX_OUT_READ_HERMON : MAX_OUT_READ;<---
> >>
> >>  if (num_user_reads > max_reads) {
> >>  fprintf(stderr," Number of outstanding reads is
> >> above max = %d\n",max_reads);
> >>  fprintf(stderr," Changing to that max value\n");
> >>  num_user_reads = max_reads;
> >>  }
> >>  else if (num_user_reads <= 0) {
> >>  num_user_reads = max_reads;
> >>  }
> >>
> >>  return num_user_reads;
> >> }
> >>
> >> The new 2.0 code is:
> >>
> >> /**
> >>  *
> >>  **/
> >> static int ctx_set_out_reads(struct ibv_context *context,int
> >> num_user_reads) {
> >>
> >>
> >>  int max_reads;
> >>
> >>  Device ib_fdev = ib_dev_name(context);
> >>
> >>  switch (ib_fdev) {
> >>  case CONNECTIB : ;
> >>  case CONNECTX3 : ;
> >>  case CONNECTX2 : ;
> >>  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> >>  case LEGACY : max_reads = MAX_OUT_READ; break;
> >>  default : max_reads = 0; <
> >>  }
> >>
> >>  if (num_user_reads > max_reads) {
> >>  printf(RESULT_LINE);
> >>  fprintf(stderr," Number of outstanding reads is
> >> above max = %d\n",max_reads);
> >>  fprintf(stderr," Changing to that max value\n");
> >>  num_user_reads = max_reads;
> >>  }
> >>  else if (num_user_reads <= 0) {
> >>  num_user_reads = max_reads;
> >>  }
> >>
> >>  return num_user_reads;
> >> }
> >>
> >> The old code will return MAX_OUT_READ, while the new code for any
> >> other HCAs (qib and probably others), will return 0.
> >>
> >> I have a patch that works, while preserving the desired hardcoded
> >> values for "known/legacy" devices:
> >> +
> >>
> +/***

Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Ido Shamai

On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> I've opened OFED bz 2410 for this issue.
>
> Mike

Great thanks.
I will apply the patch and release a new version to OFED website
tomorrow morning.

Ido


-Original Message-
From: Woodruff, Robert J
Sent: Friday, January 11, 2013 1:30 PM
To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido Shamai
Subject: RE: Interop test failure using OFED-3.5 RC4


Adding Shamai from Mellanox to this thread.

Woody

-Original Message-
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
Sent: Friday, January 11, 2013 7:51 AM
To: Elken, Tom; ewg@lists.openfabrics.org
Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4

This is definitely a perftest bug.

This is a significant re-write of these utilities and this bug is a regression 
in the
routine ctx_set_out_reads().

In 1.4 the code is this:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {


 int max_reads;

 max_reads = (is_dev_hermon(context) == HERMON) ?
MAX_OUT_READ_HERMON : MAX_OUT_READ;<---

 if (num_user_reads > max_reads) {
 fprintf(stderr," Number of outstanding reads is above max =
%d\n",max_reads);
 fprintf(stderr," Changing to that max value\n");
 num_user_reads = max_reads;
 }
 else if (num_user_reads <= 0) {
 num_user_reads = max_reads;
 }

 return num_user_reads;
}

The new 2.0 code is:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {


 int max_reads;

 Device ib_fdev = ib_dev_name(context);

 switch (ib_fdev) {
 case CONNECTIB : ;
 case CONNECTX3 : ;
 case CONNECTX2 : ;
 case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
 case LEGACY : max_reads = MAX_OUT_READ; break;
 default : max_reads = 0; <
 }

 if (num_user_reads > max_reads) {
 printf(RESULT_LINE);
 fprintf(stderr," Number of outstanding reads is above max =
%d\n",max_reads);
 fprintf(stderr," Changing to that max value\n");
 num_user_reads = max_reads;
 }
 else if (num_user_reads <= 0) {
 num_user_reads = max_reads;
 }

 return num_user_reads;
}

The old code will return MAX_OUT_READ, while the new code for any other
HCAs (qib and probably others), will return 0.

I have a patch that works, while preserving the desired hardcoded values for
"known/legacy" devices:
+
+/**
+ *
+ **/
+static int device_max_reads(struct ibv_context *context) {
+   struct ibv_device_attr attr;
+   int ret = 0;
+
+   if (!ibv_query_device(context,&attr)) {
+   ret = attr.max_qp_rd_atom;
+   }
+   return ret;
+}
+

 /**
  *
  **/
@@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
 case CONNECTX2 : ;
 case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
 case LEGACY : max_reads = MAX_OUT_READ; break;
-   default : max_reads = 0;
+   default : max_reads = device_max_reads(context);
 }

 if (num_user_reads > max_reads) {

I'm curious why the old and new code used hardcoded values?

Mike
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Marciniszyn, Mike
I've opened OFED bz 2410 for this issue.

Mike

> -Original Message-
> From: Woodruff, Robert J
> Sent: Friday, January 11, 2013 1:30 PM
> To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido Shamai
> Subject: RE: Interop test failure using OFED-3.5 RC4
> 
> 
> Adding Shamai from Mellanox to this thread.
> 
> Woody
> 
> -Original Message-
> From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
> boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> Sent: Friday, January 11, 2013 7:51 AM
> To: Elken, Tom; ewg@lists.openfabrics.org
> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> 
> This is definitely a perftest bug.
> 
> This is a significant re-write of these utilities and this bug is a 
> regression in the
> routine ctx_set_out_reads().
> 
> In 1.4 the code is this:
> /**
>  *
>  **/
> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {
> 
> 
> int max_reads;
> 
> max_reads = (is_dev_hermon(context) == HERMON) ?
> MAX_OUT_READ_HERMON : MAX_OUT_READ;<---
> 
> if (num_user_reads > max_reads) {
> fprintf(stderr," Number of outstanding reads is above max =
> %d\n",max_reads);
> fprintf(stderr," Changing to that max value\n");
> num_user_reads = max_reads;
> }
> else if (num_user_reads <= 0) {
> num_user_reads = max_reads;
> }
> 
> return num_user_reads;
> }
> 
> The new 2.0 code is:
> /**
>  *
>  **/
> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {
> 
> 
> int max_reads;
> 
> Device ib_fdev = ib_dev_name(context);
> 
> switch (ib_fdev) {
> case CONNECTIB : ;
> case CONNECTX3 : ;
> case CONNECTX2 : ;
> case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> case LEGACY : max_reads = MAX_OUT_READ; break;
> default : max_reads = 0; <
> }
> 
> if (num_user_reads > max_reads) {
> printf(RESULT_LINE);
> fprintf(stderr," Number of outstanding reads is above max =
> %d\n",max_reads);
> fprintf(stderr," Changing to that max value\n");
> num_user_reads = max_reads;
> }
> else if (num_user_reads <= 0) {
> num_user_reads = max_reads;
> }
> 
> return num_user_reads;
> }
> 
> The old code will return MAX_OUT_READ, while the new code for any other
> HCAs (qib and probably others), will return 0.
> 
> I have a patch that works, while preserving the desired hardcoded values for
> "known/legacy" devices:
> +
> +/**
> + *
> + **/
> +static int device_max_reads(struct ibv_context *context) {
> +   struct ibv_device_attr attr;
> +   int ret = 0;
> +
> +   if (!ibv_query_device(context,&attr)) {
> +   ret = attr.max_qp_rd_atom;
> +   }
> +   return ret;
> +}
> +
> 
>  /**
>   *
>   **/
> @@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
> case CONNECTX2 : ;
> case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> case LEGACY : max_reads = MAX_OUT_READ; break;
> -   default : max_reads = 0;
> +   default : max_reads = device_max_reads(context);
> }
> 
> if (num_user_reads > max_reads) {
> 
> I'm curious why the old and new code used hardcoded values?
> 
> Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Woodruff, Robert J

Adding Shamai from Mellanox to this thread.

Woody

-Original Message-
From: ewg-boun...@lists.openfabrics.org 
[mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
Sent: Friday, January 11, 2013 7:51 AM
To: Elken, Tom; ewg@lists.openfabrics.org
Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4

This is definitely a perftest bug.

This is a significant re-write of these utilities and this bug is a regression 
in the routine ctx_set_out_reads().

In 1.4 the code is this:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {


int max_reads;

max_reads = (is_dev_hermon(context) == HERMON) ? MAX_OUT_READ_HERMON : 
MAX_OUT_READ;<---

if (num_user_reads > max_reads) {
fprintf(stderr," Number of outstanding reads is above max = 
%d\n",max_reads);
fprintf(stderr," Changing to that max value\n");
num_user_reads = max_reads;
}
else if (num_user_reads <= 0) {
num_user_reads = max_reads;
}

return num_user_reads;
}

The new 2.0 code is:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {


int max_reads;

Device ib_fdev = ib_dev_name(context);

switch (ib_fdev) {
case CONNECTIB : ;
case CONNECTX3 : ;
case CONNECTX2 : ;
case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
case LEGACY : max_reads = MAX_OUT_READ; break;
default : max_reads = 0; <
}

if (num_user_reads > max_reads) {
printf(RESULT_LINE);
fprintf(stderr," Number of outstanding reads is above max = 
%d\n",max_reads);
fprintf(stderr," Changing to that max value\n");
num_user_reads = max_reads;
}
else if (num_user_reads <= 0) {
num_user_reads = max_reads;
}

return num_user_reads;
}

The old code will return MAX_OUT_READ, while the new code for any other HCAs 
(qib and probably others), will return 0.

I have a patch that works, while preserving the desired hardcoded values for 
"known/legacy" devices:
+
+/**
+ *
+ **/
+static int device_max_reads(struct ibv_context *context) {
+   struct ibv_device_attr attr;
+   int ret = 0;
+
+   if (!ibv_query_device(context,&attr)) {
+   ret = attr.max_qp_rd_atom;
+   }
+   return ret;
+}
+
 /**
  *
  **/
@@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
case CONNECTX2 : ;
case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
case LEGACY : max_reads = MAX_OUT_READ; break;
-   default : max_reads = 0;
+   default : max_reads = device_max_reads(context);
}

if (num_user_reads > max_reads) {

I'm curious why the old and new code used hardcoded values?

Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Ido Shamai

On 1/11/2013 7:20 AM, Hefty, Sean wrote:
>>> We have investigated and found that perftest was upgraded from v1.8 to v2.0
>>> on 11/19/12, between RC3 and RC4.

Hi,

We did move from perftest-1.4 to perftest-2.0 last month.
It has the same logic and results as the older version, plus plenty of new
features.

Can you tell me more about the problem?

Ido

>> Er, I meant "between RC2 and RC3."
>
> Why would there be a _major_ version change in any component done in the middle
> of a release cycle?!


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Marciniszyn, Mike
This is definitely a perftest bug.

This is a significant re-write of these utilities and this bug is a regression 
in the routine ctx_set_out_reads().

In 1.4 the code is this:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {

    int max_reads;

    max_reads = (is_dev_hermon(context) == HERMON) ? MAX_OUT_READ_HERMON : MAX_OUT_READ;  <---

    if (num_user_reads > max_reads) {
        fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
        fprintf(stderr," Changing to that max value\n");
        num_user_reads = max_reads;
    }
    else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }

    return num_user_reads;
}

The new 2.0 code is:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {

    int max_reads;

    Device ib_fdev = ib_dev_name(context);

    switch (ib_fdev) {
        case CONNECTIB : ;
        case CONNECTX3 : ;
        case CONNECTX2 : ;
        case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
        case LEGACY : max_reads = MAX_OUT_READ; break;
        default : max_reads = 0;  <---
    }

    if (num_user_reads > max_reads) {
        printf(RESULT_LINE);
        fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
        fprintf(stderr," Changing to that max value\n");
        num_user_reads = max_reads;
    }
    else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }

    return num_user_reads;
}

For any other HCAs (qib and probably others), the old code uses MAX_OUT_READ as
the limit, while the new code falls into the default case, sets the limit to 0,
and therefore always returns 0.

I have a patch that works, while preserving the desired hardcoded values for 
"known/legacy" devices:
+
+/**
+ *
+ **/
+static int device_max_reads(struct ibv_context *context) {
+    struct ibv_device_attr attr;
+    int ret = 0;
+
+    if (!ibv_query_device(context,&attr)) {
+        ret = attr.max_qp_rd_atom;
+    }
+    return ret;
+}
+
 /**
  *
  **/
@@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
         case CONNECTX2 : ;
         case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
         case LEGACY : max_reads = MAX_OUT_READ; break;
-        default : max_reads = 0;
+        default : max_reads = device_max_reads(context);
     }

     if (num_user_reads > max_reads) {

I'm curious why the old and new code used hardcoded values?

Mike
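
The behavioral difference described in this message can be reproduced in isolation. The sketch below is an assumption-laden reduction, not perftest code: the two limit values are placeholders (the real MAX_OUT_READ_HERMON and MAX_OUT_READ definitions are not quoted in this thread), the device enum is cut down to three hypothetical cases, and the helper names are invented for the illustration. Only the selection and clamp logic follows the two quoted versions of ctx_set_out_reads().

```c
/* Placeholder limits: the real values are defined in the perftest
 * headers and are not shown in this thread. */
#define SKETCH_MAX_OUT_READ_HERMON 16
#define SKETCH_MAX_OUT_READ        4

/* Hypothetical, reduced device classification. SK_UNKNOWN stands for
 * any HCA the 2.0 switch statement does not list (qib, for example). */
enum sketch_device { SK_CONNECTX, SK_LEGACY, SK_UNKNOWN };

/* Clamp step shared by both quoted versions: requests above the limit
 * or not set at all (<= 0) are replaced by the limit. */
static int sketch_clamp(int num_user_reads, int max_reads)
{
    if (num_user_reads > max_reads)
        num_user_reads = max_reads;
    else if (num_user_reads <= 0)
        num_user_reads = max_reads;
    return num_user_reads;
}

/* 1.4-style selection: a two-way choice, so every non-Hermon device
 * gets the generic limit. */
static int sketch_out_reads_v14(int is_hermon, int num_user_reads)
{
    int max_reads = is_hermon ? SKETCH_MAX_OUT_READ_HERMON
                              : SKETCH_MAX_OUT_READ;
    return sketch_clamp(num_user_reads, max_reads);
}

/* 2.0-style selection: unlisted devices hit the default case and get a
 * limit of 0, which the clamp then returns for every request. */
static int sketch_out_reads_v20(enum sketch_device dev, int num_user_reads)
{
    int max_reads;

    switch (dev) {
    case SK_CONNECTX: max_reads = SKETCH_MAX_OUT_READ_HERMON; break;
    case SK_LEGACY:   max_reads = SKETCH_MAX_OUT_READ;        break;
    default:          max_reads = 0;                          break;
    }
    return sketch_clamp(num_user_reads, max_reads);
}
```

Under these placeholder limits, the 1.4-style function returns 4 for a non-Hermon device asking for 8 reads, while the 2.0-style function returns 0 for an unlisted device regardless of what was requested.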


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Woodruff, Robert J
Tom wrote,
> The EWG standard practice is that if a significant bug fix goes in, we would
> need another RC to enable others to easily test it.
> But perhaps it depends on whether the bug is in perftest, qib or elsewhere.
> In any case, we don't want a GA build until this issue is solved.


Yes, this will require another RC.

Woody



Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Marciniszyn, Mike
> We have investigated and found that perftest was upgraded from v1.8 to v2.0

Tom, I was mistaken.   The older perftest version is 1.4.

Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-10 Thread Hefty, Sean
> > We have investigated and found that perftest was upgraded from v1.8 to v2.0
> > on 11/19/12, between RC3 and RC4.
> 
> Er, I meant "between RC2 and RC3."

Why would there be a _major_ version change in any component done in the middle 
of a release cycle?!


[ewg] Interop test failure using OFED-3.5 RC4

2013-01-10 Thread Elken, Tom
Rupert and the UNH-IOL pointed out that an Interop test which uses the  
ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs.
This test was succeeding with RC2, and started failing with RC3.  I am sorry 
that our QA team did not find this bug with RC3.

We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 
11/19/12, between RC3 and RC4.
We verified that with the qib driver in OFED-3.5 RC4 and the perftest RPM from 
RC2, we pass the tests.
We also ran a similar qperf RDMA read test with qperf and qib from RC4 and that 
test passed.

We are working to isolate the bug and develop a fix.  We suspect the perftest 
changes, but the ib_read_* benchmarks may just have changed enough to start 
checking a part of the spec which hasn't been tested before in Interop tests.  
So it may be a qib driver issue.

The EWG standard practice is that if a significant bug fix goes in, we would 
need another RC to enable others to easily test it.
But perhaps it depends on whether the bug is in perftest, qib or elsewhere.  In 
any case, we don't want a GA build until this issue is solved.

Regards,
Tom 


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-10 Thread Elken, Tom
> Rupert and the UNH-IOL pointed out that an Interop test which uses the
> ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs.
> This test was succeeding with RC2, and started failing with RC3.  I am sorry
> that our QA team did not find this bug with RC3.
> 
> We have investigated and found that perftest was upgraded from v1.8 to v2.0
> on 11/19/12, between RC3 and RC4.
 
Er, I meant "between RC2 and RC3."

-Tom

> We verified that with the qib driver in OFED-3.5 RC4 and the perftest RPM from
> RC2, we pass the tests.
> We also ran a similar qperf RDMA read test with qperf and qib from RC4 and
> that test passed.
> 
> We are working to isolate the bug and develop a fix.  We suspect the perftest
> changes, but the ib_read_* benchmarks may just have changed enough to start
> checking a part of the spec which hasn't been tested before in Interop tests.
> So it may be a qib driver issue.
> 
> The EWG standard practice is that if a significant bug fix goes in, we would
> need another RC to enable others to easily test it.
> But perhaps it depends on whether the bug is in perftest, qib or elsewhere.
> In any case, we don't want a GA build until this issue is solved.
> 
> Regards,
> Tom