Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Woodruff, Robert J
Were you able to get the new package posted yet?

We need this ASAP so we can do another OFED-3.5 RC.

Woody



___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Marciniszyn, Mike
The new package has been posted, and I verified that the qib-to-qib issue is
gone with the new tarball. Ido has RESOLVED bz 2410 as well.

Interop could be done with the new perftest/rc4, or could just wait for the next RC.

Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Woodruff, Robert J
Does anyone know of any other show-stopper bugs that are yet to be resolved?

If not, we can do an RC5 for final testing.


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-14 Thread Elken, Tom
BTW,
Mike posted an alternate patch to bug 2410 that removes the hard-coded values
for _all_ HCAs by using ibv_query_device() to query the HCA.
Thankfully, Ido used that alternate patch.

-Tom
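
A rough sketch of the query-for-all-devices idea, written so it can be tried without InfiniBand hardware. The callback type query_limit_fn and the helpers fake_query and failing_query are hypothetical stand-ins; in perftest the limit would come from ibv_query_device() and the max_qp_rd_atom field of struct ibv_device_attr, as in Mike's patch. This is an illustration under those assumptions, not the code that was committed for bz 2410.

```c
#include <stdio.h>

/* Hypothetical callback: returns 0 on success and stores the device's
 * outstanding-read limit in *max_rd_atom (the same success convention
 * as ibv_query_device()). */
typedef int (*query_limit_fn)(void *device, int *max_rd_atom);

/* Pick the number of outstanding reads using only the queried limit,
 * with no per-device hard-coded table. */
static int out_reads_from_query(void *device, query_limit_fn query,
                                int num_user_reads)
{
    int max_reads = 0;

    if (query(device, &max_reads) != 0)
        max_reads = 0;              /* query failed: no usable limit */

    if (num_user_reads > max_reads) {
        fprintf(stderr, "Number of outstanding reads is above max = %d\n",
                max_reads);
        num_user_reads = max_reads; /* cap at the device limit */
    } else if (num_user_reads <= 0) {
        num_user_reads = max_reads; /* no request given: use the limit */
    }
    return num_user_reads;
}

/* Stub "device": the pointer refers to an int holding the limit. */
static int fake_query(void *device, int *max_rd_atom)
{
    *max_rd_atom = *(int *)device;
    return 0;
}

/* Stub that simulates a failed query. */
static int failing_query(void *device, int *max_rd_atom)
{
    (void)device;
    (void)max_rd_atom;
    return -1;
}
```

With a stub limit of 16, a request of 0 or 64 resolves to 16 and a request of 8 stays at 8. When the query fails the result is 0, so a caller would still need to treat 0 as an error rather than pass it on.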


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-12 Thread Marciniszyn, Mike
I'm curious why the device query value cannot be used in all cases?

Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Marciniszyn, Mike
> We have investigated and found that perftest was upgraded from v1.8 to v2.0

Tom, I was mistaken. The older perftest version is 1.4.

Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Woodruff, Robert J
Tom wrote,
> The EWG standard practice is that if a significant bug fix goes in, we would
> need another RC to enable others to easily test it.
> But perhaps it depends on whether the bug is in perftest, qib or elsewhere.
> In any case, we don't want a GA build until this issue is solved.


Yes, this will require another RC.

Woody



Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Marciniszyn, Mike
This is definitely a perftest bug.

This is a significant re-write of these utilities and this bug is a regression 
in the routine ctx_set_out_reads().

In 1.4 the code is this:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context, int num_user_reads)
{
    int max_reads;

    max_reads = (is_dev_hermon(context) == HERMON) ?
        MAX_OUT_READ_HERMON : MAX_OUT_READ;    /* <--- */

    if (num_user_reads > max_reads) {
        fprintf(stderr, "Number of outstanding reads is above max = %d\n", max_reads);
        fprintf(stderr, "Changing to that max value\n");
        num_user_reads = max_reads;
    }
    else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }

    return num_user_reads;
}

The new 2.0 code is:
/**
 *
 **/
static int ctx_set_out_reads(struct ibv_context *context, int num_user_reads)
{
    int max_reads;

    Device ib_fdev = ib_dev_name(context);

    switch (ib_fdev) {
        case CONNECTIB : ;
        case CONNECTX3 : ;
        case CONNECTX2 : ;
        case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
        case LEGACY : max_reads = MAX_OUT_READ; break;
        default : max_reads = 0;
    }

    if (num_user_reads > max_reads) {
        printf(RESULT_LINE);
        fprintf(stderr, "Number of outstanding reads is above max = %d\n", max_reads);
        fprintf(stderr, "Changing to that max value\n");
        num_user_reads = max_reads;
    }
    else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }

    return num_user_reads;
}

For any other HCA (qib and probably others), the old code returns MAX_OUT_READ,
while the new code returns 0.
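
To make the regression concrete, here is a minimal, self-contained sketch of the two selection strategies. The enum, the function names (clamp_reads, out_reads_v14, out_reads_v20) and the numeric limits are illustrative assumptions, not perftest's actual definitions; only the control flow follows the two versions quoted above.

```c
#include <stdio.h>

/* Placeholder limits for illustration only; assumed, not taken from perftest. */
#define MAX_OUT_READ_HERMON 16
#define MAX_OUT_READ        4

/* Hypothetical stand-in for perftest's device classification. */
enum device_kind { KIND_CONNECTX, KIND_LEGACY, KIND_UNKNOWN };

/* Shared clamp: cap an over-large request, default a non-positive one. */
static int clamp_reads(int num_user_reads, int max_reads)
{
    if (num_user_reads > max_reads) {
        fprintf(stderr, "Number of outstanding reads is above max = %d\n",
                max_reads);
        num_user_reads = max_reads;
    } else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }
    return num_user_reads;
}

/* 1.4-style selection: anything that is not Hermon gets MAX_OUT_READ. */
static int out_reads_v14(enum device_kind kind, int num_user_reads)
{
    int max_reads = (kind == KIND_CONNECTX) ? MAX_OUT_READ_HERMON
                                            : MAX_OUT_READ;
    return clamp_reads(num_user_reads, max_reads);
}

/* 2.0-style selection: unrecognised devices hit the default and get 0. */
static int out_reads_v20(enum device_kind kind, int num_user_reads)
{
    int max_reads;

    switch (kind) {
    case KIND_CONNECTX: max_reads = MAX_OUT_READ_HERMON; break;
    case KIND_LEGACY:   max_reads = MAX_OUT_READ; break;
    default:            max_reads = 0;
    }
    return clamp_reads(num_user_reads, max_reads);
}
```

With these placeholder values, an unrecognised device asking for 1 outstanding read gets 1 from the 1.4-style path but 0 from the 2.0-style path, because any positive request exceeds a maximum of 0 and is clamped down to it.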

I have a patch that works, while preserving the desired hardcoded values for 
known/legacy devices:
+
+/**
+ *
+ **/
+static int device_max_reads(struct ibv_context *context) {
+    struct ibv_device_attr attr;
+    int ret = 0;
+
+    if (!ibv_query_device(context, &attr)) {
+        ret = attr.max_qp_rd_atom;
+    }
+    return ret;
+}
+
 /**
  *
  **/
@@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
         case CONNECTX2 : ;
         case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
         case LEGACY : max_reads = MAX_OUT_READ; break;
-        default : max_reads = 0;
+        default : max_reads = device_max_reads(context);
     }

     if (num_user_reads > max_reads) {

I'm curious why the old and new code used hardcoded values?

Mike


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Ido Shamai

On 1/11/2013 7:20 AM, Hefty, Sean wrote:

> We have investigated and found that perftest was upgraded from v1.8 to v2.0
> on 11/19/12, between RC3 and RC4.

Hi,

We did move from perftest-1.4 to perftest-2.0 last month.
It has the same logic and results as the older version, plus plenty of new
features.

Can you tell me more about the problem?

Ido

> Er, I meant between RC2 and RC3.
>
> Why would there be a _major_ version change in any component done in the middle
> of a release cycle?!


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Woodruff, Robert J

Adding Shamai from Mellanox to this thread.

Woody



Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Marciniszyn, Mike
I've opened OFED bz 2410 for this issue.

Mike



Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-11 Thread Ido Shamai

On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:

> I've opened OFED bz 2410 for this issue.
>
> Mike

Great, thanks.
I will apply the patch and release a new version to the OFED website
tomorrow morning.

Ido


-Original Message-
From: Woodruff, Robert J
Sent: Friday, January 11, 2013 1:30 PM
To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido Shamai
Subject: RE: Interop test failure using OFED-3.5 RC4


Adding Shamai from Mellanox to this thread.

Woody

-Original Message-
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-
boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
Sent: Friday, January 11, 2013 7:51 AM
To: Elken, Tom; ewg@lists.openfabrics.org
Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4

This is definitely a perftest bug.

This is a significant re-write of these utilities and this bug is a regression 
in the
routine ctx_set_out_reads().

In 1.4 the code is this:
/
**
  *


**/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {


 int max_reads;

 max_reads = (is_dev_hermon(context) == HERMON) ?
MAX_OUT_READ_HERMON : MAX_OUT_READ;---

 if (num_user_reads  max_reads) {
 fprintf(stderr, Number of outstanding reads is above max =
%d\n,max_reads);
 fprintf(stderr, Changing to that max value\n);
 num_user_reads = max_reads;
 }
 else if (num_user_reads = 0) {
 num_user_reads = max_reads;
 }

 return num_user_reads;
}

The new 2.0 code is:
/
**
  *


**/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {


 int max_reads;

 Device ib_fdev = ib_dev_name(context);

 switch (ib_fdev) {
 case CONNECTIB : ;
 case CONNECTX3 : ;
 case CONNECTX2 : ;
 case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
 case LEGACY : max_reads = MAX_OUT_READ; break;
 default : max_reads = 0; 
 }

 if (num_user_reads  max_reads) {
 printf(RESULT_LINE);
 fprintf(stderr, Number of outstanding reads is above max =
%d\n,max_reads);
 fprintf(stderr, Changing to that max value\n);
 num_user_reads = max_reads;
 }
 else if (num_user_reads = 0) {
 num_user_reads = max_reads;
 }

 return num_user_reads;
}

The old code will return MAX_OUT_READ, while the new code for any other
HCAs (qib and probably others), will return 0.

I have a patch that works, while preserving the desired hardcoded values for
known/legacy devices:
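+
+/******************************************************************************
+ *
+ ******************************************************************************/
+static int device_max_reads(struct ibv_context *context) {
+	struct ibv_device_attr attr;
+	int ret = 0;
+
+	if (!ibv_query_device(context,&attr)) {
+		ret = attr.max_qp_rd_atom;
+	}
+	return ret;
+}
+
 /******************************************************************************
  *
  ******************************************************************************/
@@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
 		case CONNECTX2 : ;
 		case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
 		case LEGACY : max_reads = MAX_OUT_READ; break;
-		default : max_reads = 0;
+		default : max_reads = device_max_reads(context);
 	}
 
 	if (num_user_reads > max_reads) {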
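
The idea behind the patch, falling back to the limit the device itself reports, can be modelled without linking against libibverbs. In this sketch `struct fake_attr`, `query_fn`, `query_ok` and `query_fail` are hypothetical stand-ins for `struct ibv_device_attr` and `ibv_query_device()`; only the field name `max_qp_rd_atom` and the zero-on-success convention are taken from the patch above, and the limit of 16 is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_device_attr; only the field the patch uses. */
struct fake_attr {
    int max_qp_rd_atom;
};

/* Hypothetical query callback: returns 0 on success, like ibv_query_device(). */
typedef int (*query_fn)(struct fake_attr *attr);

/* Mirrors device_max_reads() from the patch: report the device limit, or 0 on failure. */
static int device_max_reads(query_fn query)
{
    struct fake_attr attr;
    int ret = 0;

    assert(query != NULL);
    if (!query(&attr))
        ret = attr.max_qp_rd_atom;
    return ret;
}

/* Stub device that reports a limit of 16 outstanding reads (illustrative value). */
static int query_ok(struct fake_attr *attr)
{
    attr->max_qp_rd_atom = 16;
    return 0;
}

/* Stub device whose query fails. */
static int query_fail(struct fake_attr *attr)
{
    (void)attr;
    return -1;
}
```

Here `device_max_reads(query_ok)` yields the stubbed limit and `device_max_reads(query_fail)` yields 0. With a real context the callback would be replaced by the direct call `ibv_query_device(context, &attr)`, as in the patch.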

I'm curious why the old and new code used hardcoded values?

Mike
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-10 Thread Elken, Tom
 Rupert and the UNH-IOL pointed out that an Interop test which uses the
 ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs.
 This test was succeeding with RC2, and started failing with RC3.  I am sorry that
 our QA team did not find this bug with RC3.
 
 We have investigated and found that perftest was upgraded from v1.8 to v2.0
 on 11/19/12, between RC3 and RC4.
 
Er, I meant between RC2 and RC3.

-Tom

 We verified that with the qib driver in OFED-3.5 RC4 and the perftest RPM from
 RC2, we pass the tests.
 We also ran a similar qperf RDMA read test with qperf and qib from RC4 and that
 test passed.
 
 We are working to isolate the bug and develop a fix.  We suspect the perftest
 changes, but the ib_read_* benchmarks may just have changed enough to start
 checking a part of the spec which hasn't been tested before in Interop tests.  So
 it may be a qib driver issue.
 
 The EWG standard practice is that if a significant bug fix goes in, we would need
 another RC to enable others to easily test it.
 But perhaps it depends on whether the bug is in perftest, qib or elsewhere.  In
 any case, we don't want a GA build until this issue is solved.
 
 Regards,
 Tom


[ewg] Interop test failure using OFED-3.5 RC4

2013-01-10 Thread Elken, Tom
Rupert and the UNH-IOL pointed out that an Interop test which uses the  
ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs.
This test was succeeding with RC2, and started failing with RC3.  I am sorry 
that our QA team did not find this bug with RC3.

We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 
11/19/12, between RC3 and RC4.
We verified that with the qib driver in OFED-3.5 RC4 and the perftest RPM from 
RC2, we pass the tests.
We also ran a similar qperf RDMA read test with qperf and qib from RC4 and that 
test passed.

We are working to isolate the bug and develop a fix.  We suspect the perftest 
changes, but the ib_read_* benchmarks may just have changed enough to start 
checking a part of the spec which hasn't been tested before in Interop tests.  
So it may be a qib driver issue.

The EWG standard practice is that if a significant bug fix goes in, we would 
need another RC to enable others to easily test it.
But perhaps it depends on whether the bug is in perftest, qib or elsewhere.  In 
any case, we don't want a GA build until this issue is solved.

Regards,
Tom 


Re: [ewg] Interop test failure using OFED-3.5 RC4

2013-01-10 Thread Hefty, Sean
  We have investigated and found that perftest was upgraded from v1.8 to v2.0
  on 11/19/12, between RC3 and RC4.
 
 Er, I meant between RC2 and RC3.

Why would there be a _major_ version change in any component done in the middle 
of a release cycle?!