Re: [ewg] Interop test failure using OFED-3.5 RC4
Were you able to get the new package posted yet? We need this ASAP so we can do another OFED-3.5 RC.

Woody

-----Original Message-----
From: Ido Shamai [mailto:i...@dev.mellanox.co.il]
Sent: Friday, January 11, 2013 12:32 PM
To: Marciniszyn, Mike
Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward
Subject: Re: Interop test failure using OFED-3.5 RC4

On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> I've opened OFED bz 2410 for this issue.
>
> Mike

Great, thanks. I will apply the patch and release a new version to the OFED website tomorrow morning.

Ido

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] Interop test failure using OFED-3.5 RC4
The new package has been posted, and I verified that the qib-to-qib issue is gone with the new tarball. Ido has RESOLVED bz 2410 as well.

Interop could be done with the new perftest/rc4, or just wait for the next RC.

Mike

-----Original Message-----
From: Woodruff, Robert J
Sent: Monday, January 14, 2013 12:52 PM
To: Ido Shamai; Marciniszyn, Mike
Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward; Tziporet Koren
Subject: RE: Interop test failure using OFED-3.5 RC4

Were you able to get the new package posted yet? We need this ASAP so we can do another OFED-3.5 RC.

Woody
Re: [ewg] Interop test failure using OFED-3.5 RC4
Does anyone know of any other show-stopper bugs that are yet to be resolved? If not, we can do an RC5 for final testing.

-----Original Message-----
From: Marciniszyn, Mike
Sent: Monday, January 14, 2013 9:58 AM
To: Woodruff, Robert J; Ido Shamai
Cc: Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward; Tziporet Koren; rsda...@soft-forge.com
Subject: RE: Interop test failure using OFED-3.5 RC4

The new package has been posted, and I verified that the qib-to-qib issue is gone with the new tarball. Ido has RESOLVED bz 2410 as well.

Interop could be done with the new perftest/rc4, or just wait for the next RC.

Mike
Re: [ewg] Interop test failure using OFED-3.5 RC4
BTW, Mike posted an alternate patch to Bug 2410, which removed the hard-coded values for _all_ HCAs by using ibv_query_device() to query the HCA. Thankfully, Ido used that alternate patch.

-Tom
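Tom's note says the alternate patch dropped the hard-coded table and asked the HCA for its limit through ibv_query_device() in every case. That patch is not reproduced in this thread, so what follows is only a sketch of the idea in C, not the code that was applied. To keep it buildable without libibverbs, struct dev_attr, query_device_fn, and the two stub functions are hypothetical stand-ins; only the field name max_qp_rd_atom and the "returns 0 on success" convention follow the patch quoted elsewhere in the thread.

```c
/* Hypothetical stand-in for the one field of struct ibv_device_attr used here. */
struct dev_attr {
    int max_qp_rd_atom;
};

/* Hypothetical stand-in for ibv_query_device(): returns 0 on success. */
typedef int (*query_device_fn)(void *context, struct dev_attr *attr);

/*
 * Pick the number of outstanding reads using only the queried device limit.
 * No per-device table: every HCA is treated the same way.
 */
int set_out_reads_queried(query_device_fn query, void *context, int num_user_reads)
{
    struct dev_attr attr;
    int max_reads = 0;

    if (query(context, &attr) == 0)
        max_reads = attr.max_qp_rd_atom;

    /* Too large, or not specified: fall back to the device limit. */
    if (num_user_reads > max_reads || num_user_reads <= 0)
        return max_reads;

    return num_user_reads;
}

/* Stub: a device that reports a limit of 16 (an assumed value). */
int stub_query_16(void *context, struct dev_attr *attr)
{
    (void)context;
    attr->max_qp_rd_atom = 16;
    return 0;
}

/* Stub: a query that fails, leaving the limit at 0. */
int stub_query_fail(void *context, struct dev_attr *attr)
{
    (void)context;
    (void)attr;
    return -1;
}
```

With stub_query_16, a request for 4 reads is kept, a request for 32 is clamped to 16, and a request of 0 becomes 16. With stub_query_fail the result is 0, the same value the regression produced, so a real implementation would probably want to report the query failure rather than continue.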
Re: [ewg] Interop test failure using OFED-3.5 RC4
I'm curious why the device query value cannot be used in all cases?

Mike

-----Original Message-----
From: Ido Shamai [mailto:i...@dev.mellanox.co.il]
Sent: Friday, January 11, 2013 3:32 PM
To: Marciniszyn, Mike
Cc: Woodruff, Robert J; Elken, Tom; ewg@lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward
Subject: Re: Interop test failure using OFED-3.5 RC4

On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> I've opened OFED bz 2410 for this issue.
>
> Mike

Great, thanks. I will apply the patch and release a new version to the OFED website tomorrow morning.

Ido
Re: [ewg] Interop test failure using OFED-3.5 RC4
> We have investigated and found that perftest was upgraded from v1.8 to v2.0

Tom,

I was mistaken. The older perftest version is 1.4.

Mike
Re: [ewg] Interop test failure using OFED-3.5 RC4
Tom wrote,
> The EWG standard practice is that if a significant bug fix goes in, we would need another RC to enable others to easily test it. But perhaps it depends on whether the bug is in perftest, qib or elsewhere. In any case, we don't want a GA build until this issue is solved.

Yes, this will require another RC.

Woody
Re: [ewg] Interop test failure using OFED-3.5 RC4
This is definitely a perftest bug.

Perftest 2.0 is a significant rewrite of these utilities, and this bug is a regression in the routine ctx_set_out_reads().

In 1.4 the code is this:

    /******************************************************************************
     *
     ******************************************************************************/
    static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {

        int max_reads;

        max_reads = (is_dev_hermon(context) == HERMON) ? MAX_OUT_READ_HERMON : MAX_OUT_READ;

        if (num_user_reads > max_reads) {
            fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
            fprintf(stderr," Changing to that max value\n");
            num_user_reads = max_reads;
        }
        else if (num_user_reads <= 0) {
            num_user_reads = max_reads;
        }

        return num_user_reads;
    }

The new 2.0 code is:

    /******************************************************************************
     *
     ******************************************************************************/
    static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {

        int max_reads;
        Device ib_fdev = ib_dev_name(context);

        switch (ib_fdev) {
            case CONNECTIB : ;
            case CONNECTX3 : ;
            case CONNECTX2 : ;
            case CONNECTX  : max_reads = MAX_OUT_READ_HERMON; break;
            case LEGACY    : max_reads = MAX_OUT_READ; break;
            default        : max_reads = 0;
        }

        if (num_user_reads > max_reads) {
            printf(RESULT_LINE);
            fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
            fprintf(stderr," Changing to that max value\n");
            num_user_reads = max_reads;
        }
        else if (num_user_reads <= 0) {
            num_user_reads = max_reads;
        }

        return num_user_reads;
    }

For HCAs outside the known list (qib and probably others), the old code returns MAX_OUT_READ, while the new code returns 0.

I have a patch that works, while preserving the desired hardcoded values for known/legacy devices:

    +
    +/******************************************************************************
    + *
    + ******************************************************************************/
    +static int device_max_reads(struct ibv_context *context) {
    +    struct ibv_device_attr attr;
    +    int ret = 0;
    +
    +    if (!ibv_query_device(context,&attr)) {
    +        ret = attr.max_qp_rd_atom;
    +    }
    +    return ret;
    +}
    +
     /******************************************************************************
      *
      ******************************************************************************/
    @@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
             case CONNECTX2 : ;
             case CONNECTX  : max_reads = MAX_OUT_READ_HERMON; break;
             case LEGACY    : max_reads = MAX_OUT_READ; break;
    -        default        : max_reads = 0;
    +        default        : max_reads = device_max_reads(context);
         }

         if (num_user_reads > max_reads) {

I'm curious why the old and new code used hardcoded values?

Mike
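To make the regression Mike describes easier to see, here is a small standalone model of the clamp logic in C. It is a sketch, not perftest code: enum device_class, the three function names, and the numeric limits are assumptions chosen for illustration, and the queried limit is passed in as a plain integer instead of calling ibv_query_device().

```c
/* Hypothetical device classes standing in for perftest's own device enum. */
enum device_class { DEV_CONNECTX, DEV_LEGACY, DEV_OTHER };

/* Assumed illustrative limits; the thread does not state the real values. */
#define MAX_OUT_READ_HERMON 16
#define MAX_OUT_READ         4

/* Shared tail of ctx_set_out_reads(): clamp the request to max_reads. */
int clamp_out_reads(int num_user_reads, int max_reads)
{
    if (num_user_reads > max_reads)
        return max_reads;
    if (num_user_reads <= 0)
        return max_reads;
    return num_user_reads;
}

/* 2.0 behaviour: unknown devices fall through to a limit of 0. */
int max_reads_2_0(enum device_class dev)
{
    switch (dev) {
    case DEV_CONNECTX: return MAX_OUT_READ_HERMON;
    case DEV_LEGACY:   return MAX_OUT_READ;
    default:           return 0;
    }
}

/* Patched behaviour: unknown devices use the limit reported by the device. */
int max_reads_patched(enum device_class dev, int queried_max)
{
    switch (dev) {
    case DEV_CONNECTX: return MAX_OUT_READ_HERMON;
    case DEV_LEGACY:   return MAX_OUT_READ;
    default:           return queried_max;
    }
}
```

For an unknown device, clamp_out_reads(4, max_reads_2_0(DEV_OTHER)) yields 0 because any positive request exceeds a limit of 0, which matches the "will return 0" observation above. With the patched lookup and a queried limit of 16, the same request stays at 4.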
Re: [ewg] Interop test failure using OFED-3.5 RC4
On 1/11/2013 7:20 AM, Hefty, Sean wrote:
>> We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 11/19/12, between RC3 and RC4.

Hi,

We did move from perftest-1.4 to perftest-2.0 last month. It has the same logic and results as the older version, plus plenty of new features.
Can you tell me more about the problem?

Ido

>> Er, I meant between RC2 and RC3.
>
> Why would there be a _major_ version change in any component done in the middle of a release cycle?!
Re: [ewg] Interop test failure using OFED-3.5 RC4
Adding Shamai from Mellanox to this thread.

Woody

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
Sent: Friday, January 11, 2013 7:51 AM
To: Elken, Tom; ewg@lists.openfabrics.org
Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4

This is definitely a perftest bug. This is a significant re-write of these utilities and this bug is a regression in the routine ctx_set_out_reads().

[...]
Re: [ewg] Interop test failure using OFED-3.5 RC4
I've opened OFED bz 2410 for this issue.

Mike

-----Original Message-----
From: Woodruff, Robert J
Sent: Friday, January 11, 2013 1:30 PM
To: Marciniszyn, Mike; Elken, Tom; ewg@lists.openfabrics.org; Ido Shamai
Subject: RE: Interop test failure using OFED-3.5 RC4

Adding Shamai from Mellanox to this thread.

Woody
Re: [ewg] Interop test failure using OFED-3.5 RC4
On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> I've opened OFED bz 2410 for this issue.
>
> Mike

Great, thanks. I will apply the patch and release a new version to the OFED website tomorrow morning.

Ido
Re: [ewg] Interop test failure using OFED-3.5 RC4
> Rupert and the UNH-IOL pointed out that an Interop test which uses the ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs. This test was succeeding with RC2, and started failing with RC3. I am sorry that our QA team did not find this bug with RC3.
>
> We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 11/19/12, between RC3 and RC4.

Er, I meant between RC2 and RC3.

-Tom
[ewg] Interop test failure using OFED-3.5 RC4
Rupert and the UNH-IOL pointed out that an Interop test which uses the ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs. This test was succeeding with RC2, and started failing with RC3. I am sorry that our QA team did not find this bug with RC3.

We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 11/19/12, between RC3 and RC4.

We verified that with the qib driver in OFED-3.5 RC4 and the perftest RPM from RC2, we pass the tests. We also ran a similar qperf RDMA read test with qperf and qib from RC4, and that test passed.

We are working to isolate the bug and develop a fix. We suspect the perftest changes, but the ib_read_* benchmarks may just have changed enough to start checking a part of the spec which hasn't been tested before in Interop tests. So it may be a qib driver issue.

The EWG standard practice is that if a significant bug fix goes in, we would need another RC to enable others to easily test it. But perhaps it depends on whether the bug is in perftest, qib or elsewhere. In any case, we don't want a GA build until this issue is solved.

Regards,
Tom
Re: [ewg] Interop test failure using OFED-3.5 RC4
> We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 11/19/12, between RC3 and RC4.
> Er, I meant between RC2 and RC3.

Why would there be a _major_ version change in any component done in the middle of a release cycle?!