Re: [ewg] Interop test failure using OFED-3.5 RC4
BTW, Mike posted an alternate patch to Bug 2410, which removed hard-coded values for _all_ HCAs by using ibv_query_device() to query the HCA. Thankfully, Ido used that alternate patch.

-Tom
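For readers without the bz 2410 attachment, the idea of asking the device for its limit on every HCA can be sketched roughly as follows. This is only an illustration, not the patch that was applied: struct dev_attr_stub, query_fn, stub_query_ok, stub_query_fail and set_out_reads_from_query are hypothetical names invented here so the logic builds without libibverbs, and the limit of 16 is an arbitrary example value. The stub mirrors two facts about the real call: ibv_query_device() returns 0 on success, and the limit is reported in the max_qp_rd_atom field.

```c
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct ibv_device_attr used here. */
struct dev_attr_stub {
    int max_qp_rd_atom;
};

/* Hypothetical stand-in for ibv_query_device(): returns 0 on success. */
typedef int (*query_fn)(void *ctx, struct dev_attr_stub *attr);

/* Example device that reports a limit of 16 outstanding reads (arbitrary value). */
static int stub_query_ok(void *ctx, struct dev_attr_stub *attr)
{
    (void)ctx;
    attr->max_qp_rd_atom = 16;
    return 0;
}

/* Example device whose query fails. */
static int stub_query_fail(void *ctx, struct dev_attr_stub *attr)
{
    (void)ctx;
    (void)attr;
    return 1;
}

/* No per-device switch: the maximum always comes from the query.
 * A failed query leaves the maximum at 0, as in device_max_reads(). */
static int set_out_reads_from_query(query_fn query, void *ctx, int num_user_reads)
{
    struct dev_attr_stub attr = {0};
    int max_reads = 0;

    if (query(ctx, &attr) == 0)
        max_reads = attr.max_qp_rd_atom;

    /* Clamp requests above the limit; treat <= 0 as "use the limit". */
    if (num_user_reads > max_reads || num_user_reads <= 0)
        num_user_reads = max_reads;

    return num_user_reads;
}
```

With the example device, a request of 4 is kept, while a request of 0 or 64 becomes 16. A failed query still yields 0, so a caller would probably want to treat that case as an error rather than run the test with no outstanding reads.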
Re: [ewg] Interop test failure using OFED-3.5 RC4
Does anyone know of any other show stopper bugs that are yet to be resolved? If not, we can do an RC5 for final testing.
Re: [ewg] Interop test failure using OFED-3.5 RC4
The new package has been posted, and I verified that the qib <-> qib issue is gone with the new tar ball. Ido has RESOLVED bz 2410 as well.

Interop could be done with the new perftest/rc4 or just wait for the next RC.

Mike
Re: [ewg] Interop test failure using OFED-3.5 RC4
Were you able to get the new package posted yet?

We need this ASAP so we can do another OFED-3.5 RC.

Woody
Re: [ewg] Interop test failure using OFED-3.5 RC4
I'm curious why the device query value cannot be used in all cases?

Mike
Re: [ewg] Interop test failure using OFED-3.5 RC4
On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> I've opened OFED bz 2410 for this issue.
>
> Mike

Great thanks.
I will apply the patch and release a new version to OFED website tomorrow morning.

Ido
_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
Re: [ewg] Interop test failure using OFED-3.5 RC4
I've opened OFED bz 2410 for this issue.

Mike
Re: [ewg] Interop test failure using OFED-3.5 RC4
Adding Shamai from Mellanox to this thread.

Woody
Re: [ewg] Interop test failure using OFED-3.5 RC4
On 1/11/2013 7:20 AM, Hefty, Sean wrote:
>>> We have investigated and found that perftest was upgraded from v1.8 to v2.0
>>> on 11/19/12, between RC3 and RC4.

Hi,
We did move from perftest-1.4 to perftest-2.0 last month.
It has the same logic and results as the older version, plus plenty of new features.
Can you tell me more about the problem?

Ido

>> Er, I meant "between RC2 and RC3."
>
> Why would there be a _major_ version change in any component done in the middle of a release cycle?!
Re: [ewg] Interop test failure using OFED-3.5 RC4
This is definitely a perftest bug.

This is a significant re-write of these utilities and this bug is a regression in the routine ctx_set_out_reads().

In 1.4 the code is this:

/******************************************************************************
 *
 ******************************************************************************/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {

    int max_reads;

    max_reads = (is_dev_hermon(context) == HERMON) ? MAX_OUT_READ_HERMON : MAX_OUT_READ;    <---

    if (num_user_reads > max_reads) {
        fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
        fprintf(stderr," Changing to that max value\n");
        num_user_reads = max_reads;
    }
    else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }

    return num_user_reads;
}

The new 2.0 code is:

/******************************************************************************
 *
 ******************************************************************************/
static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {

    int max_reads;

    Device ib_fdev = ib_dev_name(context);

    switch (ib_fdev) {
        case CONNECTIB : ;
        case CONNECTX3 : ;
        case CONNECTX2 : ;
        case CONNECTX  : max_reads = MAX_OUT_READ_HERMON; break;
        case LEGACY    : max_reads = MAX_OUT_READ; break;
        default        : max_reads = 0;    <---
    }

    if (num_user_reads > max_reads) {
        printf(RESULT_LINE);
        fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
        fprintf(stderr," Changing to that max value\n");
        num_user_reads = max_reads;
    }
    else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }

    return num_user_reads;
}

For any other HCAs (qib and probably others), the old code will return MAX_OUT_READ, while the new code will return 0.

I have a patch that works, while preserving the desired hardcoded values for "known/legacy" devices:

+
+/******************************************************************************
+ *
+ ******************************************************************************/
+static int device_max_reads(struct ibv_context *context) {
+    struct ibv_device_attr attr;
+    int ret = 0;
+
+    if (!ibv_query_device(context,&attr)) {
+        ret = attr.max_qp_rd_atom;
+    }
+    return ret;
+}
+
 /******************************************************************************
  *
  ******************************************************************************/
@@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
         case CONNECTX2 : ;
         case CONNECTX  : max_reads = MAX_OUT_READ_HERMON; break;
         case LEGACY    : max_reads = MAX_OUT_READ; break;
-        default        : max_reads = 0;
+        default        : max_reads = device_max_reads(context);
     }

     if (num_user_reads > max_reads) {

I'm curious why the old and new code used hardcoded values?

Mike
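To see why a maximum of 0 is fatal for ib_read_bw, the clamping at the end of ctx_set_out_reads() can be exercised on its own. The sketch below is not perftest code: clamp_out_reads is a hypothetical helper that copies the if/else from the routine above and takes max_reads as a plain argument in place of the device lookup, so it builds without libibverbs or an HCA. The limit of 16 used in the notes after it is an arbitrary example, not a value taken from any device.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the tail of ctx_set_out_reads().
 * max_reads stands in for whatever the device lookup produced. */
static int clamp_out_reads(int max_reads, int num_user_reads)
{
    if (num_user_reads > max_reads) {
        /* Request exceeds the limit: warn and clamp down. */
        fprintf(stderr, " Number of outstanding reads is above max = %d\n", max_reads);
        fprintf(stderr, " Changing to that max value\n");
        num_user_reads = max_reads;
    } else if (num_user_reads <= 0) {
        /* No explicit request: default to the limit. */
        num_user_reads = max_reads;
    }

    return num_user_reads;
}
```

With max_reads set to 0, as in the 2.0 default branch, every input comes back as 0: a request of 4 is clamped down to 0 and a request of 0 defaults to 0. With a non-zero limit such as 16, a request of 4 is kept and a request of 0 or 100 becomes 16, which is the behaviour the patch restores for unrecognized devices.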
Re: [ewg] Interop test failure using OFED-3.5 RC4
Tom wrote,
> The EWG standard practice is that if a significant bug fix goes in, we would
> need another RC to enable others to easily test it.
> But perhaps it depends on whether the bug is in perftest, qib or elsewhere.
> In any case, we don't want a GA build until this issue is solved.

Yes, this will require another RC.

Woody
Re: [ewg] Interop test failure using OFED-3.5 RC4
> We have investigated and found that perftest was upgraded from v1.8 to v2.0

Tom,

I was mistaken. The older perftest version is 1.4.

Mike
Re: [ewg] Interop test failure using OFED-3.5 RC4
> > We have investigated and found that perftest was upgraded from v1.8 to v2.0
> > on 11/19/12, between RC3 and RC4.
>
> Er, I meant "between RC2 and RC3."

Why would there be a _major_ version change in any component done in the middle of a release cycle?!
[ewg] Interop test failure using OFED-3.5 RC4
Rupert and the UNH-IOL pointed out that an Interop test which uses the ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs.
This test was succeeding with RC2, and started failing with RC3. I am sorry that our QA team did not find this bug with RC3.

We have investigated and found that perftest was upgraded from v1.8 to v2.0 on 11/19/12, between RC3 and RC4.

We verified that with the qib driver in OFED-3.5 RC4 and the perftest RPM from RC2, we pass the tests.
We also ran a similar qperf RDMA read test with qperf and qib from RC4 and that test passed.

We are working to isolate the bug and develop a fix. We suspect the perftest changes, but the ib_read_* benchmarks may just have changed enough to start checking a part of the spec which hasn't been tested before in Interop tests. So it may be a qib driver issue.

The EWG standard practice is that if a significant bug fix goes in, we would need another RC to enable others to easily test it.
But perhaps it depends on whether the bug is in perftest, qib or elsewhere. In any case, we don't want a GA build until this issue is solved.

Regards,
Tom
Re: [ewg] Interop test failure using OFED-3.5 RC4
> Rupert and the UNH-IOL pointed out that an Interop test which uses the
> ib_read_bw (perftest) benchmark fails on Intel True Scale HCAs.
> This test was succeeding with RC2, and started failing with RC3. I am sorry that
> our QA team did not find this bug with RC3.
>
> We have investigated and found that perftest was upgraded from v1.8 to v2.0
> on 11/19/12, between RC3 and RC4.

Er, I meant "between RC2 and RC3."

-Tom