Yes, I think it is a sync problem when closing the connection. The sender sent a zero-byte message with immediate data. The receiver received the message correctly and destroyed the corresponding QP immediately. The sender got the completion with status=12.
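For reference, status 12 in libibverbs' `enum ibv_wc_status` is `IBV_WC_RETRY_EXC_ERR` (transport retry counter exceeded). That would be consistent with a sender that never sees the ACK for its last message because the peer QP is already gone. Below is a minimal sketch of decoding the number. The name table is a partial local copy made for illustration, and the helpers `wc_status_name`, `is_transport_retry_exceeded` and `report_completion` are hypothetical names, not part of the hello_world code discussed here. Real Verbs code would include `<infiniband/verbs.h>` and could call `ibv_wc_status_str()` instead.

```c
#include <stdio.h>

/* Illustrative partial copy of the names in enum ibv_wc_status, in
 * declaration order (values 0..13). Assumption: real code takes these
 * from <infiniband/verbs.h> rather than from a table like this. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            /*  0 */
    "IBV_WC_LOC_LEN_ERR",        /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",      /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /*  3 */
    "IBV_WC_LOC_PROT_ERR",       /*  4 */
    "IBV_WC_WR_FLUSH_ERR",       /*  5 */
    "IBV_WC_MW_BIND_ERR",        /*  6 */
    "IBV_WC_BAD_RESP_ERR",       /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",     /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",    /*  9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR",  /* 13 */
};

/* Numeric value of IBV_WC_RETRY_EXC_ERR, the status reported in this thread. */
#define WC_STATUS_RETRY_EXC 12

/* Map a numeric completion status to its name, or "unknown" if the value
 * is outside this partial table. */
static const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return wc_status_names[status];
}

/* Return 1 when the requester exhausted its retries without a response
 * from the remote side, 0 otherwise. */
static int is_transport_retry_exceeded(int status)
{
    return status == WC_STATUS_RETRY_EXC;
}

/* Print a one-line description of a send completion status. */
static void report_completion(int status)
{
    printf("send completion: status=%d (%s)%s\n",
           status, wc_status_name(status),
           is_transport_retry_exceeded(status)
               ? ", no response from the peer within timeout * retries"
               : "");
}
```

Calling `report_completion(12)` from the sender's CQ polling loop would label the failure as a retry-exceeded error, which helps tell a teardown race like this one apart from, say, a flushed work request (status 5).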
If I delay destroying the QP, the code works fine.

--CQ

> -----Original Message-----
> From: Dotan Barak [mailto:[EMAIL PROTECTED]
> Sent: Monday, October 29, 2007 6:47 AM
> To: Tang, Changqing
> Cc: Sean Hefty; Roland Dreier; [email protected]
> Subject: Re: [ofa-general] message is received but sender report error.
>
> If you are not connecting the QPs using CM, maybe you have a sync problem?
> one side (the sender) is in RTS and the other side isn't in RTR
> (or a sync problem when closing the connection)
>
> Dotan
>
> Tang, Changqing wrote:
> > The timeout is 18 (~1sec), and retry is 7 (max).
> >
> > The error only occurs 1% of runs, sometimes I run the same hello_world
> > code in a loop, and caught it after 1500 runs. So I don't think it is
> > a cable issue (but I have not checked the port error counter).
> >
> > --CQ
> >
> >> -----Original Message-----
> >> From: Dotan Barak [mailto:[EMAIL PROTECTED]
> >> Sent: Sunday, October 28, 2007 2:48 AM
> >> To: Tang, Changqing
> >> Cc: Sean Hefty; Roland Dreier; [email protected]
> >> Subject: Re: [ofa-general] message is received but sender report error.
> >>
> >> Hi.
> >>
> >> Maybe you should increase your timeout/retry count for your application?
> >> can you check the ports error counters (using perfquery) maybe you
> >> have bad cables in your subnet ....
> >>
> >> Dotan
> >>
> >> Tang, Changqing wrote:
> >>> This is Verbs layer code, no IB CM is used.
> >>>
> >>> --CQ
> >>>
> >>>> -----Original Message-----
> >>>> From: Sean Hefty [mailto:[EMAIL PROTECTED]
> >>>> Sent: Thursday, October 25, 2007 12:38 PM
> >>>> To: Tang, Changqing; Roland Dreier
> >>>> Cc: [email protected]
> >>>> Subject: RE: [ofa-general] message is received but sender report error.
> >>>>
> >>>>> If this is the case, how would we fix the problem ? It's hard for us
> >>>>> to delay to destroy the QP, because we don't know how long to delay.
> >>>>> The other way is to do something from the driver, or firmware.
> >>>>
> >>>> Do you disconnect the QPs using the IB CM?
> >>>>
> >>>> - Sean
> >>>>
> >>> _______________________________________________
> >>> general mailing list
> >>> [email protected]
> >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >>>
> >>> To unsubscribe, please visit
> >>> http://openib.org/mailman/listinfo/openib-general
