I did find the bug that causes the incorrect base version on outgoing MADs while in the receiver termination loop, and I fixed the software on Tom's Solaris system. I do still see the incoming ACK, but from a brief look at the traces, not many other unexpected packets. A couple of responses are below as well.
- Sandip

Hal Rosenstock wrote:
> On Thu, 2005-01-20 at 13:45, Tom Duffy wrote:
>
>>On Wed, 2005-01-19 at 19:15 -0500, Hal Rosenstock wrote:
>>
>>>The OpenIB side wouldn't send an ACK so this message must indicate
>>>something else. Any chance you can get an IB trace of what is going on ?
>>>Alternatively can you dump out the MAD where the unsupported base
>>>version is printed out ? It will take me a little bit before I am setup
>>>to try to recreate this.
>>
>>Here is the analysis of what is going on (from Solaris's perspective).
>>
>>Sandip Barua wrote:
>>
>>>Based on the trace output I obtained this morning,
>>>I am seeing a number of the following transactions:
>>>
>>>1) IBMF makes a request (non-RMPP send)
>
> This is the SA GetTable request.
>
>>>2) openIB responds with one RMPP MAD which is marked as both
>>>   the first and the last packet of the RMPP transaction
>
> This is the SA GetTableResp.
>
>>>3) IBMF sends an ACK
>
> Not 100% sure what the OpenIB side would do with this. The ACK is an SA
> packet which should have the MAD header properly filled in.
>
> Can you insert some code to dump the first 32 bytes of the MAD when the
> base version is unsupported (in the if clause at line 1463)
> and send me the output ?
>
> Also, is there anything in the OSM logs ?
>
>>>At this point, the transaction should be complete,
>
> Agreed.
>
>> but MADs continue
>>
>>>to arrive. In one case an ACK arrives which is an
>>>illegal packet to receive while in receiver terminate loop,
>
> OpenIB doesn't generate ACKs so this is a mystery to me.
>
>> so, ibmf
>>
>>>returns an ABORT.
>
> Yes, if the receiver is terminating, receiving an ACK with IsDS 0 should
> cause an ABORT to be sent.
>
>>In the remaining cases, DATA packets are being
>>
>>>received after the transaction is complete,
>
> This is also a mystery to me. I don't know what these additional DATA
> packets from OpenIB would be.
>
>> so, ibmf drops them and
>>
>>>returns ACKs if in the receiver termination loop.
>
> Looks to me like these should also be ABORTed with BadT in the receiver
> termination loop.

My interpretation is that if a DATA packet arrives in the receiver
termination loop, an ACK should be sent. If an ACK arrives, and the
transaction is not double-sided, an ABORT should be sent. If any other
packet arrives, an ABORT should be sent.

>
>>>From the IBMF side, the following packets are being sent,
>>>o request data packet (non-rmpp, will not have base version set)
>
> Do you mean RMPP version (since RMPP is not active) ? Doesn't base
> version always need setting ?

Yes. I was looking at the wrong version field. Sorry about that.

>
>>>o ACK packet on completion of the transaction
>>>o ABORT packet on receiving an ACK
>>>
>>>Based on my review of the code, and the occurrence of certain
>>>messages, it looks to me like the base version should be set on all
>>>outgoing ACK and ABORT packets.
>
> Thanks for the detailed analysis.
>
> -- Hal

- Sandip
_______________________________________________
openib-general mailing list
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
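The following is a minimal C sketch of the receiver-termination-loop rules in Sandip's interpretation above. A DATA packet gets an ACK. An ACK on a transaction that is not double-sided gets an ABORT. Anything else also gets an ABORT. The sketch also fills in BaseVersion on every outgoing ACK and ABORT, which is the base-version bug discussed in the thread. It is not the IBMF or OpenIB code, and several parts are assumptions:

- `struct mad_hdr` is a hypothetical, simplified header and does not match the wire layout.
- `term_loop_reply` is a hypothetical helper.
- Using BadT as the ABORT status in every case is an assumption.
- The numeric RMPPType and status values are assumed from the RMPP section of the InfiniBand Architecture specification and should be checked there.
- Segment-number and window fields are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; check against the RMPP section of the IBA spec. */
#define MAD_BASE_VERSION   1   /* BaseVersion of IBA 1.x MADs */
#define RMPP_VERSION       1
#define RMPP_TYPE_DATA     1
#define RMPP_TYPE_ACK      2
#define RMPP_TYPE_STOP     3
#define RMPP_TYPE_ABORT    4
#define RMPP_STATUS_NORMAL 0
#define RMPP_STATUS_BADT   121 /* Bad RMPPType */

/* Hypothetical, simplified header: only the fields this sketch uses.
 * Not the on-the-wire layout and not an IBMF or OpenIB structure. */
struct mad_hdr {
    uint8_t base_version;
    uint8_t mgmt_class;
    uint8_t class_version;
    uint8_t rmpp_version;
    uint8_t rmpp_type;
    uint8_t rmpp_status;
};

/*
 * Build the reply to a packet received in the receiver termination loop:
 *   DATA                                -> ACK
 *   ACK on a non-double-sided (IsDS 0)  -> ABORT (BadT, assumed)
 *   anything else                       -> ABORT (BadT, assumed)
 * Returns 1 if *out holds a reply to send, 0 if no reply is built here
 * (an ACK on a double-sided transaction, left to other logic).
 */
static int term_loop_reply(const struct mad_hdr *in, int is_ds,
                           struct mad_hdr *out)
{
    uint8_t type, status;

    assert(in != NULL && out != NULL);

    if (in->rmpp_type == RMPP_TYPE_DATA) {
        type = RMPP_TYPE_ACK;
        status = RMPP_STATUS_NORMAL;
    } else if (in->rmpp_type == RMPP_TYPE_ACK && is_ds) {
        return 0;
    } else {
        type = RMPP_TYPE_ABORT;
        status = RMPP_STATUS_BADT;
    }

    memset(out, 0, sizeof(*out));
    /* The bug discussed in this thread: outgoing ACK and ABORT packets
     * need a valid BaseVersion too, not only the data packets. */
    out->base_version  = MAD_BASE_VERSION;
    out->mgmt_class    = in->mgmt_class;
    out->class_version = in->class_version;
    out->rmpp_version  = RMPP_VERSION;
    out->rmpp_type     = type;
    out->rmpp_status   = status;
    return 1;
}
```

Hal's reading (ABORT the DATA packets with BadT as well) would change only the first branch of `term_loop_reply`.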
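Hal asked for a dump of the first 32 bytes of a MAD with an unsupported base version. That could look something like the C sketch below. `mad_hexdump`, `MAD_DUMP_BYTES`, and `MAD_DUMP_BUFSZ` are hypothetical names. The call site (the if clause at line 1463 of the IBMF source) is not shown, because that code is not part of this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAD_DUMP_BYTES 32                       /* first 32 bytes, as requested */
#define MAD_DUMP_BUFSZ (MAD_DUMP_BYTES * 3 + 1) /* "xx" + separator per byte, plus NUL */

/* Format up to the first MAD_DUMP_BYTES bytes of a MAD as hex into out,
 * 16 bytes per line. Returns the number of bytes formatted. */
static size_t mad_hexdump(const uint8_t *mad, size_t len,
                          char out[MAD_DUMP_BUFSZ])
{
    size_t n = len < MAD_DUMP_BYTES ? len : MAD_DUMP_BYTES;
    char *p = out;

    assert(out != NULL && (mad != NULL || len == 0));

    for (size_t i = 0; i < n; i++) {
        /* Two hex digits, then a newline after every 16th byte and
         * after the last one; otherwise a space. */
        p += sprintf(p, "%02x%c", (unsigned)mad[i],
                     (i % 16 == 15 || i == n - 1) ? '\n' : ' ');
    }
    *p = '\0';
    return n;
}
```

In the receive path, the idea would be to call it on the raw MAD buffer just before the "unsupported base version" message is printed, and log the resulting string alongside that message.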
