Re: [openib-general] Re: RMPP Message Format Errors
On Tue, 2005-09-20 at 07:53, Hal Rosenstock wrote:
> On Tue, 2005-09-20 at 06:46, Eitan Zahavi wrote:
> > Hal Rosenstock wrote:
> > > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
> > >
> > >>>Is this what you are referring to ?
> > >>
> > >>Yes the line of interest is:
> > >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
> > >>This shows 16 bytes extra in the data size.
> > >
> > > Should it be 20 for the SA class header size or 0 here ?
> > Should be 0. It means the packet size should accommodate an integer
> > number of SA records (after removing the headers size).
>
> OK. There's a problem or problems on the receive side (of RMPP) to look
> into but these appear OK for SA client right now.

Patch coming shortly for this.

-- Hal

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: RMPP Message Format Errors
On Tue, 2005-09-20 at 06:46, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
> >
> >>>Is this what you are referring to ?
> >>
> >>Yes the line of interest is:
> >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
> >>This shows 16 bytes extra in the data size.
> >
> > Should it be 20 for the SA class header size or 0 here ?
> Should be 0. It means the packet size should accommodate an integer
> number of SA records (after removing the headers size).

OK. There's a problem or problems on the receive side (of RMPP) to look
into but these appear OK for SA client right now.

> >>>I do also see:
> >>>Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
> >>>ib_query failed (IB_REMOTE_ERROR).
> >>>Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
> >>>error = IB_SA_MAD_STATUS_NO_RECORDS.
> >>>Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
> >>>num of records is : 1, Found number of records : 0
> >>
> >>The full osmtest flow has some intentional errors injected.
> >>If it provides the "PASSED" message at the end it means that
> >>the errors were intentional and expected.
> >>
> >>Some of the flows have a special output message that wraps the
> >>errors in a section like:
> >>
> >>"vv"
> >>...
> >>"^^"
> >>
> >>We probably need to apply this convention to all the "bad flows".
> >
> > Then is expected number of records in this test 1 rather than 0 ?

Will these be fixed ? Are these issues being documented along with other
ones previously noted ?

-- Hal
[openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote:
> On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
>
>>> Is this what you are referring to ?
>>
>> Yes the line of interest is:
>> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
>> This shows 16 bytes extra in the data size.
>
> Should it be 20 for the SA class header size or 0 here ?

Should be 0. It means the packet size should accommodate an integer
number of SA records (after removing the headers size).

>>> I do also see:
>>> Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
>>> ib_query failed (IB_REMOTE_ERROR).
>>> Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
>>> error = IB_SA_MAD_STATUS_NO_RECORDS.
>>> Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
>>> num of records is : 1, Found number of records : 0
>>
>> The full osmtest flow has some intentional errors injected.
>> If it provides the "PASSED" message at the end it means that
>> the errors were intentional and expected.
>>
>> Some of the flows have a special output message that wraps the
>> errors in a section like:
>>
>> "vv"
>> ...
>> "^^"
>>
>> We probably need to apply this convention to all the "bad flows".
>
> Then is expected number of records in this test 1 rather than 0 ?
>
> -- Hal
[openib-general] Re: RMPP Message Format Errors
On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
> > Is this what you are referring to ?
> Yes the line of interest is:
> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
> This shows 16 bytes extra in the data size.

Should it be 20 for the SA class header size or 0 here ?

> > I do also see:
> > Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
> > ib_query failed (IB_REMOTE_ERROR).
> > Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
> > error = IB_SA_MAD_STATUS_NO_RECORDS.
> > Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
> > num of records is : 1, Found number of records : 0
> The full osmtest flow has some intentional errors injected.
> If it provides the "PASSED" message at the end it means that
> the errors were intentional and expected.
>
> Some of the flows have a special output message that wraps the
> errors in a section like:
>
> "vv"
> ...
> "^^"
>
> We probably need to apply this convention to all the "bad flows".

Then is expected number of records in this test 1 rather than 0 ?

-- Hal
[openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote:
> On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote:
>> Hi Hal,
>>
>> Seems like RMPP works !
>
> Is this what you are referring to ?

Yes the line of interest is:
__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
This shows 16 bytes extra in the data size.

> I do also see:
> Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
> ib_query failed (IB_REMOTE_ERROR).
> Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
> error = IB_SA_MAD_STATUS_NO_RECORDS.
> Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
> num of records is : 1, Found number of records : 0

The full osmtest flow has some intentional errors injected.
If it provides the "PASSED" message at the end it means that
the errors were intentional and expected.

Some of the flows have a special output message that wraps the
errors in a section like:

"vv"
...
"^^"

We probably need to apply this convention to all the "bad flows".

EZ
[openib-general] Re: RMPP Message Format Errors
Hi Eitan,

On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote:
> Hi Hal,
>
> Seems like RMPP works !

Yippee :-)

> This is an important milestone for OpenSM as we are now able to test the
> SM/SA with osmtest.

and also for Solaris.

> There is still some constant 8 bytes remainder in the RMPP number of
> received records calculation
> (see osmtest -V log file) but this is minor (as no SA record is that small).

It sounds like there is still a calculation slightly off. I don't see a
constant off by 8 remainder issue. In my configuration most seem fine and
the only one which is not off by 20 (SA class header size) is the
following:

Sep 20 05:17:36 292850 [40FFF960] -> osm_vendor_get: Acquired UMAD 0x53cd40, size = 856.
Sep 20 05:17:36 292861 [40FFF960] -> osm_vendor_get: ]
Sep 20 05:17:36 292870 [40FFF960] -> osm_mad_pool_get: Acquired p_madw = 0x536190, p_mad = 0x53cd78, size = 856.
Sep 20 05:17:36 292880 [40FFF960] -> osm_mad_pool_get: ]
Sep 20 05:17:36 292889 [40FFF960] -> __osmv_sa_mad_rcv_cb: [
Sep 20 05:17:36 292899 [40FFF960] -> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
Sep 20 05:17:36 292909 [40FFF960] -> osmtest_query_res_cb: [
Sep 20 05:17:36 292918 [40FFF960] -> osmtest_query_res_cb: ]
Sep 20 05:17:36 292932 [40FFF960] -> __osmv_sa_mad_rcv_cb: ]
Sep 20 05:17:36 292938 [AB001140] -> __osmv_send_sa_req: ]
Sep 20 05:17:36 292971 [AB001140] -> osmv_query_sa: ]
Sep 20 05:17:36 292980 [AB001140] -> osmtest_get_all_recs: ]
Sep 20 05:17:36 292989 [AB001140] -> osmtest_validate_all_node_recs: Received 7 records.

Is this what you are referring to ?

I do also see:

Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370: ib_query failed (IB_REMOTE_ERROR).
Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote error = IB_SA_MAD_STATUS_NO_RECORDS.
Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected num of records is : 1, Found number of records : 0

and some timeouts:

Sep 20 05:17:40 644730 [40FFF960] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12) -- dropping.
Sep 20 05:17:40 644740 [40FFF960] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
Sep 20 05:17:40 644750 [40FFF960] -> __osmv_sa_mad_err_cb: [
Sep 20 05:17:40 644760 [40FFF960] -> osmtest_query_res_cb: [
Sep 20 05:17:40 644769 [40FFF960] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
Sep 20 05:17:40 644787 [40FFF960] -> osmtest_query_res_cb: ]
Sep 20 05:17:40 644801 [40FFF960] -> __osmv_sa_mad_err_cb: ]

which then resulted in:

Sep 20 05:17:40 644955 [AB001140] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS).

> Thanks for your continuous support.
>
> Eitan
>
> Hal Rosenstock wrote:
> > Hi Eitan,
> >
> > The send side RMPP changes for the truncation of the last SA
> > record have now stabilized. With the latest user_mad.c and
> > osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn
> > revision 3485), this is ready to be verified again. It's safe to come
> > out now :-)
> >
> > -- Hal
[openib-general] Re: RMPP Message Format Errors
Hi Hal,

Seems like RMPP works !
This is an important milestone for OpenSM as we are now able to test the
SM/SA with osmtest.
There is still some constant 8 bytes remainder in the RMPP number of
received records calculation
(see osmtest -V log file) but this is minor (as no SA record is that small).

Thanks for your continuous support.

Eitan

Hal Rosenstock wrote:
> Hi Eitan,
>
> The send side RMPP changes for the truncation of the last SA
> record have now stabilized. With the latest user_mad.c and
> osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn
> revision 3485), this is ready to be verified again. It's safe to come
> out now :-)
>
> -- Hal
[openib-general] Re: RMPP fixes for 2.6.14
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: RMPP fixes for 2.6.14
>
>     Michael> Yes, that's what I was referring to. Too late for 2.6.14?
>
> Probably. I wouldn't be comfortable pushing that into all the arch
> trees through my git tree. I think we would need to go through lkml,
> and I think 2.6.14 will be closed to this sort of stuff around Friday.

It'll wait then. I won't be online this weekend.

-- MST
[openib-general] Re: RMPP fixes for 2.6.14
    Michael> Yes, that's what I was referring to. Too late for 2.6.14?

Probably. I wouldn't be comfortable pushing that into all the arch
trees through my git tree. I think we would need to go through lkml,
and I think 2.6.14 will be closed to this sort of stuff around Friday.

 - R.
[openib-general] Re: RMPP fixes for 2.6.14
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: RMPP fixes for 2.6.14
>
>     Michael> Roland, what do you say to the idea of moving
>     Michael> mthca_doorbell.h to somewhere under include/asm? It's not
>     Michael> really mthca specific, is it?
>
> Some of it definitely seems like it could be made generic. I'm not
> sure whether mthca_write_db_rec() is worth it, but the write64()
> emulation with a lock might be worth it on 32-bit systems.

Yes, that's what I was referring to. Too late for 2.6.14?

-- MST
[openib-general] Re: RMPP fixes for 2.6.14
    Michael> Roland, what do you say to the idea of moving
    Michael> mthca_doorbell.h to somewhere under include/asm? It's not
    Michael> really mthca specific, is it?

Some of it definitely seems like it could be made generic. I'm not
sure whether mthca_write_db_rec() is worth it, but the write64()
emulation with a lock might be worth it on 32-bit systems.

 - R.
[openib-general] Re: RMPP fixes for 2.6.14
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> But letting me know about anything that I'm missing
> would be good.

Roland, what do you say to the idea of moving mthca_doorbell.h to
somewhere under include/asm? It's not really mthca specific, is it?

-- MST
[openib-general] Re: RMPP fixes for 2.6.14
    Michael> What should I be looking at? Linus's git?

You can look at my git:

http://www.kernel.org/git/?p=linux/kernel/git/roland/infiniband.git;a=summary

I just pushed a few more things, so it will take a few more minutes to
propagate to all the mirrors.

In my previous email, I was a little unclear. I was just asking for
more RMPP changes specifically, since I know there's something to
merge there. But letting me know about anything that I'm missing
would be good.

    Michael> The qp->wait init patch :)

Yes, that's in there. I have the following in my git tree on top of
what's already in Linus's tree:

Michael S. Tsirkin:
  IPoIB: fix memory leak
  IB/sa_query: avoid unnecessary list scan
  IB: Initialize qp->wait

Roland Dreier:
  IB: really reset QPs

Sean Hefty:
  IB: Add user-supplied context to userspace CM ABI

 - R.
[openib-general] Re: RMPP fixes for 2.6.14
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: RMPP fixes for 2.6.14
>
> I found this RMPP difference between the current kernel and our
> subversion tree.

What should I be looking at? Linus's git?

> Is there anything else that needs to be merged for
> the kernel 2.6.14 tree?
>
>  - R.

The qp->wait init patch :)

-- MST
[openib-general] Re: RMPP Message Format Errors (Short Term Plan)
Hi Eitan,

On Sat, 2005-08-27 at 10:59, Eitan Zahavi wrote:
> Once you think both sender and receiver side issues are resolved please
> let us know so I can re-run the test with the IB Analyzer.

With r3251, the RMPP issues are resolved as far as I know. [IMO, the only
thing waiting for full closure is verification from Greg (MgtWG).] Let me
know if it works or if you find any further issues.

-- Hal
Re: [openib-general] Re: RMPP Message Format Errors
On Tue, 2005-08-30 at 12:49, Sean Hefty wrote:
> The interpretation of payload length for the first segment value looks
> correct. For the middle segments, 0 should work in all cases and may be
> a slightly cleaner solution.

OK. I'm going ahead with these changes.

-- Hal
Re: [openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote:
>> Hal, can you go ahead and commit your two patches for payload length
>> changes for RMPP?
>
> Do you think this is the correct interpretation ? If so, I will go
> ahead. I was waiting for confirmation.

The interpretation of payload length for the first segment value looks
correct. For the middle segments, 0 should work in all cases and may be
a slightly cleaner solution.

- Sean
Re: [openib-general] Re: RMPP Message Format Errors
Hi Sean,

On Tue, 2005-08-30 at 12:31, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > I already submitted a patch for this. It wasn't clear to me what the
> > answer for the first segment is from Greg's response (so I sent a
> > followup to clarify that).
>
> Hal, can you go ahead and commit your two patches for payload length
> changes for RMPP?

Do you think this is the correct interpretation ? If so, I will go
ahead. I was waiting for confirmation.

-- Hal
Re: [openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote:
> I already submitted a patch for this. It wasn't clear to me what the
> answer for the first segment is from Greg's response (so I sent a
> followup to clarify that).

Hal, can you go ahead and commit your two patches for payload length
changes for RMPP?

- Sean
Re: [openib-general] Re: RMPP Message Format Errors
On Tue, 2005-08-30 at 03:04, Eitan Zahavi wrote:
> > It's not a big deal to change it. If the common interpretation is to only
> > include the partial data size, I will change it.
> I think the common interpretation is that the paylen in the first segment
> should present the size of the "valid" data only.

I already submitted a patch for this. It wasn't clear to me what the
answer for the first segment is from Greg's response (so I sent a
followup to clarify that).

-- Hal
Re: [openib-general] Re: RMPP Message Format Errors
Sean Hefty wrote:
>>> In my interpretation, partial data is indicated by the PayloadLength
>>> field in the last segment only. It's quite possible that my
>>> interpretation is incorrect, in which case the calculation in the RMPP
>>> code is off.
>> I agree the text might be missing an example or two for clarification.
>> Anyway, we probably can use the IB Analyzer as the ultimate
>> interpretation test. Note that there are IB implementations that use
>> the first segment payload length as the source of packet length and
>> count on it to represent the correct DATA length.
>>
>> We can take your interpretation to discussion in the IBTA MGTWG for
>> further discussion.
>> Is the effort for fixing it big?
>
> It's not a big deal to change it. If the common interpretation is to only
> include the partial data size, I will change it.

I think the common interpretation is that the paylen in the first segment
should present the size of the "valid" data only.

> - Sean
RE: [openib-general] Re: RMPP Message Format Errors
>> In my interpretation, partial data is indicated by the PayloadLength
>> field in the last segment only. It's quite possible that my
>> interpretation is incorrect, in which case the calculation in the RMPP
>> code is off.
>I agree the text might be missing an example or two for clarification.
>Anyway, we probably can use the IB Analyzer as the ultimate
>interpretation test. Note that there are IB implementations that use
>the first segment payload length as the source of packet length and
>count on it to represent the correct DATA length.
>
>We can take your interpretation to discussion in the IBTA MGTWG for
>further discussion.
>Is the effort for fixing it big?

It's not a big deal to change it. If the common interpretation is to only
include the partial data size, I will change it.

- Sean
[openib-general] Re: RMPP Message Format Errors (Short Term Plan)
Hi Eitan,

On Sat, 2005-08-27 at 10:59, Eitan Zahavi wrote:
> Thanks for taking care of the RMPP issues.
> Once you think both sender and receiver side issues are resolved please
> let us know so I can re-run the test with the IB Analyzer.

I think they are now resolved with the one line patch (which should help
the analyzer which I don't think impacts the end nodes).

-- Hal
Re: [openib-general] Re: RMPP Message Format Errors
Hi Eitan,

On Sat, 2005-08-27 at 11:14, Eitan Zahavi wrote:
> Sean Hefty wrote:
> > I believe that the 220 byte payload length is for all RMPP MADs. Only the
> > common and RMPP header lengths are ignored.
> Yes.
> >>Doesn't it need to account for a "partial" rather than full last segment
> >>transferred data in the first segment length ?
> Yes I think it needs to use the partial length.

Agreed.

> > What I couldn't easily tell from the spec is whether a partial last
> > segment is included in the initial payload length or not. I read it as:
> > "PayloadLength counts all the bytes in the TransferredData field of the
> > DATA packet format."
> > In my interpretation, partial data is indicated by the PayloadLength
> > field in the last segment only. It's quite possible that my
> > interpretation is incorrect, in which case the calculation in the RMPP
> > code is off.
> I agree the text might be missing an example or two for clarification.
> Anyway, we probably can use the IB Analyzer as the ultimate
> interpretation test. Note that there are IB implementations that use
> the first segment payload length as the source of packet length and
> count on it to represent the correct DATA length.
>
> We can take your interpretation to discussion in the IBTA MGTWG for
> further discussion.

I think the spec wording is ambiguous and we should take it to the
MgtWG. I believe your interpretation is the intent but could not find
any specific language other than the valid bytes in terms of the last
segment. The first segment length references transferred data which is
the whole segment. I'll send something to MgtWG on this and copy
openib-general.

> Is the effort for fixing it big?

It's a one line patch. I sent it previously.

-- Hal
RE: [openib-general] Re: RMPP Message Format Errors
Hi Sean,

On Fri, 2005-08-26 at 13:54, Sean Hefty wrote:
> >The 220 byte payload length is for SA. That's mostly right but assumes
> >the last segment will be full (and accounted for by the paylen in the
> >last segment).
>
> I believe that the 220 byte payload length is for all RMPP MADs.

Yes, you're right.

> Only the common and RMPP header lengths are ignored.
>
> >Doesn't it need to account for a "partial" rather than full last segment
> >transferred data in the first segment length ?
>
> What I couldn't easily tell from the spec is whether a partial last
> segment is included in the initial payload length or not. I read it as:
> "PayloadLength counts all the bytes in the TransferredData field of the
> DATA packet format."
> In my interpretation, partial data is indicated by the PayloadLength
> field in the last segment only. It's quite possible that my
> interpretation is incorrect, in which case the calculation in the RMPP
> code is off.

I'm pretty sure that the intent is that the length in the first segment
reflects the valid data (plus the header to be counted) so the last
segment doesn't count as a full length (220) unless it is full.

Patch for this shortly.

-- Hal
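
The two readings of the first-segment PayloadLength debated above can be put side by side. This is only a sketch under the thread's figure of 220 data bytes per segment; the function names are hypothetical and are not taken from mad_rmpp.c, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Data bytes carried per RMPP segment after the common MAD and RMPP
 * headers; the figure quoted in this thread. */
#define RMPP_SEG_DATA 220u

/* Reading 1: every segment, including the last, counts as full. */
static uint32_t paylen_full(uint32_t total_seg)
{
    return total_seg * RMPP_SEG_DATA;
}

/* Reading 2: only valid bytes count. last_seg_len is the number of
 * valid bytes in the final segment (hypothetical parameter). */
static uint32_t paylen_valid(uint32_t total_seg, uint32_t last_seg_len)
{
    assert(total_seg >= 1);
    assert(last_seg_len >= 1 && last_seg_len <= RMPP_SEG_DATA);
    return (total_seg - 1) * RMPP_SEG_DATA + last_seg_len;
}
```

The two readings agree whenever the last segment is full, and differ by the unused tail of the last segment otherwise.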
Re: [openib-general] Re: RMPP Message Format Errors
Sean Hefty wrote:
> I believe that the 220 byte payload length is for all RMPP MADs. Only the
> common and RMPP header lengths are ignored.

Yes.

>> Doesn't it need to account for a "partial" rather than full last segment
>> transferred data in the first segment length ?

Yes I think it needs to use the partial length.

> What I couldn't easily tell from the spec is whether a partial last
> segment is included in the initial payload length or not. I read it as:
> "PayloadLength counts all the bytes in the TransferredData field of the
> DATA packet format."
> In my interpretation, partial data is indicated by the PayloadLength
> field in the last segment only. It's quite possible that my
> interpretation is incorrect, in which case the calculation in the RMPP
> code is off.

I agree the text might be missing an example or two for clarification.
Anyway, we probably can use the IB Analyzer as the ultimate
interpretation test. Note that there are IB implementations that use
the first segment payload length as the source of packet length and
count on it to represent the correct DATA length.

We can take your interpretation to discussion in the IBTA MGTWG for
further discussion.
Is the effort for fixing it big?

Thanks

Eitan
[openib-general] Re: RMPP Message Format Errors (Short Term Plan)
Hi Hal,

Thanks for taking care of the RMPP issues.
Once you think both sender and receiver side issues are resolved please
let us know so I can re-run the test with the IB Analyzer.

Eitan

Hal Rosenstock wrote:
> Hi,
>
> I will finish with RMPP and then embark on the 1.8.0 merge. I hope and
> expect to start the latter early next week.
RE: [openib-general] Re: RMPP Message Format Errors
>The 220 byte payload length is for SA. That's mostly right but assumes
>the last segment will be full (and accounted for by the paylen in the
>last segment).

I believe that the 220 byte payload length is for all RMPP MADs. Only the
common and RMPP header lengths are ignored.

>Doesn't it need to account for a "partial" rather than full last segment
>transferred data in the first segment length ?

What I couldn't easily tell from the spec is whether a partial last
segment is included in the initial payload length or not. I read it as:
"PayloadLength counts all the bytes in the TransferredData field of the
DATA packet format."
In my interpretation, partial data is indicated by the PayloadLength field
in the last segment only. It's quite possible that my interpretation is
incorrect, in which case the calculation in the RMPP code is off.

- Sean
RE: [openib-general] Re: RMPP Message Format Errors
On Fri, 2005-08-26 at 01:16, Sean Hefty wrote:
> >In any case, doesn't the initial payload length need to be the number of
> >segments times (hdr_len - offsetof(struct ib_rmpp_mad, data)) + data_len
> >? If so, that's part of the problem.
>
> I believe that the payload is being calculated correctly. It should be the
> number of segments * 220 bytes per packet, or at least that was my
> interpretation of the spec.

The 220 byte payload length is for SA. That's mostly right but assumes
the last segment will be full (and accounted for by the paylen in the
last segment).

Doesn't it need to account for a "partial" rather than full last segment
transferred data in the first segment length ?

-- Hal
[openib-general] Re: RMPP Message Format Errors
Hi Eitan,

On Fri, 2005-08-26 at 06:15, Eitan Zahavi wrote:
> I am trying to figure out how a client will figure out the number of
> records provided in the mad it gets back from umad.
>
> Can you describe this?

A client would use the received length returned from umad_recv and either the attribute offset in the RMPP header or the expected attribute offset for the record type to calculate this (in the case of an SA client). For other classes, it is class specific.

I think there is a problem in osm_vendor_ibumad_sa.c::__osmv_sa_mad_rcv_cb which I will be working on as soon as I sort through the send side issues.

-- Hal
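As a rough illustration of the calculation described above, the sketch below derives a record count from a reassembled length and an attribute offset. The names sa_count_records, struct sa_rec_count, and SA_HDR_LEN_SKETCH are hypothetical and are not taken from OpenSM or libibumad. The 56-byte header constant is an assumption (a 256-byte segment minus the 200 bytes of record data per segment mentioned elsewhere in this thread), and the offset is assumed to be passed in host byte order and expressed in 8-byte units.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed number of header bytes that precede SA record data in a
 * reassembled MAD: a 256-byte segment minus 200 bytes of record data. */
#define SA_HDR_LEN_SKETCH 56u

struct sa_rec_count {
    size_t count;     /* whole records that fit in the data area */
    size_t leftover;  /* bytes remaining after the last whole record */
};

/* Hypothetical helper: returns 0 on success, or -1 if the inputs cannot
 * describe a valid SA response. */
int sa_count_records(size_t recv_len, uint16_t attr_offset_words,
                     struct sa_rec_count *out)
{
    /* Assumption: the attribute offset is given in 8-byte units. */
    size_t rec_size = (size_t)attr_offset_words * 8u;
    size_t data_len;

    if (out == NULL || rec_size == 0 || recv_len < SA_HDR_LEN_SKETCH)
        return -1;

    data_len = recv_len - SA_HDR_LEN_SKETCH;
    out->count = data_len / rec_size;
    out->leftover = data_len % rec_size;
    return 0;
}
```

With the numbers from this thread (112-byte NodeRecords, so an offset of 14), a data area of 224 bytes yields two records with nothing left over, while a data area of 200 bytes yields one record and 88 leftover bytes, which is the shape of the "Count = 1 = 200 / 112 (88)" log line.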
[openib-general] Re: RMPP Message Format Errors
Hi Hal,

I am trying to figure out how a client will figure out the number of records provided in the mad it gets back from umad.

Can you describe this?

Thanks

Eitan

Hal Rosenstock wrote:
> On Mon, 2005-08-22 at 10:34, Eitan Zahavi wrote:
> > > It gets a "real" received length provided it supplies a buffer large enough.
> > So I guess the "real receive length" is truncated to the last data
> > record even if the packet sent was 256 bytes?
>
> The receive buffer is not truncated. An error is returned if the buffer
> supplied for a receive is too small, and it includes the size of the
> buffer needed.
>
> I don't understand what you mean by "even if the packet sent was 256 bytes".
>
> -- Hal
RE: [openib-general] Re: RMPP Message Format Errors
>if (rmpp_active) {
>    ...
>    rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
>        offsetof(struct ib_rmpp_mad, data) + data_len);
>
>Then in mad_rmpp.c::send_next_seg, I see:
>
>if (mad_send_wr->seg_num == 1) {
>    rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
>    rmpp_mad->rmpp_hdr.paylen_newwin =
>        cpu_to_be32(mad_send_wr->total_seg *
>            (sizeof(struct ib_rmpp_mad) -
>             offsetof(struct ib_rmpp_mad, data)));
>
>That appears to me to overwrite the initial paylen but I might have
>missed something here.

The payload is being overridden, but that's necessary. The payload that's set when creating the MAD is used to indicate the size of the buffer. The payload set with the 1st segment indicates the size of the transfer. They differ because the headers are duplicated in each segment, but only a single copy is provided in the send buffer.

>In any case, doesn't the initial payload length need to be the number of
>segments times (hdr_len - offsetof(struct ib_rmpp_mad, data)) + data_len
>? If so, that's part of the problem.

I believe that the payload is being calculated correctly. It should be the number of segments * 220 bytes per packet, or at least that was my interpretation of the spec.

>Another alternative would be not to set paylen in the first segment.

That would work. I tried to set the value to allow future optimization on the receive side.

- Sean
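To make the "number of segments * 220 bytes" figure concrete, here is a minimal sketch of where 220 comes from and what the first-segment calculation in send_next_seg produces. The struct rmpp_mad_sketch is a hypothetical stand-in for the kernel's struct ib_rmpp_mad (assumed here to be a 24-byte common MAD header, a 12-byte RMPP header, and a 220-byte data area); first_seg_paylen_full is likewise an illustrative name, and the cpu_to_be32 byte-order conversion is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the layout of struct ib_rmpp_mad. */
struct rmpp_mad_sketch {
    uint8_t mad_hdr[24];   /* common MAD header (assumed size) */
    uint8_t rmpp_hdr[12];  /* RMPP header (assumed size) */
    uint8_t data[220];     /* per-segment data area */
};

/* Per-segment data size, mirroring sizeof(...) - offsetof(..., data). */
#define RMPP_SEG_DATA_SKETCH \
    (sizeof(struct rmpp_mad_sketch) - offsetof(struct rmpp_mad_sketch, data))

/* First-segment length with every segment counted as full, which is the
 * shape of the calculation quoted from send_next_seg. */
uint32_t first_seg_paylen_full(uint32_t total_seg)
{
    return total_seg * (uint32_t)RMPP_SEG_DATA_SKETCH;
}
```

For a two-segment transfer this gives 440, the first-segment PayLen reported on the wire in this thread. Because every segment is counted as full, a partially filled last segment is not reflected in the value.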
[openib-general] Re: RMPP Message Format Errors
Hi Sean,

In mad.c::ib_create_send_mad, if rmpp is active, the payload length is calculated as follows:

    if (rmpp_active) {
        ...
        rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
            offsetof(struct ib_rmpp_mad, data) + data_len);

Then in mad_rmpp.c::send_next_seg, I see:

    if (mad_send_wr->seg_num == 1) {
        rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
        rmpp_mad->rmpp_hdr.paylen_newwin =
            cpu_to_be32(mad_send_wr->total_seg *
                (sizeof(struct ib_rmpp_mad) -
                 offsetof(struct ib_rmpp_mad, data)));

That appears to me to overwrite the initial paylen but I might have missed something here.

In any case, doesn't the initial payload length need to be the number of segments times (hdr_len - offsetof(struct ib_rmpp_mad, data)) + data_len ? If so, that's part of the problem.

Another alternative would be not to set paylen in the first segment.

-- Hal
[openib-general] RE: RMPP Message Format Errors
On Wed, 2005-08-24 at 04:07, Sean Hefty wrote:

In the below, s/opensm/osm vendor layer/ (it is also used by some SA client code in addition to OpenSM).

> Looking through the code, it appears that the proper size of the MAD is being
> reported in the kernel and exported up to userspace. If I guessed the structure
> of the opensm code correctly, the length is returned by umad_recv() in
> umad_receiver() in osm_vendor_ibumad.c. The length is discarded after
> umad_receiver() returns.

You "guessed" correctly :-)

> I guess that one possible solution is for opensm to save the length value into
> the payload_length field in the RMPP header before returning from
> umad_receiver().

Yes, that is a possible solution if it is needed on the receive side. It looks to me like it is currently unused (based on method, received size, and attribute offset), but it is probably a good idea to do this for the future, as another algorithm would work and might be better. I will put this on my TODO list.

-- Hal
[openib-general] Re: RMPP Message Format Errors
On Mon, 2005-08-22 at 10:34, Eitan Zahavi wrote:
> > It gets a "real" received length provided it supplies a buffer large enough.
> So I guess the "real receive length" is truncated to the last data
> record even if the packet sent was 256 bytes?

The receive buffer is not truncated. An error is returned if the buffer supplied for a receive is too small, and it includes the size of the buffer needed.

I don't understand what you mean by "even if the packet sent was 256 bytes".

-- Hal
[openib-general] RE: RMPP Message Format Errors
>> But the receive side needs to calculate back the correct size of the
>> assembled MAD.
>> If it is done in kernel or user it does not matter. To my best knowledge the
>> only way to calculate how many records are enclosed in an RMPP message is to
>> use the paylen and offset.
>> How can it be done without looking at paylen ?
>>
>> All Sean is saying is that the receive RMPP ignores a non zero PayLen in a
>> first segment and uses the last bit (and obviously the PayLen in the last
>> segment) to determine the received length (of the reassembled MAD).
>>
>OK, thanks for the clarification. We could use a paylen = 0 at first
>(but that is not last) segment

Looking through the code, it appears that the proper size of the MAD is being reported in the kernel and exported up to userspace. If I guessed the structure of the opensm code correctly, the length is returned by umad_recv() in umad_receiver() in osm_vendor_ibumad.c. The length is discarded after umad_receiver() returns.

I guess that one possible solution is for opensm to save the length value into the payload_length field in the RMPP header before returning from umad_receiver().

- Sean
[openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote: Hi Eitan, You wrote: "Note that the current implementation of the RMPP code ignores the payload length on the receive side, and instead relies on the last bit to determine the end of a transfer." But the receive side needs to calculate back the correct size of the assembled MAD. If it is done in kernel or user it does not matter. To my best knowledge the only way to calculate how many records are enclosed in an RMPP message is to use the paylen and offset. How can it be done without looking at paylen ? All Sean is saying is that the receive RMPP ignores a non zero PayLen in a first segment and uses the last bit (and obviously the PayLen in the last segment) to determine the received length (of the reassembled MAD). OK, thanks for the clarification. We could use a paylen = 0 at first (but that is not last) segment ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] RE: RMPP Message Format Errors
Hi Eitan, You wrote: "Note that the current implementation of the RMPP code ignores the payload length on the receive side, and instead relies on the last bit to determine the end of a transfer." But the receive side needs to calculate back the correct size of the assembled MAD. If it is done in kernel or user it does not matter. To my best knowledge the only way to calculate how many records are enclosed in an RMPP message is to use the paylen and offset. How can it be done without looking at paylen ? All Sean is saying is that the receive RMPP ignores a non zero PayLen in a first segment and uses the last bit (and obviously the PayLen in the last segment) to determine the received length (of the reassembled MAD). -- Hal ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] RE: RMPP Message Format Errors
Hi Eitan,

>We have started testing RMPP packets with osmtest and opensm (gen2 version).
>We did not go very far. The first NodeRecord GetTable of all the nodes in a
>"loopback" case, has some issues.

Is this loopback between the 2 HCA ports ? (Just so I can recreate this when I get back).

> The explanation is below:
> 1. NodeRecord MAD size is 112 bytes (note the required padding of 4 bytes
> at the end of the NodeRec data).
> 2. OpenSM log file shows the query should return 2 records one for each
> end-port. This really happens:
> Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x
> Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a1, lid 0x1.
> Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a2, lid 0x2.
> Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.
> 3. On the wire we see the following (see attached gif for more details):

Could you send the raw hex as well ?

> a. Two data segments were sent and two ACKs were returned. This is OK.
> b. The first segment reports PayLen = 440 bytes. According to the spec the
> first segment might provide paylen != 0 and when it is done it should be
> equal to the (class header * Num-Segments) + data length. In our case we
> have data length = 2*112, and SA extra header = 20 bytes * 2 segments. This
> leads to paylen=264 and not 440!!! The spec defines that in p775-l37. So
> this is a violation of the spec.

Agreed. It should either be 0 or the real length.

> c. The last segment (segment 2) provides the paylen field of 100. The
> expected value for the last segment length should have been: SA extra
> header + leftover data size from prev segments. Since the first segment
> has 200 bytes for data the left over should have been 112*2 - 200 = 24.
> With the SA extra header, 44 bytes. So this is another violation of the spec.

Yes, but perhaps related to the first issue.

> d. The analyzer is confused by the above and reports the result as having
> 3 NodeRecords.
> e. <>
> 4. Following that when we trace the log file of osmtest we find more issues.
> Probably caused by changes to the vendor layer or the rmpp assembly:
> It is expected that after assembly the size of the RMPP mad reported to the
> osm vendor layer will be the rmpp header + SA extra header + data-size.
> In our case that is 32 + 20 + 2*112 = 276. The log file shows:
> Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
> Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
> So this is another problem - probably with the way RMPP results are
> assembled or passed back to the vendor.

This may be a result of the violations on the sending side.

> Please let me know if you will have time to dig into these problems or if I
> should try and resolve them myself and provide patches.

I will look at these shortly after I get back.

-- Hal
[openib-general] Re: RMPP Message Format Errors
Sean Hefty wrote:
> The RMPP code returns the size of the receive as sizeof MAD header + sizeof
> RMPP header + optional sizeof other header (e.g. SA header) + actual payload.
> This size can be used to allocate a data buffer large enough to hold the
> reassembled MAD. You should be able to use this to determine the number of
> records in the payload.

Good. But how is that size delivered? I mean through umad to the client.

From my first email on this thread you can see there is at least one bug in the chain of events:
a. First segment paylen should be either 0 or correct value - it is neither. Should be 264 but is 440
b. Last segment paylen MUST be updated to reflect the size of the data in the MAD (including class header) - should be 24 but is 100.
c. In the receiver the re-assembled data size is not correct. OpenSM reports it got a 200 bytes MAD back. Probably a bug in the vendor layer or umad.

Here is the full data again.

1. NodeRecord MAD size is 112 bytes (note the required padding of 4 bytes at the end of the NodeRec data).
2. OpenSM log file shows the query should return 2 records one for each end-port. This really happens:
Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x
Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a1, lid 0x1.
Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a2, lid 0x2.
Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.
3. On the wire we see the following (see attached gif for more details):
a. Two data segments were sent and two ACKs were returned. This is OK.
b. The first segment reports PayLen = 440 bytes. According to the spec the first segment might provide paylen != 0 and when it is done it should be equal to the (class header * Num-Segments) + data length. In our case we have data length = 2*112, and SA extra header = 20 bytes * 2 segments. This leads to paylen=264 and not 440!!! The spec defines that in p775-l37. So this is a violation of the spec.
c. The last segment (segment 2) provides the paylen field of 100. The expected value for the last segment length should have been: SA extra header + leftover data size from prev segments. Since the first segment has 200 bytes for data the left over should have been 112*2 - 200 = 24. With the SA extra header, 44 bytes. So this is another violation of the spec.
d. The analyzer is confused by the above and reports the result as having 3 NodeRecords.
e. <>
4. Following that when we trace the log file of osmtest we find more issues. Probably caused by changes to the vendor layer or the rmpp assembly: It is expected that after assembly the size of the RMPP mad reported to the osm vendor layer will be the rmpp header + SA extra header + data-size. In our case that is 32 + 20 + 2*112 = 276. The log file shows:
Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
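The arithmetic in items 3b and 3c can be written down as a small sketch. It follows the reading of the spec given in this message (first-segment length = class header * number of segments + data length; last-segment length = class header + leftover data), which is disputed elsewhere in the thread, so treat it as one interpretation rather than the definitive rule. The function names are hypothetical, and overflow of the 32-bit values is not handled.

```c
#include <stdint.h>

/* Number of segments needed when each segment carries seg_data bytes of
 * record data. An empty transfer is assumed to still use one segment. */
uint32_t rmpp_seg_count(uint32_t data_len, uint32_t seg_data)
{
    if (seg_data == 0)
        return 0;
    if (data_len == 0)
        return 1;
    return (data_len + seg_data - 1) / seg_data;
}

/* First-segment length under the reading in item 3b. */
uint32_t expected_first_paylen(uint32_t data_len, uint32_t class_hdr,
                               uint32_t seg_data)
{
    return rmpp_seg_count(data_len, seg_data) * class_hdr + data_len;
}

/* Last-segment length under the reading in item 3c. */
uint32_t expected_last_paylen(uint32_t data_len, uint32_t class_hdr,
                              uint32_t seg_data)
{
    uint32_t nseg = rmpp_seg_count(data_len, seg_data);
    uint32_t leftover;

    if (nseg == 0)
        return 0;
    leftover = data_len - (nseg - 1) * seg_data;
    return class_hdr + leftover;
}
```

With two 112-byte records (224 bytes of data), a 20-byte SA header, and 200 bytes of record data per segment, these give 2 segments, a first-segment value of 264, and a last-segment value of 44, against the 440 and 100 seen on the wire.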
[openib-general] RE: RMPP Message Format Errors
>The number of records is an SA thing and not RMPP thing. This is transparent to
>RMPP itself.
>
>The need to determine the number of records is a consumer issue (SA or SA
>client). To do this, AttributeOffset and (at least the last) PayloadLength
>field is needed (as one can't rely on the first PayloadLength being non zero).

The RMPP code returns the size of the receive as sizeof MAD header + sizeof RMPP header + optional sizeof other header (e.g. SA header) + actual payload. This size can be used to allocate a data buffer large enough to hold the reassembled MAD. You should be able to use this to determine the number of records in the payload.

- Sean
[openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote: Hi again Eitan, The transparency to the RMPP is an RMPP implementation choice. Having incorrect paylen in the first segment is a compliancy violation. It should be either 0 or correct value. Yes, is that what is going on ? I haven't had a chance to look at the GIF you sent and analyze it. Yes that is exactly what I have provided in the first mail: But how would the SA or SA Client that gets an assembled MAD be able to tell the number of records? It gets a "real" received length provided it supplies a buffer large enough. So I guess the "real receive length" is truncated to the last data record even if the packet sent was 256 bytes? Also, does the current implementation let the client do the assembly? No. Good. I hoped this is the case. Anyway, the last segment paylen was incorrect too. OK. That's another thing I'll look at. The first mail I sent had all the analysis in it with exact peylen values for first and second segments. -- Hal ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] RE: RMPP Message Format Errors
Hi again Eitan, > The transparency to the RMPP is an RMPP implementation choice. > Having incorrect paylen in the first segment is a compliancy violation. > It should be either 0 or correct value. Yes, is that what is going on ? I haven't had a chance to look at the GIF you sent and analyze it. > But how would the SA or SA Client that gets an assembled MAD be > able to tell the number of records? It gets a "real" received length provided it supplies a buffer large enough. > Also, does the current implementation let the client do the assembly? No. > If so how would it handle abort transactions? See previous answer. > If the re-assembly is done by the MAD service then the client only gets > offset in the MAD header and probably mad size which is MAD Header + > RMPP header + SA extra header + data. > Anyway, the last segment paylen was incorrect too. OK. That's another thing I'll look at. -- Hal ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: RMPP Message Format Errors
Hal Rosenstock wrote: The number of records is an SA thing and not RMPP thing. This is transparent to RMPP itself. The transparency to the RMPP is an RMPP implementation choice. Having incorrect paylen in the first segment is a compliancy violation. It should be either 0 or correct value. The need to determine the number of records is a consumer issue (SA or SA client). To do this, AttributeOffset and (at least the last) PayloadLength field is needed (as one can't rely on the first PayloadLength being non zero). True. But how would the SA or SA Client that gets an assembled MAD be able to tell the number of records? Also, does the current implementation let the client do the assembly? If so how would it handle abort transactions? If the re-assembly is done by the MAD service then the client only gets offset in the MAD header and probably mad size which is MAD Header + RMPP header + SA extra header + data. Anyway, the last segment paylen was incorrect too. -- Hal From: Eitan Zahavi [mailto:[EMAIL PROTECTED] Sent: Mon 8/22/2005 1:54 AM To: 'Sean Hefty'; Eitan Zahavi; Hal Rosenstock Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman Subject: RE: RMPP Message Format Errors Hi Sean, You wrote: "Note that the current implementation of the RMPP code ignores the payload length on the receive side, and instead relies on the last bit to determine the end of a transfer." But the receive side needs to calculate back the correct size of the assembled MAD. If it is done in kernel or user it does not matter. To my best knowledge the only way to calculate how many records are enclosed in an RMPP message is to use the paylen and offset. How can it be done without looking at paylen ? EZ Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED] Sent: Monday, August 22, 2005 1:01 AM To: 'Eitan Zahavi'; Hal Rosenstock Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman Subject: RE: RMPP Message Format Errors Please let me know if you will have time to dig into these problems or if I should try and resolve them myself and provide patches. I will not be able to look at this until early next week (with IDF running this week), but I will try to do so. So, it wouldn't surprise me if the receive side accepted an invalid RMPP MAD. - Sean ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] RE: RMPP Message Format Errors
Hi Eitan,

All Sean is saying is that the RMPP code itself only uses the last bit.

The number of records is an SA thing and not RMPP thing. This is transparent to RMPP itself.

The need to determine the number of records is a consumer issue (SA or SA client). To do this, AttributeOffset and (at least the last) PayloadLength field is needed (as one can't rely on the first PayloadLength being non zero).

-- Hal

From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
Sent: Mon 8/22/2005 1:54 AM
To: 'Sean Hefty'; Eitan Zahavi; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors

Hi Sean,

You wrote: "Note that the current implementation of the RMPP code ignores the payload length on the receive side, and instead relies on the last bit to determine the end of a transfer."

But the receive side needs to calculate back the correct size of the assembled MAD. If it is done in kernel or user it does not matter. To my best knowledge the only way to calculate how many records are enclosed in an RMPP message is to use the paylen and offset. How can it be done without looking at paylen ?

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Monday, August 22, 2005 1:01 AM
To: 'Eitan Zahavi'; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors

> Please let me know if you will have time to dig into these problems or if I
> should try and resolve them myself and provide patches.

I will not be able to look at this until early next week (with IDF running this week), but I will try to do so. So, it wouldn't surprise me if the receive side accepted an invalid RMPP MAD.

- Sean
[openib-general] RE: RMPP Message Format Errors
Hi Sean,

You wrote: "Note that the current implementation of the RMPP code ignores the payload length on the receive side, and instead relies on the last bit to determine the end of a transfer."

But the receive side needs to calculate back the correct size of the assembled MAD. If it is done in kernel or user it does not matter. To my best knowledge the only way to calculate how many records are enclosed in an RMPP message is to use the paylen and offset. How can it be done without looking at paylen ?

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

-----Original Message-----
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Monday, August 22, 2005 1:01 AM
To: 'Eitan Zahavi'; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors

> Please let me know if you will have time to dig into these problems or if I
> should try and resolve them myself and provide patches.

I will not be able to look at this until early next week (with IDF running this week), but I will try to do so. So, it wouldn't surprise me if the receive side accepted an invalid RMPP MAD.

- Sean
[openib-general] RE: RMPP Message Format Errors
> Please let me know if you will have time to dig into these problems or if I
> should try and resolve them myself and provide patches.

I will not be able to look at this until early next week (with IDF running this week), but I will try to do so. Note that the current implementation of the RMPP code ignores the payload length on the receive side, and instead relies on the last bit to determine the end of a transfer. So, it wouldn’t surprise me if the receive side accepted an invalid RMPP MAD.

- Sean
RE: [openib-general] Re: RMPP
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 28, 2005 11:47 AM
>
> I think I see what is going on here...
>
> In user_mad.c::send_handler
>
>     if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) {
>         packet->mad.hdr.status = ETIMEDOUT;
>
>         if (!queue_packet(file, agent, packet))
>             return;
>     }
>
> That is what is causing the problem. I think the send side queues the
> packet on a timeout and simulates a receive so that a transaction can be
> terminated. RMPP sends appear to be a little different in that even non
> transactions get timeouts.

Why is a receive generated at all? Shouldn't the send completion be enough? It seems odd to just generate a receive indication with the sent data when really, nothing was received. From an app perspective, it seems that handling the IB_WC_RESP_TIMEOUT_ERR case should be sufficient to indicate the transaction is complete.

- Fab
Re: [openib-general] Re: RMPP
On Tue, 2005-06-28 at 14:07, Hal Rosenstock wrote:
> On Tue, 2005-06-28 at 13:48, Hal Rosenstock wrote:
> > On Tue, 2005-06-28 at 13:44, Sean Hefty wrote:
> > > Hal Rosenstock wrote:
> > > > Hi Sean,
> > > >
> > > > I'm in the process of enabling the receive side RMPP from user space and
> > > > this is what I'm seeing in terms of RMPP right now. I have a question
> > > > about the OpenSM side.
> > > >
> > > > SA client                            OpenSM
> > > > SA GetTable (PortInfoRecord) -->
> > > >                                  <-- SA GetTableResp (PortInfoRecord)
> > > >                                      RMPP active, first
> > > >                                      payload length 0x44C
> > > >
> > > > retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> > > > client is not currently ACKing. All is fine up to that point.
> > > >
> > > > At that point, OpenSM sees a large receive which appears to be that send
> > > > timing out (nothing was sent nor observed on the IB wire).
> > > >
> > > > Could a timed out RMPP send end up as a receive somehow ?
> > >
> > > On the side that sent the MAD?
> >
> > The side that sent the RMPP MAD response (e.g. OpenSM).
> >
> > > That should be no.
> >
> > That's what I thought. I'm not sure where the problem is but will start
> > to try to narrow it down.
>
> I do get EINVAL from user_mad.c::ib_umad_read as follows:
>
>     if (count < packet->length + sizeof (struct ib_user_mad))
>         ret = -EINVAL;
>
> as the packet->length is larger than a single MAD (and looks like the
> user MAD that was sent by OpenSM).

I think I see what is going on here...

In user_mad.c::send_handler

    if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) {
        packet->mad.hdr.status = ETIMEDOUT;

        if (!queue_packet(file, agent, packet))
            return;
    }

That is what is causing the problem. I think the send side queues the packet on a timeout and simulates a receive so that a transaction can be terminated. RMPP sends appear to be a little different in that even non transactions get timeouts.

-- Hal
Re: [openib-general] Re: RMPP
On Tue, 2005-06-28 at 13:48, Hal Rosenstock wrote:
> On Tue, 2005-06-28 at 13:44, Sean Hefty wrote:
> > Hal Rosenstock wrote:
> > > Hi Sean,
> > >
> > > I'm in the process of enabling the receive side RMPP from user space and
> > > this is what I'm seeing in terms of RMPP right now. I have a question
> > > about the OpenSM side.
> > >
> > > SA client                            OpenSM
> > > SA GetTable (PortInfoRecord) -->
> > >                                  <-- SA GetTableResp (PortInfoRecord)
> > >                                      RMPP active, first
> > >                                      payload length 0x44C
> > >
> > > retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> > > client is not currently ACKing. All is fine up to that point.
> > >
> > > At that point, OpenSM sees a large receive which appears to be that send
> > > timing out (nothing was sent nor observed on the IB wire).
> > >
> > > Could a timed out RMPP send end up as a receive somehow ?
> >
> > On the side that sent the MAD?
>
> The side that sent the RMPP MAD response (e.g. OpenSM).
>
> > That should be no.
>
> That's what I thought. I'm not sure where the problem is but will start
> to try to narrow it down.

I do get EINVAL from user_mad.c::ib_umad_read as follows:

    if (count < packet->length + sizeof (struct ib_user_mad))
        ret = -EINVAL;

as the packet->length is larger than a single MAD (and looks like the user MAD that was sent by OpenSM).

-- Hal
[openib-general] Re: RMPP
On Tue, 2005-06-28 at 13:44, Sean Hefty wrote: > Hal Rosenstock wrote: > > Hi Sean, > > > > I'm in the process of enabling the receive side RMPP from user space and > > this is what I'm seeing in terms of RMPP right now. I have a question > > about the OpenSM side. > > > > SA client OpenSM > > SA GetTable (PortInfoRecord) --> > > <-- SA GetTableResp (PortInfoRecord) > > RMPP active, first > > payload length 0x44C > > > > retries is set to 4 so I see 4 responses (at 2 sec intervals) as the > > client is not currently ACKing. All is fine up to that point. > > > > At that point, OpenSM sees a large receive which appears to be that send > > timing out (nothing was sent nor observed on the IB wire). > > > > Could a timed out RMPP send end up as a receive somehow ? > > On the side that sent the MAD? The side that sent the RMPP MAD response (e.g. OpenSM). > That should be no. That's what I thought. I'm not sure where the problem is but will start to try to narrow it down. > On the remote side, a send can timeout, but still be received, since the ACK > is unreliable. But I don't think that this is the situation that you're > describing. No. The remote side appears to be behaving as I would expect (at least as far as I have gone). -- Hal ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: RMPP
Hal Rosenstock wrote:
> Hi Sean,
>
> I'm in the process of enabling the receive side RMPP from user space and
> this is what I'm seeing in terms of RMPP right now. I have a question
> about the OpenSM side.
>
> SA client                               OpenSM
> SA GetTable (PortInfoRecord) -->
>                         <-- SA GetTableResp (PortInfoRecord)
>                             RMPP active, first
>                             payload length 0x44C
>
> retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> client is not currently ACKing. All is fine up to that point.
>
> At that point, OpenSM sees a large receive which appears to be that send
> timing out (nothing was sent nor observed on the IB wire).
>
> Could a timed out RMPP send end up as a receive somehow ?

On the side that sent the MAD? That should be no.

On the remote side, a send can timeout, but still be received, since the ACK
is unreliable. But I don't think that this is the situation that you're
describing.

- Sean
Re: [openib-general] Re: RMPP
Hal Rosenstock wrote:
> Yes, that looks better in terms of clearing the padding. I still need to
> double check my math on the PayloadLengths.

It turned out that grmpp was clearing the MAD, which was why I wasn't
detecting the issue.

Also, I've added some documentation on using RMPP to ib_mad.h and added an
rmpp_active flag to ib_create_send_mad(). My current solution is to have
ib_create_send_mad() format the RMPP header for the user.

- Sean
RE: [openib-general] Re: RMPP
On Wed, 2005-05-04 at 17:51, Sean Hefty wrote:
> Can you try with this patch?
>
> Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
>
> Index: core/mad.c
> ===
> --- core/mad.c (revision 2256)
> +++ core/mad.c (working copy)
> @@ -796,9 +796,9 @@
>       buf = kmalloc(sizeof *send_buf + buf_size, gfp_mask);
>       if (!buf)
>               return ERR_PTR(-ENOMEM);
> +     memset(buf, 0, sizeof *send_buf + buf_size);
>
>       send_buf = buf + buf_size;
> -     memset(send_buf, 0, sizeof *send_buf);
>       send_buf->mad = buf;
>
>       send_buf->sge.addr = dma_map_single(mad_agent->device->dma_device,

Yes, that looks better in terms of clearing the padding. I still need to
double check my math on the PayloadLengths.

Thanks.

-- Hal
RE: [openib-general] Re: RMPP
>> > I also see the padding on the last segment of
>> > a multipacket send not cleared (I integrated the part of your patch
>> > relating to the pad calculation).
>>
>> I ran some tests, and didn't see any cases where the padding wasn't zero.
>> The RMPP code doesn't touch the padding itself, and create_send should
>> allocate it zeroed. Are you using an analyzer and seeing that it's not
>> zeroed?
>
>Yes. I'm stating this from what I see on the IB "wire".

Can you try with this patch?

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>

Index: core/mad.c
===
--- core/mad.c (revision 2256)
+++ core/mad.c (working copy)
@@ -796,9 +796,9 @@
        buf = kmalloc(sizeof *send_buf + buf_size, gfp_mask);
        if (!buf)
                return ERR_PTR(-ENOMEM);
+       memset(buf, 0, sizeof *send_buf + buf_size);

        send_buf = buf + buf_size;
-       memset(send_buf, 0, sizeof *send_buf);
        send_buf->mad = buf;

        send_buf->sge.addr = dma_map_single(mad_agent->device->dma_device,
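[Editor's sketch] The effect of the patch can be reproduced in user space with a single allocation that holds the data area followed by a descriptor. The code below is a minimal model under stated assumptions, not the kernel implementation: `demo_send_buf` and the `demo_*` helpers are hypothetical, `malloc` stands in for `kmalloc`, and the data size is assumed to keep the trailing descriptor aligned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's send buffer descriptor. */
struct demo_send_buf {
    void  *mad;    /* start of the data area */
    size_t length; /* size of the data area, pad included */
};

/*
 * One allocation: [ data area (buf_size bytes) | descriptor ].
 * Zeroing the whole allocation clears the pad in the data area too;
 * zeroing only the descriptor would leave the data area uninitialized.
 */
static struct demo_send_buf *demo_create_send_buf(size_t buf_size)
{
    unsigned char *buf;
    struct demo_send_buf *send_buf;

    /* Assumption: buf_size keeps the trailing descriptor aligned. */
    assert(buf_size % _Alignof(struct demo_send_buf) == 0);

    buf = malloc(buf_size + sizeof(*send_buf));
    if (!buf)
        return NULL;
    memset(buf, 0, buf_size + sizeof(*send_buf));

    send_buf = (struct demo_send_buf *)(buf + buf_size);
    send_buf->mad = buf;
    send_buf->length = buf_size;
    return send_buf;
}

/* Returns 1 if every byte of the data area, pad included, is zero. */
static int demo_data_is_zeroed(const struct demo_send_buf *send_buf)
{
    const unsigned char *p = send_buf->mad;
    size_t i;

    for (i = 0; i < send_buf->length; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* The descriptor lives inside the allocation, so free the base pointer. */
static void demo_free_send_buf(struct demo_send_buf *send_buf)
{
    if (send_buf)
        free(send_buf->mad);
}
```

Clearing the full allocation is the simplest way to guarantee a zero pad on the last segment, at the cost of also clearing bytes the caller will overwrite anyway.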
[openib-general] Re: RMPP
Hal Rosenstock wrote:
> On Wed, 2005-05-04 at 16:45, Sean Hefty wrote:
> > Hal Rosenstock wrote:
> > > I also see the padding on the last segment of
> > > a multipacket send not cleared (I integrated the part of your patch
> > > relating to the pad calculation).
> >
> > I ran some tests, and didn't see any cases where the padding wasn't zero.
> > The RMPP code doesn't touch the padding itself, and create_send should
> > allocate it zeroed. Are you using an analyzer and seeing that it's not
> > zeroed?
>
> Yes. I'm stating this from what I see on the IB "wire".

I think my tests were just lucky... It looks like create_send_mad zeros only
the top portion of the data buffer.

- Sean
[openib-general] Re: RMPP
On Wed, 2005-05-04 at 16:45, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > I also see the padding on the last segment of
> > a multipacket send not cleared (I integrated the part of your patch
> > relating to the pad calculation).
>
> I ran some tests, and didn't see any cases where the padding wasn't zero.
> The RMPP code doesn't touch the padding itself, and create_send should
> allocate it zeroed. Are you using an analyzer and seeing that it's not
> zeroed?

Yes. I'm stating this from what I see on the IB "wire".

-- Hal
[openib-general] Re: RMPP
Hal Rosenstock wrote:
> I also see the padding on the last segment of
> a multipacket send not cleared (I integrated the part of your patch
> relating to the pad calculation).

I ran some tests, and didn't see any cases where the padding wasn't zero.
The RMPP code doesn't touch the padding itself, and create_send should
allocate it zeroed. Are you using an analyzer and seeing that it's not
zeroed?

- Sean
[openib-general] Re: RMPP
Hal Rosenstock wrote:
> In addition to passing the hdr_len and data_len to ib_create_send_mad:
>       rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
>               offsetof(struct ib_rmpp_mad, data) + data_len);
> That's not the "real" payload length that would go into the packet; just
> one segment's worth of class header. Correct ?
>
> If the above is correct, I'm not sure the actual payload lengths in the
> DATA packets are correct. I also see the padding on the last segment of
> a multipacket send not cleared (I integrated the part of your patch
> relating to the pad calculation).

Running grmpp and using madeye to check the packets, I ran 3 tests:

user-data size    payload in 1st segment    payload in last segment
1200              1320 (220x6)              124
50                54                        54
580               660 (220x3)               152

For multi-packet segments, I think that the payload in the 1st segment will
always be a multiple of 220 bytes (256 - sizeof common MAD header - sizeof
RMPP header). For the tests that I ran, these appear to be correct.

Since grmpp uses vendor MADs, it can transfer 216 bytes of user-data per
segment. Using the 580 example, the data is segmented into 216 + 216 + 148
bytes. The payload in the last segment includes the extra 4 byte vendor
specific header. Likewise, 1200 = 216 x 5 + 120, with an extra 4 bytes added
for the header.

I haven't looked at why the data isn't cleared yet, but will do so next. The
memory allocations in create_send_mad look right to me...

- Sean
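[Editor's sketch] The arithmetic in the message above can be checked with a few lines of C. The 24-byte common MAD header and 12-byte RMPP header give the 220 bytes per segment mentioned in the message; the 4-byte vendor header and the 216 bytes of user data per segment are taken from the message as well. The function names are hypothetical, and the rule for the first segment (segment count times 220 for multi-segment sends) is simply what the three reported tests show.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MAD_SIZE        = 256, /* size of one MAD */
    MAD_HDR_SIZE    = 24,  /* common MAD header */
    RMPP_HDR_SIZE   = 12,  /* RMPP header */
    VENDOR_HDR_SIZE = 4,   /* vendor specific header, per the message */
    SEG_PAYLOAD     = MAD_SIZE - MAD_HDR_SIZE - RMPP_HDR_SIZE, /* 220 */
    SEG_USER_DATA   = SEG_PAYLOAD - VENDOR_HDR_SIZE            /* 216 */
};

/* Number of segments needed to carry user_len bytes of user data. */
static size_t rmpp_segment_count(size_t user_len)
{
    assert(user_len > 0);
    return (user_len + SEG_USER_DATA - 1) / SEG_USER_DATA;
}

/*
 * Payload length reported in the first segment: a multiple of 220 for
 * multi-segment sends, user data plus vendor header for a single segment.
 */
static size_t rmpp_first_payload(size_t user_len)
{
    size_t segs = rmpp_segment_count(user_len);

    return segs > 1 ? segs * SEG_PAYLOAD : user_len + VENDOR_HDR_SIZE;
}

/* Payload length in the last segment: leftover user data plus vendor header. */
static size_t rmpp_last_payload(size_t user_len)
{
    size_t segs = rmpp_segment_count(user_len);

    return user_len - (segs - 1) * SEG_USER_DATA + VENDOR_HDR_SIZE;
}
```

For the three sizes tested in the message (1200, 50 and 580 bytes), these functions return the same first and last payload values as the table.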
[openib-general] Re: RMPP
Hal Rosenstock wrote:
> > > In one test case, I see the first RMPP DATA segment sent and there is no
> > > response (ACK) from the receiver (this was due to a (receiver) test
> > > program issue). The transmitter retry depends on retries in the send_wr
> > > ud structure. Does it need to know/Is there a way to know that this
> > > failed (no ACK, etc.) when retries are exhausted or is this reliant on
> > > the receiver rerequesting or is the entire RMPP transaction treated like
> > > a UD send (e.g. unreliable) ?
> >
> > The RMPP send is treated as reliable, even if no response is expected. If
> > the send is not ACKed completely, the request will timeout, and a send
> > failure is reported to the user. If the send completes successfully, then
> > the user knows that it was received by the remote side; although, there's no
> > guarantee that it was processed by anyone.
>
> Is this on any (RMPP) transmits or just requests ? I will check to see
> if I can see this.

This is true for any RMPP send operation. You can test for this by running a
grmpp client, without having the grmpp server loaded on the destination node.
You should see a timeout error sending the MAD in the messages log file. You
can also see timeouts on both requests and replies if you run grmpp and
increase the number of messages to a few thousand.
(e.g. insmod ib_grmpp.ko "slid=1" "dlid=2" "message_count=15000"
"message_size=100" "responses=1")

- Sean
[openib-general] Re: RMPP
On Wed, 2005-05-04 at 13:43, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > Hi Sean,
> >
> > In addition to passing the hdr_len and data_len to ib_create_send_mad:
> >       rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
> >               offsetof(struct ib_rmpp_mad, data) + data_len);
> > That's not the "real" payload length that would go into the packet; just
> > one segment's worth of class header. Correct ?
>
> There are two sizes needed by the RMPP code. First, it needs the total
> length of the data buffer, which includes any necessary padding. This is
> set in the scatter-gather entry for the work request. Second, it needs to
> know how much of the data buffer contains valid user data.

Got it.

> ib_create_send_mad returns the length of the valid user data in a
> slightly encoded format. It subtracts the size of the RMPP and common MAD
> headers from the size (see below). The intent is that the payload length
> value is set to the correct value for MADs that are a single segment in
> length.

Right. That might be a better way of expressing it than how I did :-)

> -----------------
> MAD header
> -----------------
> RMPP header
> -----------------
> SA/Vendor hdr
> -----------------
> User data
> -----------------
> Pad
> -----------------
>
> sge length = size from MAD header to end of pad.
> payload = size from SA/Vendor hdr to end of user data.
>
> > If the above is correct, I'm not sure the actual payload lengths in the
> > DATA packets are correct. I also see the padding on the last segment of
> > a multipacket send not cleared (I integrated the part of your patch
> > relating to the pad calculation).
>
> I will run some tests and verify the payload values using madeye. I'm not
> sure why the padding isn't cleared. It may be an indication that
> create_send_mad isn't allocating the correct pad size.

Thanks.

> > In one test case, I see the first RMPP DATA segment sent and there is no
> > response (ACK) from the receiver (this was due to a (receiver) test
> > program issue). The transmitter retry depends on retries in the send_wr
> > ud structure. Does it need to know/Is there a way to know that this
> > failed (no ACK, etc.) when retries are exhausted or is this reliant on
> > the receiver rerequesting or is the entire RMPP transaction treated like
> > a UD send (e.g. unreliable) ?
>
> The RMPP send is treated as reliable, even if no response is expected. If
> the send is not ACKed completely, the request will timeout, and a send
> failure is reported to the user. If the send completes successfully, then
> the user knows that it was received by the remote side; although, there's no
> guarantee that it was processed by anyone.

Is this on any (RMPP) transmits or just requests ? I will check to see
if I can see this.

> The code will retry a given segment the number of times specified by the
> user. If forward progress is made on the send, the retry count is reset.

for each subsequent segment. Makes sense.

> > Here's a summary of changes so far:
> > 1. ib_create_send_mad either needs an additional parameter (RMPP active
> > in current send packet) or the paylen_newwin needs to be set by the user
> > outside of this routine.
>
> I have this on my short term to-do list. I'm just not sure on the best
> approach yet...

OK. I'll wait. I have my workaround right now.

> > 2. Some minor ib_mad.h commentary changes for more clarity on the
> > assumptions of the RMPP API.

Thanks.

-- Hal
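[Editor's sketch] The retry rule discussed above (retry a segment up to the user-specified count, and reset the count whenever the send makes forward progress) might look like the following. This is a hypothetical model, not the kernel's RMPP code: the structure, its fields and the `demo_*` functions are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical per-send retry state. */
struct demo_rmpp_send {
    int retries;      /* retries allowed per segment, chosen by the user */
    int retries_left; /* retries remaining for the current segment */
    int last_acked;   /* highest segment number ACKed so far */
};

static void demo_send_init(struct demo_rmpp_send *s, int retries)
{
    assert(retries >= 0);
    s->retries = retries;
    s->retries_left = retries;
    s->last_acked = 0;
}

/* An ACK that moves past the last ACKed segment resets the retry budget. */
static void demo_on_ack(struct demo_rmpp_send *s, int acked_seg)
{
    if (acked_seg > s->last_acked) {
        s->last_acked = acked_seg;
        s->retries_left = s->retries;
    }
}

/* Returns 1 to retransmit, 0 when the send should be reported as failed. */
static int demo_on_timeout(struct demo_rmpp_send *s)
{
    if (s->retries_left == 0)
        return 0;
    s->retries_left--;
    return 1;
}
```

In this model a receiver that never ACKs exhausts the budget and the sender reports a failure, while a slow receiver that keeps ACKing new segments never does.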
[openib-general] Re: RMPP
Hal Rosenstock wrote:
> Hi Sean,
>
> In addition to passing the hdr_len and data_len to ib_create_send_mad:
>       rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
>               offsetof(struct ib_rmpp_mad, data) + data_len);
> That's not the "real" payload length that would go into the packet; just
> one segment's worth of class header. Correct ?

There are two sizes needed by the RMPP code. First, it needs the total
length of the data buffer, which includes any necessary padding. This is
set in the scatter-gather entry for the work request. Second, it needs to
know how much of the data buffer contains valid user data.

ib_create_send_mad returns the length of the valid user data in a
slightly encoded format. It subtracts the size of the RMPP and common MAD
headers from the size (see below). The intent is that the payload length
value is set to the correct value for MADs that are a single segment in
length.

-----------------
MAD header
-----------------
RMPP header
-----------------
SA/Vendor hdr
-----------------
User data
-----------------
Pad
-----------------

sge length = size from MAD header to end of pad.
payload = size from SA/Vendor hdr to end of user data.

> If the above is correct, I'm not sure the actual payload lengths in the
> DATA packets are correct. I also see the padding on the last segment of
> a multipacket send not cleared (I integrated the part of your patch
> relating to the pad calculation).

I will run some tests and verify the payload values using madeye. I'm not
sure why the padding isn't cleared. It may be an indication that
create_send_mad isn't allocating the correct pad size.

> In one test case, I see the first RMPP DATA segment sent and there is no
> response (ACK) from the receiver (this was due to a (receiver) test
> program issue). The transmitter retry depends on retries in the send_wr
> ud structure. Does it need to know/Is there a way to know that this
> failed (no ACK, etc.) when retries are exhausted or is this reliant on
> the receiver rerequesting or is the entire RMPP transaction treated like
> a UD send (e.g. unreliable) ?

The RMPP send is treated as reliable, even if no response is expected. If
the send is not ACKed completely, the request will timeout, and a send
failure is reported to the user. If the send completes successfully, then
the user knows that it was received by the remote side; although, there's no
guarantee that it was processed by anyone.

The code will retry a given segment the number of times specified by the
user. If forward progress is made on the send, the retry count is reset.

> Here's a summary of changes so far:
> 1. ib_create_send_mad either needs an additional parameter (RMPP active
> in current send packet) or the paylen_newwin needs to be set by the user
> outside of this routine.

I have this on my short term to-do list. I'm just not sure on the best
approach yet...

> 2. Some minor ib_mad.h commentary changes for more clarity on the
> assumptions of the RMPP API.
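[Editor's sketch] The "slightly encoded" payload length described above is hdr_len minus the offset of the data field plus data_len, that is, the size from the SA/Vendor header to the end of the user data. The model below assumes the usual 24-byte common MAD header and 12-byte RMPP header; `demo_rmpp_mad` and `demo_payload_len` are hypothetical stand-ins, and the byte-order conversion done by cpu_to_be32 is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; only the offset of the data field matters here. */
struct demo_rmpp_mad {
    unsigned char mad_hdr[24];  /* common MAD header */
    unsigned char rmpp_hdr[12]; /* RMPP header */
    unsigned char data[220];    /* class header, user data and pad */
};

/*
 * hdr_len  = common MAD header + RMPP header + class (SA/Vendor) header
 * data_len = valid user data, without pad
 * result   = size from the class header to the end of the user data
 */
static size_t demo_payload_len(size_t hdr_len, size_t data_len)
{
    assert(hdr_len >= offsetof(struct demo_rmpp_mad, data));
    return hdr_len - offsetof(struct demo_rmpp_mad, data) + data_len;
}
```

As a check, a vendor MAD with a 4-byte class header (hdr_len 40) and 50 bytes of user data yields 54, and an SA MAD with a 20-byte class header (hdr_len 56) and 200 bytes of data yields 220, exactly one full segment.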