Re: [openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Hal Rosenstock
On Tue, 2005-09-20 at 07:53, Hal Rosenstock wrote:
> On Tue, 2005-09-20 at 06:46, Eitan Zahavi wrote:
> > Hal Rosenstock wrote:
> > > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
> > > 
> > >>>Is this what you are referring to ?
> > >>
> > >>Yes the line of interest is:
> > >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
> > >>This shows 16 bytes extra in the data size.
> > > 
> > > 
> > > Should it be 20 for the SA class header size or 0 here ?
> > Should be 0. It means the packet size should accommodate an integer number
> > of SA records (after removing the header sizes).
> 
> OK. There's a problem or problems on the receive side (of RMPP) to look
> into but these appear OK for SA client right now.

Patch coming shortly for this.

-- Hal
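
The arithmetic behind the "Count = 7 = 800 / 112 (16)" line quoted above can be sketched as follows. This is only an illustrative Python sketch, not the osm vendor code: count_records is a hypothetical helper, and the 800 and 112 figures are simply the ones from the log line.

```python
def count_records(data_len, record_size):
    """Return (whole_records, leftover_bytes) for an SA response payload.

    data_len is the number of bytes left after the headers are removed;
    record_size is the size of one SA record in the response.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    return divmod(data_len, record_size)


# Figures from the quoted log line: Count = 7 = 800 / 112 (16)
count, leftover = count_records(800, 112)
print(count, leftover)  # 7 whole records, 16 bytes left over

# A clean response carries an integer number of records, so leftover is 0.
print(count_records(7 * 112, 112))
```

A leftover of 0 is the case Eitan describes as correct; any other value means the reported data size does not hold a whole number of records.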

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


[openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Hal Rosenstock
On Tue, 2005-09-20 at 06:46, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
> > 
> >>>Is this what you are referring to ?
> >>
> >>Yes the line of interest is:
> >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
> >>This shows 16 bytes extra in the data size.
> > 
> > 
> > Should it be 20 for the SA class header size or 0 here ?
> Should be 0. It means the packet size should accommodate an integer number of
> SA records (after removing the header sizes).

OK. There's a problem or problems on the receive side (of RMPP) to look
into but these appear OK for SA client right now.

> >>>I do also see:
> >>>Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
> >>>ib_query failed (IB_REMOTE_ERROR).
> >>>Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
> >>>error = IB_SA_MAD_STATUS_NO_RECORDS.
> >>>Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
> >>>num of records is : 1, Found number of records : 0
> >>
> >>The full osmtest flow has some intentional errors injected.
> >>If it provides the "PASSED" message at the end it means that
> >>the errors were intentional and expected.
> >>
> >>Some of the flows have a special output message that wraps the
> >>errors in a section like:
> >>
> >>"vv"
> >>...
> >>"^^"
> >>
> >>We probably need to apply this convention to all the "bad flows".
> > 
> > 
> > Then is expected number of records in this test 1 rather than 0 ?

Will these be fixed ? Are these issues being documented along with other
ones previously noted ? 

-- Hal



[openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Eitan Zahavi

Hal Rosenstock wrote:

On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:


Is this what you are referring to ?


Yes the line of interest is:
__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
This shows 16 bytes extra in the data size.



Should it be 20 for the SA class header size or 0 here ?

Should be 0. It means the packet size should accommodate an integer number of
SA records (after removing the header sizes).




I do also see:
Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
ib_query failed (IB_REMOTE_ERROR).
Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
error = IB_SA_MAD_STATUS_NO_RECORDS.
Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
num of records is : 1, Found number of records : 0


The full osmtest flow has some intentional errors injected.
If it provides the "PASSED" message at the end it means that
the errors were intentional and expected.

Some of the flows have a special output message that wraps the
errors in a section like:

"vv"
...
"^^"

We probably need to apply this convention to all the "bad flows".



Then is expected number of records in this test 1 rather than 0 ?

-- Hal





[openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Hal Rosenstock
On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote:
> > Is this what you are referring to ?
> Yes the line of interest is:
> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
> This shows 16 bytes extra in the data size.

Should it be 20 for the SA class header size or 0 here ?

> > I do also see:
> > Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
> > ib_query failed (IB_REMOTE_ERROR).
> > Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
> > error = IB_SA_MAD_STATUS_NO_RECORDS.
> > Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
> > num of records is : 1, Found number of records : 0
> The full osmtest flow has some intentional errors injected.
> If it provides the "PASSED" message at the end it means that
> the errors were intentional and expected.
> 
> Some of the flows have a special output message that wraps the
> errors in a section like:
> 
> "vv"
> ...
> "^^"
> 
> We probably need to apply this convention to all the "bad flows".

Then is expected number of records in this test 1 rather than 0 ?

-- Hal



[openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Eitan Zahavi

Hal Rosenstock wrote:

On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote:


Hi Hal,

Seems like RMPP works !




Is this what you are referring to ?

Yes the line of interest is:
__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
This shows 16 bytes extra in the data size.



I do also see:
Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370:
ib_query failed (IB_REMOTE_ERROR).
Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote
error = IB_SA_MAD_STATUS_NO_RECORDS.
Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected
num of records is : 1, Found number of records : 0

The full osmtest flow has some intentional errors injected.
If it provides the "PASSED" message at the end it means that
the errors were intentional and expected.

Some of the flows have a special output message that wraps the
errors in a section like:

"vv"
...
"^^"

We probably need to apply this convention to all the "bad flows".

EZ
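
The marker convention described above could be checked mechanically along these lines. This is a rough Python sketch under stated assumptions: unexpected_errors is a hypothetical helper, the "vv" and "^^" markers are taken as they appear in this message (real osmtest output may use longer marker lines), and the sample log is made up, reusing two error texts from this thread.

```python
def unexpected_errors(lines, open_marker="vv", close_marker="^^"):
    """Return ERR lines that fall outside any marked 'expected errors' section.

    A line starting with open_marker opens a section and a line starting
    with close_marker closes it. Both markers are placeholders.
    """
    inside = False
    found = []
    for line in lines:
        text = line.strip().strip('"')
        if text.startswith(open_marker):
            inside = True
        elif text.startswith(close_marker):
            inside = False
        elif "ERR" in text and not inside:
            found.append(text)
    return found


# Invented sample: one error wrapped in markers, one error left bare.
log = [
    "vv",
    "osmt_get_service_by_name: ERR 0370: ib_query failed (IB_REMOTE_ERROR).",
    "^^",
    "osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).",
]
print(unexpected_errors(log))
```

Only the error outside the marked section is reported, which is the point of applying the convention to all the "bad flows".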


[openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Hal Rosenstock
Hi Eitan,

On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote:
> Hi Hal,
> 
> Seems like RMPP works !

Yippee :-)

> This is an important milestone for OpenSM as we are now able to test the 
> SM/SA with osmtest.

and also for Solaris.

> There is still a constant 8-byte remainder in the RMPP calculation of the
> number of received records (see the osmtest -V log file), but this is minor
> (as no SA record is that small).

It sounds like there is still a calculation slightly off.

I don't see a constant off by 8 remainder issue. In my configuration
most seem fine and the only one which is not off by 20 (SA class header
size) is the following:

Sep 20 05:17:36 292850 [40FFF960] -> osm_vendor_get: Acquired UMAD 0x53cd40, size = 856.
Sep 20 05:17:36 292861 [40FFF960] -> osm_vendor_get: ]
Sep 20 05:17:36 292870 [40FFF960] -> osm_mad_pool_get: Acquired p_madw = 0x536190, p_mad = 0x53cd78, size = 856.
Sep 20 05:17:36 292880 [40FFF960] -> osm_mad_pool_get: ]
Sep 20 05:17:36 292889 [40FFF960] -> __osmv_sa_mad_rcv_cb: [
Sep 20 05:17:36 292899 [40FFF960] -> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16)
Sep 20 05:17:36 292909 [40FFF960] -> osmtest_query_res_cb: [
Sep 20 05:17:36 292918 [40FFF960] -> osmtest_query_res_cb: ]
Sep 20 05:17:36 292932 [40FFF960] -> __osmv_sa_mad_rcv_cb: ]
Sep 20 05:17:36 292938 [AB001140] -> __osmv_send_sa_req: ]
Sep 20 05:17:36 292971 [AB001140] -> osmv_query_sa: ]
Sep 20 05:17:36 292980 [AB001140] -> osmtest_get_all_recs: ]
Sep 20 05:17:36 292989 [AB001140] -> osmtest_validate_all_node_recs: Received 7 records.

Is this what you are referring to ?
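
One way to read the numbers in that excerpt, as a hedged sketch: if the 856-byte size is the reassembled MAD, then subtracting the common MAD header (24 bytes), the RMPP header (12 bytes) and the SA class header (20 bytes) leaves the 800 bytes that the Count line divides by 112. The names below are illustrative, and treating 856 as the full MAD length is an assumption about what the log reports.

```python
MAD_HDR = 24    # common MAD header, bytes
RMPP_HDR = 12   # RMPP header, bytes
SA_HDR = 20     # SA class header, bytes (the "20" discussed in this thread)


def sa_data_len(mad_size):
    """Bytes of SA record data once the three headers are stripped."""
    return mad_size - (MAD_HDR + RMPP_HDR + SA_HDR)


data_len = sa_data_len(856)                 # size taken from the log excerpt
records, leftover = divmod(data_len, 112)   # 112 is the record size in the log
print(data_len, records, leftover)  # 800 7 16
```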

I do also see:
Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370: ib_query failed (IB_REMOTE_ERROR).
Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote error = IB_SA_MAD_STATUS_NO_RECORDS.
Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected num of records is : 1, Found number of records : 0

and some timeouts:
Sep 20 05:17:40 644730 [40FFF960] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12) -- dropping.
Sep 20 05:17:40 644740 [40FFF960] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
Sep 20 05:17:40 644750 [40FFF960] -> __osmv_sa_mad_err_cb: [
Sep 20 05:17:40 644760 [40FFF960] -> osmtest_query_res_cb: [
Sep 20 05:17:40 644769 [40FFF960] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
Sep 20 05:17:40 644787 [40FFF960] -> osmtest_query_res_cb: ]
Sep 20 05:17:40 644801 [40FFF960] -> __osmv_sa_mad_err_cb: ]
which then resulted in:
Sep 20 05:17:40 644955 [AB001140] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS).

> Thanks for your continuous support.
> 
> Eitan
> 
> Hal Rosenstock wrote:
> > Hi Eitan,
> > 
> > The send side RMPP changes for the truncation of the last SA
> > record have now stabilized. With the latest user_mad.c and
> > osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn
> > revision 3485), this is ready to be verified again. It's safe to come out
> > now :-)
> > 
> > -- Hal
> > 
> 



[openib-general] Re: RMPP Message Format Errors

2005-09-20 Thread Eitan Zahavi

Hi Hal,

Seems like RMPP works !

This is an important milestone for OpenSM as we are now able to test the SM/SA 
with osmtest.

There is still a constant 8-byte remainder in the RMPP calculation of the
number of received records (see the osmtest -V log file), but this is minor
(as no SA record is that small).

Thanks for your continuous support.

Eitan

Hal Rosenstock wrote:

Hi Eitan,

The send side RMPP changes for the truncation of the last SA
record have now stabilized. With the latest user_mad.c and
osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn
revision 3485), this is ready to be verified again. It's safe to come out
now :-)

-- Hal





[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: RMPP fixes for 2.6.14
> 
> Michael> Yes, that's what I was referring to.  Too late for 2.6.14?
> 
> Probably.  I wouldn't be comfortable pushing that into all the arch
> trees through my git tree.  I think we would need to go through lkml,
> and I think 2.6.14 will be closed to this sort of stuff around Friday.

It'll wait then. I won't be online this weekend.

-- 
MST


[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Roland Dreier
Michael> Yes, that's what I was referring to.  Too late for 2.6.14?

Probably.  I wouldn't be comfortable pushing that into all the arch
trees through my git tree.  I think we would need to go through lkml,
and I think 2.6.14 will be closed to this sort of stuff around Friday.

 - R.


[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: RMPP fixes for 2.6.14
> 
> Michael> Roland, what do you say to the idea of moving
> Michael> mthca_doorbell.h to somewhere under include/asm?  It's not
> Michael> really mthca specific, is it?
> 
> Some of it definitely seems like it could be made generic.  I'm not
> sure whether mthca_write_db_rec() is worth it, but the write64()
> emulation with a lock might be worth it on 32-bit systems.

Yes, that's what I was referring to.
Too late for 2.6.14?

-- 
MST


[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Roland Dreier
Michael> Roland, what do you say to the idea of moving
Michael> mthca_doorbell.h to somewhere under include/asm?  It's not
Michael> really mthca specific, is it?

Some of it definitely seems like it could be made generic.  I'm not
sure whether mthca_write_db_rec() is worth it, but the write64()
emulation with a lock might be worth it on 32-bit systems.

 - R.
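
The idea of emulating a 64-bit write with a lock can be illustrated with a toy model. This is Python rather than kernel C and is not the mthca code: write64_emulated is a hypothetical name, a bytearray stands in for the device register, a threading.Lock stands in for a spinlock, and the big-endian word order is an assumption made for the example.

```python
import struct
import threading

_doorbell_lock = threading.Lock()  # stands in for a spinlock


def write64_emulated(region, offset, value):
    """Store a 64-bit value as two 32-bit writes, serialized by a lock.

    Without the lock, two writers could interleave their halves and the
    region would end up holding a mix of both values.
    """
    hi = (value >> 32) & 0xFFFFFFFF
    lo = value & 0xFFFFFFFF
    with _doorbell_lock:
        struct.pack_into(">I", region, offset, hi)      # upper 32 bits first
        struct.pack_into(">I", region, offset + 4, lo)  # then lower 32 bits


region = bytearray(8)  # stand-in for an 8-byte register
write64_emulated(region, 0, 0x1122334455667788)
print(region.hex())  # 1122334455667788
```

On a system with a native 64-bit store, the lock and the split would be unnecessary, which is why the emulation only matters for the 32-bit case mentioned above.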


[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> But letting me know about anything that I'm missing
> would be good.

Roland, what do you say to the idea of moving
mthca_doorbell.h to somewhere under include/asm?
It's not really mthca specific, is it?

-- 
MST


[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Roland Dreier
Michael> What should I be looking at? Linus's git?

You can look at my git:

http://www.kernel.org/git/?p=linux/kernel/git/roland/infiniband.git;a=summary

I just pushed a few more things, so it will take a few more minutes to
propagate to all the mirrors.

In my previous email, I was a little unclear.  I was just asking for
more RMPP changes specifically, since I know there's something to
merge there.  But letting me know about anything that I'm missing
would be good.

Michael> The qp->wait init patch :)

Yes, that's in there.  I have the following in my git tree on top of
what's already in Linus's tree:

Michael S. Tsirkin:
  IPoIB: fix memory leak
  IB/sa_query: avoid unnecessary list scan
  IB: Initialize qp->wait

Roland Dreier:
  IB: really reset QPs

Sean Hefty:
  IB: Add user-supplied context to userspace CM ABI

 - R.


[openib-general] Re: RMPP fixes for 2.6.14

2005-09-07 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: RMPP fixes for 2.6.14
> 
> I found this RMPP difference between the current kernel and our
> subversion tree.

What should I be looking at? Linus's git?

> Is there anything else that needs to be merged for
> the kernel 2.6.14 tree?
> 
>  - R.

The qp->wait init patch :)

-- 
MST


[openib-general] Re: RMPP Message Format Errors (Short Term Plan)

2005-08-30 Thread Hal Rosenstock
Hi Eitan,

On Sat, 2005-08-27 at 10:59, Eitan Zahavi wrote:
> Once you think both sender and receiver side issues are resolved please 
> let us know so I can re-run the test with the IB Analyzer.

With r3251, the RMPP issues are resolved as far as I know. [IMO, the
only thing needed for full closure is verification from Greg (MgtWG).]

Let me know if it works or if you find any further issues.

-- Hal



Re: [openib-general] Re: RMPP Message Format Errors

2005-08-30 Thread Hal Rosenstock
On Tue, 2005-08-30 at 12:49, Sean Hefty wrote:
> The interpretation of payload length for the first segment value looks 
> correct.  For the middle segments, 0 should work in all cases and may be 
> a slightly cleaner solution.

OK. I'm going ahead with these changes.

-- Hal



Re: [openib-general] Re: RMPP Message Format Errors

2005-08-30 Thread Sean Hefty

Hal Rosenstock wrote:
Hal, can you go ahead and commit your two patches for payload length 
changes for RMPP?



Do you think this is the correct interpretation ? If so, I will go
ahead. I was waiting for confirmation.


The interpretation of payload length for the first segment value looks 
correct.  For the middle segments, 0 should work in all cases and may be 
a slightly cleaner solution.


- Sean
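
The per-segment scheme discussed above (valid length in the first segment, 0 in the middle segments, valid bytes of the last segment in the last one) might look like this. It is a sketch under assumptions: segment_paylens is a hypothetical helper, lengths are counted in the 220-byte RMPP data area used elsewhere in this thread, and this is not the kernel RMPP code.

```python
RMPP_DATA = 220  # bytes of RMPP data per segment, as discussed in this thread


def segment_paylens(total_valid):
    """Return the PayloadLength to put in each segment of one RMPP message.

    total_valid counts the valid bytes across all 220-byte data areas.
    First segment: total valid length. Middle segments: 0.
    Last segment: valid bytes in that segment only.
    """
    if total_valid <= 0:
        raise ValueError("total_valid must be positive")
    seg_count = -(-total_valid // RMPP_DATA)  # ceiling division
    last_valid = total_valid - (seg_count - 1) * RMPP_DATA
    if seg_count == 1:
        return [total_valid]  # single segment is both first and last
    return [total_valid] + [0] * (seg_count - 2) + [last_valid]


print(segment_paylens(500))  # [500, 0, 60]
```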


Re: [openib-general] Re: RMPP Message Format Errors

2005-08-30 Thread Hal Rosenstock
Hi Sean,

On Tue, 2005-08-30 at 12:31, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > I already submitted a patch for this. It wasn't clear to me what the
> > answer for the first segment is from Greg's response (so I sent a
> > followup to clarify that).
> 
> Hal, can you go ahead and commit your two patches for payload length 
> changes for RMPP?

Do you think this is the correct interpretation ? If so, I will go
ahead. I was waiting for confirmation.

-- Hal



Re: [openib-general] Re: RMPP Message Format Errors

2005-08-30 Thread Sean Hefty

Hal Rosenstock wrote:

I already submitted a patch for this. It wasn't clear to me what the
answer for the first segment is from Greg's response (so I sent a
followup to clarify that).


Hal, can you go ahead and commit your two patches for payload length 
changes for RMPP?


- Sean


Re: [openib-general] Re: RMPP Message Format Errors

2005-08-30 Thread Hal Rosenstock
On Tue, 2005-08-30 at 03:04, Eitan Zahavi wrote:
> > It's not a big deal to change it.  If the common interpretation is to only
> > include the partial data size, I will change it.
> I think the common interpretation is that the paylen in the first segment 
> should represent the size of the "valid" data only.

I already submitted a patch for this. It wasn't clear to me what the
answer for the first segment is from Greg's response (so I sent a
followup to clarify that).

-- Hal



Re: [openib-general] Re: RMPP Message Format Errors

2005-08-30 Thread Eitan Zahavi

Sean Hefty wrote:

In my interpretation, partial data is indicated by the PayloadLength field in
the last segment only.  It's quite possible that my interpretation is incorrect,
in which case the calculation in the RMPP code is off.


I agree the text might be missing an example or two for clarification.
Anyway, we probably can use the IB Analyzer as the ultimate
interpretation test. Note that there are IB implementations that use
the first segment payload length as the source of packet length and
count on it to represent the correct DATA length.

We can take your interpretation to the IBTA MGTWG for further discussion.
Is the effort for fixing it big?



It's not a big deal to change it.  If the common interpretation is to only
include the partial data size, I will change it.

I think the common interpretation is that the paylen in the first segment should represent
the size of the "valid" data only.


- Sean




RE: [openib-general] Re: RMPP Message Format Errors

2005-08-29 Thread Sean Hefty
>> In my interpretation, partial data is indicated by the PayloadLength field in
>> the last segment only.  It's quite possible that my interpretation is incorrect,
>> in which case the calculation in the RMPP code is off.
>I agree the text might be missing an example or two for clarification.
>Anyway, we probably can use the IB Analyzer as the ultimate
>interpretation test. Note that there are IB implementations that use
>the first segment payload length as the source of packet length and
>count on it to represent the correct DATA length.
>
>We can take your interpretation to the IBTA MGTWG for further discussion.
>Is the effort for fixing it big?

It's not a big deal to change it.  If the common interpretation is to only
include the partial data size, I will change it.

- Sean



[openib-general] Re: RMPP Message Format Errors (Short Term Plan)

2005-08-28 Thread Hal Rosenstock
Hi Eitan,

On Sat, 2005-08-27 at 10:59, Eitan Zahavi wrote:
> Thanks for taking care of the RMPP issues.
> Once you think both sender and receiver side issues are resolved please 
> let us know so I can re-run the test with the IB Analyzer.

I think they are now resolved with the one-line patch (which should help
the analyzer; I don't think it impacts the end nodes).

-- Hal




Re: [openib-general] Re: RMPP Message Format Errors

2005-08-28 Thread Hal Rosenstock
Hi Eitan,

On Sat, 2005-08-27 at 11:14, Eitan Zahavi wrote:
> Sean Hefty wrote:
> 
> > 
> > I believe that the 220 byte payload length is for all RMPP MADs.  Only the
> > common and RMPP header lengths are ignored.
> Yes.
> > 
> > 
> > 
> >>Doesn't it need to account for a "partial" rather than full last segment
> >>transferred data in the first segment length ?
> Yes I think it needs to use the partial length.

Agreed.
 
> > What I couldn't easily tell from the spec is whether a partial last segment is
> > included in the initial payload length or not.  I read it as: "PayloadLength
> > counts all the bytes in the TransferredData field of the DATA packet format."
> > In my interpretation, partial data is indicated by the PayloadLength field in
> > the last segment only.  It's quite possible that my interpretation is incorrect,
> > in which case the calculation in the RMPP code is off.
> I agree the text might be missing an example or two for clarification.
> Anyway, we probably can use the IB Analyzer as the ultimate 
> interpretation test. Note that there are IB implementations that use 
> the first segment payload length as the source of packet length and 
> count on it to represent the correct DATA length.
> 
> We can take your interpretation to the IBTA MGTWG for further discussion.

I think the spec wording is ambiguous and we should take it to the
MgtWG. I believe your interpretation is the intent but could not find
any specific language other than the valid bytes in terms of the last
segment. The first segment length references transferred data which is
the whole segment. I'll send something to MgtWG on this and copy
openib-general.

> Is the effort for fixing it big?

It's a one line patch. I sent it previously.

-- Hal



RE: [openib-general] Re: RMPP Message Format Errors

2005-08-28 Thread Hal Rosenstock
Hi Sean,

On Fri, 2005-08-26 at 13:54, Sean Hefty wrote:
> >The 220 byte payload length is for SA. That's mostly right but assumes
> >the last segment will be full (and accounted for by the paylen in the
> >last segment).
> 
> I believe that the 220 byte payload length is for all RMPP MADs.

Yes, you're right.

>   Only the
> common and RMPP header lengths are ignored.
> 
> 
> >Doesn't it need to account for a "partial" rather than full last segment
> >transferred data in the first segment length ?
> 
> What I couldn't easily tell from the spec is whether a partial last segment is
> included in the initial payload length or not.  I read it as: "PayloadLength
> counts all the bytes in the TransferredData field of the DATA packet format."
> In my interpretation, partial data is indicated by the PayloadLength field in
> the last segment only.  It's quite possible that my interpretation is incorrect,
> in which case the calculation in the RMPP code is off.

I'm pretty sure that the intent is that the length in the first segment
reflects the valid data (plus the header to be counted) so the last
segment doesn't count as a full length (220) unless it is full.

Patch for this shortly.

-- Hal
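
The difference between the two readings of the first-segment length can be shown with a small calculation. This is an illustrative Python sketch, not the RMPP code or the patch mentioned above: first_segment_paylen is a hypothetical helper, and lengths are counted in the 220-byte data area discussed in this thread.

```python
RMPP_DATA = 220  # bytes of RMPP data carried per segment


def first_segment_paylen(total_valid, count_last_as_full):
    """PayloadLength for the first segment under the two readings in this thread.

    count_last_as_full=True : segments * 220, the last segment counted as full.
    count_last_as_full=False: only the valid bytes, so a partial last segment
                              contributes just what it actually carries.
    """
    seg_count = -(-total_valid // RMPP_DATA)  # ceiling division
    if count_last_as_full:
        return seg_count * RMPP_DATA
    return total_valid


print(first_segment_paylen(500, True))   # 660
print(first_segment_paylen(500, False))  # 500
```

The two readings only disagree when the last segment is partial; for a message that fills its last segment exactly, both give the same value.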



Re: [openib-general] Re: RMPP Message Format Errors

2005-08-27 Thread Eitan Zahavi

Sean Hefty wrote:



I believe that the 220 byte payload length is for all RMPP MADs.  Only the
common and RMPP header lengths are ignored.

Yes.





Doesn't it need to account for a "partial" rather than full last segment
transferred data in the first segment length ?

Yes I think it needs to use the partial length.



What I couldn't easily tell from the spec is whether a partial last segment is
included in the initial payload length or not.  I read it as: "PayloadLength
counts all the bytes in the TransferredData field of the DATA packet format."
In my interpretation, partial data is indicated by the PayloadLength field in
the last segment only.  It's quite possible that my interpretation is incorrect,
in which case the calculation in the RMPP code is off.

I agree the text might be missing an example or two for clarification.
Anyway, we probably can use the IB Analyzer as the ultimate 
interpretation test. Note that there are IB implementations that use 
the first segment payload length as the source of packet length and 
count on it to represent the correct DATA length.


We can take your interpretation to the IBTA MGTWG for further discussion.

Is the effort for fixing it big?

Thanks
Eitan


[openib-general] Re: RMPP Message Format Errors (Short Term Plan)

2005-08-27 Thread Eitan Zahavi

Hi Hal,

Thanks for taking care of the RMPP issues.
Once you think both sender and receiver side issues are resolved please 
let us know so I can re-run the test with the IB Analyzer.


Eitan

Hal Rosenstock wrote:

Hi,

I will finish with RMPP and then embark on the 1.8.0 merge. I hope and
expect to start the latter early next week.






RE: [openib-general] Re: RMPP Message Format Errors

2005-08-26 Thread Sean Hefty
>The 220 byte payload length is for SA. That's mostly right but assumes
>the last segment will be full (and accounted for by the paylen in the
>last segment).

I believe that the 220 byte payload length is for all RMPP MADs.  Only the
common and RMPP header lengths are ignored.


>Doesn't it need to account for a "partial" rather than full last segment
>transferred data in the first segment length ?

What I couldn't easily tell from the spec is whether a partial last segment is
included in the initial payload length or not.  I read it as: "PayloadLength
counts all the bytes in the TransferredData field of the DATA packet format."
In my interpretation, partial data is indicated by the PayloadLength field in
the last segment only.  It's quite possible that my interpretation is incorrect,
in which case the calculation in the RMPP code is off.
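
For reference, the two readings of the first-segment PayloadLength can be put side by side in a small sketch. It assumes the sizes mentioned in this thread (220 bytes per segment after the common and RMPP headers, 20 of which are the SA header repeated in each segment). The function and macro names are hypothetical and are not taken from mad_rmpp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes as quoted in this thread; all names here are illustrative only. */
#define RMPP_SEG_DATA   220u  /* bytes after the common MAD + RMPP headers */
#define SA_CLASS_HDR     20u  /* SA header, repeated in every segment */
#define SA_SEG_PAYLOAD  (RMPP_SEG_DATA - SA_CLASS_HDR)  /* 200 record bytes */

/* Number of segments needed to carry data_len bytes of SA records. */
static size_t seg_count(size_t data_len)
{
    assert(data_len > 0);
    return (data_len + SA_SEG_PAYLOAD - 1) / SA_SEG_PAYLOAD;
}

/* Reading 1: every segment, including the last, is counted as full. */
static size_t first_paylen_full_segments(size_t data_len)
{
    return seg_count(data_len) * RMPP_SEG_DATA;
}

/* Reading 2: one class header per segment plus the actual data length,
 * so a partial last segment is reflected in the first-segment value. */
static size_t first_paylen_partial_last(size_t data_len)
{
    return seg_count(data_len) * SA_CLASS_HDR + data_len;
}
```

For 224 bytes of records (two 112-byte NodeRecords), reading 1 gives 440 and reading 2 gives 264.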

- Sean



RE: [openib-general] Re: RMPP Message Format Errors

2005-08-26 Thread Hal Rosenstock
On Fri, 2005-08-26 at 01:16, Sean Hefty wrote: 
> >In any case, doesn't the initial payload length need to be the number of
> >segments times (hdr_len - offsetof(struct ib_rmpp_mad, data)) + data_len
> >? If so, that's part of the problem.
> 
> I believe that the payload is being calculated correctly.  It should be the
> number of segments * 220 bytes per packet, or at least that was my
> interpretation of the spec.

The 220 byte payload length is for SA. That's mostly right but assumes
the last segment will be full (and accounted for by the paylen in the
last segment).

Doesn't it need to account for a "partial" rather than full last segment
transferred data in the first segment length ?

-- Hal



[openib-general] Re: RMPP Message Format Errors

2005-08-26 Thread Hal Rosenstock
Hi Eitan,

On Fri, 2005-08-26 at 06:15, Eitan Zahavi wrote: 
> I am trying to figure out how a client will figure out the number of
> records provided in the mad it gets back from umad.
> 
> Can you describe this?

A client would use the received length returned from umad_recv and
either the attribute offset in the RMPP header or the expected attribute
offset for the record type to calculate this (in the case of an SA client).
For other classes, it is class specific.
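
A minimal sketch of that calculation for the SA case follows. The function name is hypothetical, the header size is passed in by the caller rather than taken from any library, and it assumes AttributeOffset has already been converted to host byte order (it is expressed in units of 8 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count whole SA records in a reassembled response.
 *
 * recv_len:    length reported for the reassembled MAD, headers included
 * data_offset: bytes of headers preceding the first record (supplied by
 *              the caller; 56 is assumed in the example below)
 * attr_offset: AttributeOffset in host byte order, in units of 8 bytes
 * leftover:    receives the trailing bytes that do not form a whole record
 */
static size_t sa_count_records(size_t recv_len, size_t data_offset,
                               uint16_t attr_offset, size_t *leftover)
{
    size_t rec_size = (size_t)attr_offset * 8;
    size_t data_len;

    assert(leftover != NULL);
    *leftover = 0;

    if (rec_size == 0 || recv_len < data_offset)
        return 0;   /* no records can be counted */

    data_len = recv_len - data_offset;
    *leftover = data_len % rec_size;
    return data_len / rec_size;
}
```

With 112-byte NodeRecords (AttributeOffset 14) and an assumed 56 bytes of headers, a 256-byte receive yields 1 record with 88 bytes left over, while a 280-byte receive yields 2 records with nothing left over. A non-zero leftover is a hint that the reported length is not the real one.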

I think there is a problem in
osm_vendor_ibumad_sa.c::__osmv_sa_mad_rcv_cb which I will be working on
as soon as I sort through the send side issues.

-- Hal



[openib-general] Re: RMPP Message Format Errors

2005-08-26 Thread Eitan Zahavi

Hi Hal,

I am trying to figure out how a client will figure out the number of
records provided in the mad it gets back from umad.

Can you describe this?

Thanks

Eitan

Hal Rosenstock wrote:

> On Mon, 2005-08-22 at 10:34, Eitan Zahavi wrote:
> > > It gets a "real" received length provided it supplies a buffer large enough.
> > So I guess the "real receive length" is truncated to the last data
> > record even if the packet sent was 256 bytes?
>
> The receive buffer is not truncated. An error is returned if the buffer
> supplied for a receive is too small, and it includes the size of the
> buffer needed.
>
> I don't understand what you mean by "even if the packet sent was 256
> bytes".
>
> -- Hal




RE: [openib-general] Re: RMPP Message Format Errors

2005-08-25 Thread Sean Hefty
>    if (rmpp_active) {
>        ...
>        rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
>            offsetof(struct ib_rmpp_mad, data) + data_len);
>
>Then in mad_rmpp.c::send_next_seg, I see:
>
>    if (mad_send_wr->seg_num == 1) {
>        rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
>        rmpp_mad->rmpp_hdr.paylen_newwin =
>            cpu_to_be32(mad_send_wr->total_seg *
>                (sizeof(struct ib_rmpp_mad) -
>                 offsetof(struct ib_rmpp_mad, data)));
>
>That appears to me to overwrite the initial paylen but I might have
>missed something here.

The payload is being overridden, but that's necessary.  The payload that's set
when creating the MAD is used to indicate the size of the buffer.  The payload
set with the 1st segment indicates the size of the transfer.  They differ
because the headers are duplicated in each segment, but only a single copy is
provided in the send buffer.

>In any case, doesn't the initial payload length need to be the number of
>segments times (hdr_len - offsetof(struct ib_rmpp_mad, data)) + data_len
>? If so, that's part of the problem.

I believe that the payload is being calculated correctly.  It should be the
number of segments * 220 bytes per packet, or at least that was my
interpretation of the spec.

>Another alternative would be not to set paylen in the first segment.

That would work.  I tried to set the value to allow future optimization on the
receive side.

- Sean




[openib-general] Re: RMPP Message Format Errors

2005-08-25 Thread Hal Rosenstock
Hi Sean,

In mad.c::ib_create_send_mad, if rmpp is active, the payload length is
calculated as follows:

    if (rmpp_active) {
        ...
        rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
            offsetof(struct ib_rmpp_mad, data) + data_len);

Then in mad_rmpp.c::send_next_seg, I see:

    if (mad_send_wr->seg_num == 1) {
        rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST;
        rmpp_mad->rmpp_hdr.paylen_newwin =
            cpu_to_be32(mad_send_wr->total_seg *
                (sizeof(struct ib_rmpp_mad) -
                 offsetof(struct ib_rmpp_mad, data)));

That appears to me to overwrite the initial paylen but I might have
missed something here.

In any case, doesn't the initial payload length need to be the number of
segments times (hdr_len - offsetof(struct ib_rmpp_mad, data)) + data_len
? If so, that's part of the problem. 

Another alternative would be not to set paylen in the first segment.

-- Hal



[openib-general] RE: RMPP Message Format Errors

2005-08-25 Thread Hal Rosenstock
On Wed, 2005-08-24 at 04:07, Sean Hefty wrote: 

In the below,

c /opensm/osm vendor layer/

(It is also used by some SA client code in addition to OpenSM.)

> Looking through the code, it appears that the proper size of the MAD is being
> reported in the kernel and exported up to userspace.  If I guessed the structure
> of the opensm code correctly, the length is returned by umad_recv() in
> umad_receiver() in osm_vendor_ibumad.c.  The length is discarded after
> umad_receiver() returns.

You "guessed" correctly :-)

> I guess that one possible solution is for opensm to save the length value into
> the payload_length field in the RMPP header before returning from
> umad_receiver().

Yes, that is a possible solution if it is needed on the receive side. It
looks to me like it is currently unused (based on method, received size,
and attribute offset), but it is probably a good idea to do this for the
future as another algorithm would work and might be better. I will put
this on my TODO list.

-- Hal




[openib-general] Re: RMPP Message Format Errors

2005-08-25 Thread Hal Rosenstock
On Mon, 2005-08-22 at 10:34, Eitan Zahavi wrote:
> > It gets a "real" received length provided it supplies a buffer large enough.
> So I guess the "real receive length" is truncated to the last data 
> record even if the packet sent was 256 bytes?

The receive buffer is not truncated. An error is returned if the buffer
supplied for a receive is too small, and it includes the size of the
buffer needed.
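
On the client side, that behaviour suggests a grow-and-retry pattern. The sketch below is only an illustration under stated assumptions: rx_fn is a hypothetical stand-in for the real receive call, and the convention of returning -ENOSPC with the needed size written back through len is assumed for the example, not taken from the umad code. fake_recv exists only so the example can run on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive hook: returns 0 on success, or -ENOSPC with *len
 * set to the size needed when the supplied buffer is too small. */
typedef int (*rx_fn)(void *buf, size_t *len, void *ctx);

/* Receive into a heap buffer, growing it once if the first try is too
 * small. On success returns the buffer (caller frees) and stores the
 * received length in *out_len; returns NULL on any failure. */
static void *recv_with_resize(rx_fn rx, void *ctx, size_t initial,
                              size_t *out_len)
{
    size_t len = initial;
    void *buf, *bigger;
    int ret;

    assert(rx != NULL && out_len != NULL && initial > 0);

    buf = malloc(len);
    if (!buf)
        return NULL;

    ret = rx(buf, &len, ctx);
    if (ret == -ENOSPC) {
        /* len now holds the required size; retry with a larger buffer */
        bigger = realloc(buf, len);
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        ret = rx(buf, &len, ctx);
    }
    if (ret != 0) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}

/* Stand-in receiver for the example: ctx points at the pending MAD size. */
static int fake_recv(void *buf, size_t *len, void *ctx)
{
    size_t pending = *(size_t *)ctx;

    if (*len < pending) {
        *len = pending;
        return -ENOSPC;
    }
    memset(buf, 0, pending);
    *len = pending;
    return 0;
}
```

The point of the pattern is that the reassembled length, not the 256-byte single-MAD size, is what reaches the client once the buffer is large enough.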

I don't understand what you mean by "even if the packet sent was 256
bytes".

-- Hal



[openib-general] RE: RMPP Message Format Errors

2005-08-24 Thread Sean Hefty
>> But the receive side needs to calculate back the correct size of the
>> assembled MAD.
>> If it is done in kernel or user it does not matter. To my best knowledge the
>> only way to calculate how many records are enclosed in an RMPP message is to
>> use the paylen and offset.
>> How can it be done without looking at paylen ?
>>
>> All Sean is saying is that the receive RMPP ignores a non zero PayLen in a
>> first segment and uses the last bit (and obviously the PayLen in the last
>> segment) to determine the received length (of the reassembled MAD).
>>
>OK, thanks for the clarification. We could use a paylen = 0 at first
>(but that is not last) segment

Looking through the code, it appears that the proper size of the MAD is being
reported in the kernel and exported up to userspace.  If I guessed the structure
of the opensm code correctly, the length is returned by umad_recv() in
umad_receiver() in osm_vendor_ibumad.c.  The length is discarded after
umad_receiver() returns.

I guess that one possible solution is for opensm to save the length value into
the payload_length field in the RMPP header before returning from
umad_receiver().

- Sean



[openib-general] Re: RMPP Message Format Errors

2005-08-23 Thread Eitan Zahavi

Hal Rosenstock wrote:

> Hi Eitan,
>
>> You wrote:
>>
>> "Note that the current implementation of the RMPP code ignores the payload
>> length on the receive side, and instead relies on the last bit to determine
>> the end of a transfer."
>>
>> But the receive side needs to calculate back the correct size of the
>> assembled MAD.
>>
>> If it is done in kernel or user it does not matter. To my best knowledge the
>> only way to calculate how many records are enclosed in an RMPP message is to
>> use the paylen and offset.
>> How can it be done without looking at paylen ?
>
> All Sean is saying is that the receive RMPP ignores a non zero PayLen in a
> first segment and uses the last bit (and obviously the PayLen in the last
> segment) to determine the received length (of the reassembled MAD).

OK, thanks for the clarification. We could use a paylen = 0 at first
(but that is not last) segment




[openib-general] RE: RMPP Message Format Errors

2005-08-23 Thread Hal Rosenstock
Hi Eitan,
 
> You wrote:
> "Note that the current implementation of the RMPP code ignores the payload
> length on the receive side, and instead relies on the last bit to determine the
> end of a transfer."
>
> But the receive side needs to calculate back the correct size of the assembled
> MAD.
> If it is done in kernel or user it does not matter. To my best knowledge the
> only way to calculate how many records are enclosed in an RMPP message is to
> use the paylen and offset.
> How can it be done without looking at paylen ?
 
All Sean is saying is that the receive RMPP ignores a non zero PayLen in a 
first segment and uses the last bit (and obviously the PayLen in the last 
segment) to determine the received length (of the reassembled MAD).
 
-- Hal


[openib-general] RE: RMPP Message Format Errors

2005-08-23 Thread Hal Rosenstock
Hi Eitan,
 
>We have started testing RMPP packets with osmtest and opensm (gen2 version).

>We did not go very far. The first NodeRecord GetTable of all the nodes in a 
>"loopback" case, has some issues.

Is this loopback between the 2 HCA ports ?(Just so I can recreate this when I 
get back).

> The explanation is below:

> 1.  NodeRecord MAD size is 112bytes (note the required padding of 4 bytes 
> at the end of the NodeRec data). 
> 2.  OpenSM log file shows the query should return 2 records one for each 
> end-port. This really happens: 

> Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x
> Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a1, lid 0x1.
> Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a2, lid 0x2.
> Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.

> 3.  On the wire we see the following (see attached gif for more details):

Could you send the raw hex as well ?

> a.  Two data segments were sent and two ACKs were returned. This is OK.
> b.  The first segment reports PayLen = 440bytes. According to the spec the
> first segment might provide paylen != 0 and when it is done it should be equal
> to the (class header * Num-Segments) + data length. In our case we have data
> length = 2*112, and SA extra header = 20byte * 2seg. This leads to paylen=264
> and not 440!!!
> The spec defines that in p775-l37.
> So this is a violation of the spec.

Agreed. It should either be 0 or the real length.

> c.  The last segment (segment 2) provides the paylen field of 100. The
> expected value for the last segment length should have been: SA extra header +
> leftover data size from prev segments. Since the first segment has 200bytes for
> data the left over should have been 112*2 - 200 = 24. With the SA extra header
> 44bytes.
> So this is another violation of the spec.

Yes, but perhaps related to the first issue.

> d.  The analyzer is confused by the above and reports the result as having
> 3 NodeRecords.
> e.  <>
> 4.  Following that when we trace the log file of osmtest we find more
> issues. Probably caused by changes to the vendor layer or the rmpp assembly: It
> is expected that after assembly the size of the RMPP mad reported to the osm
> vendor layer will be the rmpp header + SA extra header + data-size. In our case
> that is 32 + 20 + 2*112 = 276.
>
> The log file shows:
>
> Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
> Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
>
> So this is another problem - probably with the way RMPP results
> are assembled or passed back to the vendor.

This may be a result of the violations on the sending side.

> Please let me know if you will have time to dig into these problems or if I 
> should try and resolve them myself and provide patches. 

I will look at these shortly after I get back.

-- Hal



[openib-general] Re: RMPP Message Format Errors

2005-08-22 Thread Eitan Zahavi

Sean Hefty wrote:

> The RMPP code returns the size of the receive as sizeof MAD header + sizeof RMPP
> header + optional sizeof other header (e.g. SA header) + actual payload.  This
> size can be used to allocate a data buffer large enough to hold the reassembled
> MAD.  You should be able to use this to determine the number of records in the
> payload.

Good. But how is that size delivered? I mean through umad to the client.

From my first email on this thread you can see there is at least one 
bug in the chain of events:

a. First segment paylen should be either 0 or correct value - it is 
   neither. Should be 264 but is 440
b. Last segment paylen MUST be updated to reflect the size of the data
   in the MAD (including class header) - should be 24 but is 100.
c. In the receiver the re-assembled data size is not correct. OpenSM
   reports it got a 200 bytes MAD back. Probably a bug in the vendor
   layer or umad.

Here is the full data again.



1.  NodeRecord MAD size is 112bytes (note the required padding of 4
bytes at the end of the NodeRec data).
2.  OpenSM log file shows the query should return 2 records one for
each end-port. This really happens:


Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: Looking for NodeRecord with LID: 0x0 GUID:0x
Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a1, lid 0x1.
Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New NodeRecord: node 0x0002c90217a0 port 0x0002c90217a2, lid 0x2.
Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: Returning 2 records.

3.  On the wire we see the following (see attached gif for more
details):
a.  Two data segments were sent and two ACKs were returned. This is
OK.
b.  The first segment reports PayLen = 440bytes. According to the
spec the first segment might provide paylen != 0 and when it is done it
should be equal to the (class header * Num-Segments) + data length. In
our case we have data length = 2*112, and SA extra header = 20byte *
2seg. This leads to paylen=264 and not 440!!!
The spec defines that in p775-l37.
So this is a violation of the spec.
c.  The last segment (segment 2) provides the paylen field of 100.
The expected value for the last segment length should have been: SA
extra header + leftover data size from prev segments. Since the first
segment has 200bytes for data the left over should have been 112*2 - 200
= 24. With the SA extra header 44bytes.
So this is another violation of the spec.
d.  The analyzer is confused by the above and reports the result as
having 3 NodeRecords.
e.  <>
4.  Following that when we trace the log file of osmtest we find
more issues. Probably caused by changes to the vendor layer or the rmpp
assembly: It is expected that after assembly the size of the RMPP mad
reported to the osm vendor layer will be the rmpp header + SA extra
header + data-size. In our case that is 32 + 20 + 2*112 = 276.

The log file shows:

Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records
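
The last-segment arithmetic in item 3c can be written out as a small check. This is a sketch under the sizes used in this mail (20-byte SA header per segment, 200 bytes of record data per full segment); the names are hypothetical and it follows the reading in 3c, where the last-segment value includes the SA header.

```c
#include <assert.h>
#include <stddef.h>

#define SA_CLASS_HDR    20u   /* SA header carried in each segment */
#define SA_SEG_PAYLOAD 200u   /* record bytes in a full segment */

/* PayloadLength expected in the last segment under the reading in 3c:
 * SA header plus whatever record bytes are left over. */
static size_t last_seg_paylen(size_t data_len)
{
    size_t leftover;

    assert(data_len > 0);
    leftover = data_len % SA_SEG_PAYLOAD;
    if (leftover == 0)
        leftover = SA_SEG_PAYLOAD;   /* last segment is exactly full */
    return SA_CLASS_HDR + leftover;
}
```

For 2*112 = 224 bytes of records the leftover is 24, so the last segment would carry 44 rather than the 100 seen on the wire.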


[openib-general] RE: RMPP Message Format Errors

2005-08-22 Thread Sean Hefty
>The number of records is an SA thing and not RMPP thing. This is transparent to
>RMPP itself.
>
>The need to determine the number of records is a consumer issue (SA or SA
>client). To do this, AttributeOffset and (at least the last) PayloadLength
>field is needed (as one can't rely on the first PayloadLength being non zero).

The RMPP code returns the size of the receive as sizeof MAD header + sizeof RMPP
header + optional sizeof other header (e.g. SA header) + actual payload.  This
size can be used to allocate a data buffer large enough to hold the reassembled
MAD.  You should be able to use this to determine the number of records in the
payload.

- Sean



[openib-general] Re: RMPP Message Format Errors

2005-08-22 Thread Eitan Zahavi

Hal Rosenstock wrote:

> Hi again Eitan,
>
>> The transparency to the RMPP is an RMPP implementation choice.
>> Having incorrect paylen in the first segment is a compliancy violation.
>> It should be either 0 or correct value.
>
> Yes, is that what is going on ? I haven't had a chance to look at the GIF
> you sent and analyze it.

Yes that is exactly what I have provided in the first mail:

>> But how would the SA or SA Client that gets an assembled MAD be
>> able to tell the number of records?
>
> It gets a "real" received length provided it supplies a buffer large enough.

So I guess the "real receive length" is truncated to the last data
record even if the packet sent was 256 bytes?

>> Also, does the current implementation let the client do the assembly?
>
> No.

Good. I hoped this is the case.

>> Anyway, the last segment paylen was incorrect too.
>
> OK. That's another thing I'll look at.

The first mail I sent had all the analysis in it with exact paylen
values for first and second segments.

> -- Hal




[openib-general] RE: RMPP Message Format Errors

2005-08-22 Thread Hal Rosenstock
Hi again Eitan,
 
> The transparency to the RMPP is an RMPP implementation choice.
> Having incorrect paylen in the first segment is a compliancy violation.
> It should be either 0 or correct value.
 
Yes, is that what is going on ? I haven't had a chance to look at the GIF you
sent and analyze it.
 
> But how would the SA or SA Client that gets an assembled MAD be
> able to tell the number of records?
 
It gets a "real" received length provided it supplies a buffer large enough.

> Also, does the current implementation let the client do the assembly?
 
No.

> If so how would it handle abort transactions?
 
See previous answer.

> If the re-assembly is done by the MAD service then the client only gets
> offset in the MAD header and probably mad size which is MAD Header +
> RMPP header + SA extra header + data.

> Anyway, the last segment paylen was incorrect too.
 
OK. That's another thing I'll look at. 
 
-- Hal



[openib-general] Re: RMPP Message Format Errors

2005-08-22 Thread Eitan Zahavi

Hal Rosenstock wrote:

> The number of records is an SA thing and not RMPP thing. This is
> transparent to RMPP itself.

The transparency to the RMPP is an RMPP implementation choice.
Having incorrect paylen in the first segment is a compliancy violation.
It should be either 0 or correct value.

> The need to determine the number of records is a consumer issue (SA or
> SA client). To do this, AttributeOffset and (at least the last)
> PayloadLength field is needed (as one can't rely on the first
> PayloadLength being non zero).

True. But how would the SA or SA Client that gets an assembled MAD be
able to tell the number of records?

Also, does the current implementation let the client do the assembly?
If so how would it handle abort transactions?

If the re-assembly is done by the MAD service then the client only gets
offset in the MAD header and probably mad size which is MAD Header +
RMPP header + SA extra header + data.

Anyway, the last segment paylen was incorrect too.

> -- Hal




From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
Sent: Mon 8/22/2005 1:54 AM
To: 'Sean Hefty'; Eitan Zahavi; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors


Hi Sean,
 
You wrote:

"Note that the current implementation of the RMPP code ignores the
payload length on the receive side, and instead relies on the last bit
to determine the end of a transfer."
 
But the receive side needs to calculate back the correct size of the
assembled MAD.
If it is done in kernel or user it does not matter. To my best knowledge
the only way to calculate how many records are enclosed in an RMPP
message is to use the paylen and offset.
How can it be done without looking at paylen ?
 
EZ
 
Eitan Zahavi

Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
 
-Original Message-
From: Sean Hefty [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 22, 2005 1:01 AM

To: 'Eitan Zahavi'; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors
 
Please let me know if you will have time to dig into these problems or
if I should try and resolve them myself and provide patches. 
I will not be able to look at this until early next week (with IDF
running this week), but I will try to do so.   So, it wouldn't surprise
me if the receive side accepted an invalid RMPP MAD.
- Sean




[openib-general] RE: RMPP Message Format Errors

2005-08-22 Thread Hal Rosenstock
Hi Eitan,
 
All Sean is saying is that the RMPP code itself only uses the last bit. 
 
The number of records is an SA thing and not RMPP thing. This is transparent to 
RMPP itself.
 
The need to determine the number of records is a consumer issue (SA or SA 
client). To do this, AttributeOffset and (at least the last) PayloadLength 
field is needed (as one can't rely on the first PayloadLength being non zero).
 
-- Hal



From: Eitan Zahavi [mailto:[EMAIL PROTECTED]
Sent: Mon 8/22/2005 1:54 AM
To: 'Sean Hefty'; Eitan Zahavi; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors


Hi Sean,
 
You wrote:
"Note that the current implementation of the RMPP code ignores the payload 
length on the receive side, and instead relies on the last bit to determine the 
end of a transfer."
 
But the receive side needs to calculate back the correct size of the assembled 
MAD.
If it is done in kernel or user it does not matter. To my best knowledge the 
only way to calculate how many records are enclosed in an RMPP message is to 
use the paylen and offset.
How can it be done without looking at paylen ?
 
EZ
 
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
 
-Original Message-
From: Sean Hefty [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 22, 2005 1:01 AM
To: 'Eitan Zahavi'; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors
 
Please let me know if you will have time to dig into these problems or if I 
should try and resolve them myself and provide patches. 
I will not be able to look at this until early next week (with IDF running this 
week), but I will try to do so.   So, it wouldn't surprise me if the receive 
side accepted an invalid RMPP MAD.
- Sean


[openib-general] RE: RMPP Message Format Errors

2005-08-21 Thread Eitan Zahavi
Hi Sean,

You wrote:

"Note that the current implementation of the RMPP code ignores the payload
length on the receive side, and instead relies on the last bit to determine
the end of a transfer."

But the receive side needs to calculate back the correct size of the assembled
MAD.

If it is done in kernel or user it does not matter. To my best knowledge
the only way to calculate how many records are enclosed in an RMPP message is
to use the paylen and offset.

How can it be done without looking at paylen ?

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

-Original Message-
From: Sean Hefty [mailto:[EMAIL PROTECTED]
Sent: Monday, August 22, 2005 1:01 AM
To: 'Eitan Zahavi'; Hal Rosenstock
Cc: OPENIB GENERAL; Liran Sorani; Amit Krig; Aviram Gutman
Subject: RE: RMPP Message Format Errors

Please let me know if you will have time to dig into these problems or if I
should try and resolve them myself and provide patches.

I will not be able to look at this until early next week (with IDF running this
week), but I will try to do so.   So, it wouldn't surprise me if the receive
side accepted an invalid RMPP MAD.

- Sean


[openib-general] RE: RMPP Message Format Errors

2005-08-21 Thread Sean Hefty
> Please let me know if you will have time to dig into these problems or if I
> should try and resolve them myself and provide patches.

I will not be able to look at this until early next week (with IDF running this
week), but I will try to do so.  Note that the current implementation of the RMPP
code ignores the payload length on the receive side, and instead relies on the
last bit to determine the end of a transfer.  So, it wouldn’t surprise me if the
receive side accepted an invalid RMPP MAD.

- Sean


RE: [openib-general] Re: RMPP

2005-06-28 Thread Fab Tillier
> From: Hal Rosenstock [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 28, 2005 11:47 AM
> I think I see what is going on here...
> 
> In user_mad.c::send_handler
> 
> 
> if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) {
> packet->mad.hdr.status = ETIMEDOUT;
> 
> if (!queue_packet(file, agent, packet))
> return;
> }
> 
> That is what is causing the problem. I think the send side queues the
> packet on a timeout and simulates a receive so that a transaction can be
> terminated. RMPP sends appear to be a little different in that even non
> transactions get timeouts.

Why is a receive generated at all?  Shouldn't the send completion be enough?  It
seems odd to just generate a receive indication with the sent data when really,
nothing was received.  From an app perspective, it seems that handling the
IB_WC_RESP_TIMEOUT_ERR case should be sufficient to indicate the transaction is
complete.

- Fab



Re: [openib-general] Re: RMPP

2005-06-28 Thread Hal Rosenstock
On Tue, 2005-06-28 at 14:07, Hal Rosenstock wrote: 
> On Tue, 2005-06-28 at 13:48, Hal Rosenstock wrote:
> > On Tue, 2005-06-28 at 13:44, Sean Hefty wrote:
> > > Hal Rosenstock wrote:
> > > > Hi Sean,
> > > > 
> > > > I'm in the process of enabling the receive side RMPP from user space and
> > > > this is what I'm seeing in terms of RMPP right now. I have a question
> > > > about the OpenSM side.
> > > > 
> > > > > SA client                              OpenSM
> > > > > SA GetTable (PortInfoRecord) -->
> > > > >                                   <--  SA GetTableResp (PortInfoRecord)
> > > > >                                        RMPP active, first
> > > > >                                        payload length 0x44C
> > > > 
> > > > retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> > > > client is not currently ACKing. All is fine up to that point.
> > > > 
> > > > At that point, OpenSM sees a large receive which appears to be that send
> > > > timing out (nothing was sent nor observed on the IB wire).
> > > > 
> > > > Could a timed out RMPP send end up as a receive somehow ?
> > > 
> > > On the side that sent the MAD?
> > 
> > The side that sent the RMPP MAD response (e.g. OpenSM).
> > 
> > >   That should be no.
> > 
> > That's what I thought. I'm not sure where the problem is but will start
> > to try to narrow it down.
> 
> I do get EINVAL from user_mad.c::ib_umad_read as follows:
> 
> if (count < packet->length + sizeof (struct ib_user_mad))
>   ret = -EINVAL;
> 
> as the packet->length is larger than a single MAD (and looks like the
> user MAD that was sent by OpenSM).

I think I see what is going on here...

In user_mad.c::send_handler


	if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) {
		packet->mad.hdr.status = ETIMEDOUT;

		if (!queue_packet(file, agent, packet))
			return;
	}

That is what is causing the problem. I think the send side queues the
packet on a timeout and simulates a receive so that a transaction can be
terminated. RMPP sends appear to be a little different in that even sends
that are not part of a transaction get timeouts.

-- Hal



Re: [openib-general] Re: RMPP

2005-06-28 Thread Hal Rosenstock
On Tue, 2005-06-28 at 13:48, Hal Rosenstock wrote:
> On Tue, 2005-06-28 at 13:44, Sean Hefty wrote:
> > Hal Rosenstock wrote:
> > > Hi Sean,
> > > 
> > > I'm in the process of enabling the receive side RMPP from user space and
> > > this is what I'm seeing in terms of RMPP right now. I have a question
> > > about the OpenSM side.
> > > 
> > > SA client OpenSM
> > > SA GetTable (PortInfoRecord) -->
> > >  <--  SA GetTableResp (PortInfoRecord)
> > > RMPP active, first
> > > payload length 0x44C
> > > 
> > > retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> > > client is not currently ACKing. All is fine up to that point.
> > > 
> > > At that point, OpenSM sees a large receive which appears to be that send
> > > timing out (nothing was sent nor observed on the IB wire).
> > > 
> > > Could a timed out RMPP send end up as a receive somehow ?
> > 
> > On the side that sent the MAD?
> 
> The side that sent the RMPP MAD response (e.g. OpenSM).
> 
> >   That should be no.
> 
> That's what I thought. I'm not sure where the problem is but will start
> to try to narrow it down.

I do get EINVAL from user_mad.c::ib_umad_read as follows:

	if (count < packet->length + sizeof (struct ib_user_mad))
		ret = -EINVAL;

as the packet->length is larger than a single MAD (and looks like the
user MAD that was sent by OpenSM).
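
As a rough illustration of why that check trips, here is a minimal userspace sketch of the same comparison. It is not the user_mad.c code: UMAD_HDR_SIZE is a made-up stand-in for sizeof (struct ib_user_mad), and umad_read_check is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical header size; stands in for sizeof (struct ib_user_mad). */
#define UMAD_HDR_SIZE 32

/*
 * Mirrors the quoted check: the reader's buffer (count bytes) must be able
 * to hold the user-MAD header plus packet_len bytes of MAD data.
 * Returns 0 if the read could proceed, -EINVAL otherwise.
 */
static int umad_read_check(size_t count, size_t packet_len)
{
    if (count < packet_len + UMAD_HDR_SIZE)
        return -EINVAL;
    return 0;
}
```

A reader that sized its buffer for one 256-byte MAD passes the check for a normal receive, but fails it if the queued packet is a multi-segment send that was put back on the receive queue after a timeout.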

-- Hal



[openib-general] Re: RMPP

2005-06-28 Thread Hal Rosenstock
On Tue, 2005-06-28 at 13:44, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > Hi Sean,
> > 
> > I'm in the process of enabling the receive side RMPP from user space and
> > this is what I'm seeing in terms of RMPP right now. I have a question
> > about the OpenSM side.
> > 
> > SA client OpenSM
> > SA GetTable (PortInfoRecord) -->
> >  <--  SA GetTableResp (PortInfoRecord)
> > RMPP active, first
> > payload length 0x44C
> > 
> > retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> > client is not currently ACKing. All is fine up to that point.
> > 
> > At that point, OpenSM sees a large receive which appears to be that send
> > timing out (nothing was sent nor observed on the IB wire).
> > 
> > Could a timed out RMPP send end up as a receive somehow ?
> 
> On the side that sent the MAD?

The side that sent the RMPP MAD response (e.g. OpenSM).

>   That should be no.

That's what I thought. I'm not sure where the problem is but will start
to try to narrow it down.

> On the remote side, a send can timeout, but still be received, since the ACK 
> is unreliable.  But I don't think that this is the situation that you're 
> describing.

No. The remote side appears to be behaving as I would expect (at least
as far as I have gone).

-- Hal




[openib-general] Re: RMPP

2005-06-28 Thread Sean Hefty

Hal Rosenstock wrote:

> Hi Sean,
> 
> I'm in the process of enabling the receive side RMPP from user space and
> this is what I'm seeing in terms of RMPP right now. I have a question
> about the OpenSM side.
> 
> SA client OpenSM
> SA GetTable (PortInfoRecord) -->
>  <--  SA GetTableResp (PortInfoRecord)
> RMPP active, first
> payload length 0x44C
> 
> retries is set to 4 so I see 4 responses (at 2 sec intervals) as the
> client is not currently ACKing. All is fine up to that point.
> 
> At that point, OpenSM sees a large receive which appears to be that send
> timing out (nothing was sent nor observed on the IB wire).
> 
> Could a timed out RMPP send end up as a receive somehow ?

On the side that sent the MAD?  That should be no.

On the remote side, a send can timeout, but still be received, since the ACK 
is unreliable.  But I don't think that this is the situation that you're 
describing.


- Sean


Re: [openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
Hal Rosenstock wrote:
> Yes, that looks better in terms of clearing the padding. I still need to
> double check my math on the PayloadLengths.

It turned out that grmpp was clearing the MAD, which was why I wasn't 
detecting the issue.

Also, I've added some documentation on using RMPP to ib_mad.h and added an 
rmpp_active flag to ib_create_send_mad().  My current solution is to have 
ib_create_send_mad() format the RMPP header for the user.

- Sean


RE: [openib-general] Re: RMPP

2005-05-04 Thread Hal Rosenstock
On Wed, 2005-05-04 at 17:51, Sean Hefty wrote: 
> Can you try with this patch?
> 
> Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>
> 
> Index: core/mad.c
> ===================================================================
> --- core/mad.c	(revision 2256)
> +++ core/mad.c	(working copy)
> @@ -796,9 +796,9 @@
>   buf = kmalloc(sizeof *send_buf + buf_size, gfp_mask);
>   if (!buf)
>   return ERR_PTR(-ENOMEM);
> + memset(buf, 0, sizeof *send_buf + buf_size);
>  
>   send_buf = buf + buf_size;
> - memset(send_buf, 0, sizeof *send_buf);
>   send_buf->mad = buf;
>  
>   send_buf->sge.addr = dma_map_single(mad_agent->device->dma_device,

Yes, that looks better in terms of clearing the padding. I still need to
double check my math on the PayloadLengths.

Thanks.

-- Hal



RE: [openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
>> > I also see the padding on the last segment of
>> > a multipacket send not cleared (I integrated the part of your patch
>> > relating to the pad calculation).
>>
>> I ran some tests, and didn't see any cases where the padding wasn't zero.
>> The RMPP code doesn't touch the padding itself, and create_send should
>> allocate it zeroed.  Are you using an analyzer and seeing that it's not 
>> zeroed?
>
>Yes. I'm stating this from what I see on the IB "wire".

Can you try with this patch?

Signed-off-by: Sean Hefty <[EMAIL PROTECTED]>

Index: core/mad.c
===================================================================
--- core/mad.c	(revision 2256)
+++ core/mad.c	(working copy)
@@ -796,9 +796,9 @@
 	buf = kmalloc(sizeof *send_buf + buf_size, gfp_mask);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
+	memset(buf, 0, sizeof *send_buf + buf_size);
 
 	send_buf = buf + buf_size;
-	memset(send_buf, 0, sizeof *send_buf);
 	send_buf->mad = buf;
 
 	send_buf->sge.addr = dma_map_single(mad_agent->device->dma_device,





[openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
Hal Rosenstock wrote:
> On Wed, 2005-05-04 at 16:45, Sean Hefty wrote:
>> Hal Rosenstock wrote:
>>> I also see the padding on the last segment of
>>> a multipacket send not cleared (I integrated the part of your patch
>>> relating to the pad calculation).
>>
>> I ran some tests, and didn't see any cases where the padding wasn't zero. 
>> The RMPP code doesn't touch the padding itself, and create_send should 
>> allocate it zeroed.  Are you using an analyzer and seeing that it's not zeroed?
>
> Yes. I'm stating this from what I see on the IB "wire".

I think my tests were just lucky...  It looks like create_send_mad zeros 
only the top portion of the data buffer.
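
To make the effect concrete, here is a small userspace sketch of the two zeroing strategies. It is only an illustration, not the mad.c code: malloc() plus a 0xAA fill stands in for kmalloc() returning stale memory, and struct send_buf_trailer, alloc_send_buf and pad_is_zero are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the bookkeeping struct placed after the data. */
struct send_buf_trailer {
    void *mad;
    size_t len;
};

/*
 * Allocate [data buffer of buf_size bytes | trailer].  The allocation is
 * pre-filled with 0xAA to mimic stale heap contents.  With zero_all == 0
 * only the trailer is cleared; with zero_all != 0 the whole allocation is
 * cleared.  Returns NULL on failure.
 */
static unsigned char *alloc_send_buf(size_t buf_size, int zero_all)
{
    size_t total = buf_size + sizeof(struct send_buf_trailer);
    unsigned char *buf = malloc(total);

    if (!buf)
        return NULL;
    memset(buf, 0xAA, total);

    if (zero_all)
        memset(buf, 0, total);
    else
        memset(buf + buf_size, 0, sizeof(struct send_buf_trailer));
    return buf;
}

/* Returns 1 if the pad bytes [used, buf_size) are all zero, else 0. */
static int pad_is_zero(const unsigned char *buf, size_t used, size_t buf_size)
{
    size_t i;

    for (i = used; i < buf_size; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

If the caller writes only the first `used` bytes, the pad is guaranteed to be zero only when the whole allocation was cleared. A test that clears the MAD itself before sending would hide the difference.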

- Sean


[openib-general] Re: RMPP

2005-05-04 Thread Hal Rosenstock
On Wed, 2005-05-04 at 16:45, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > I also see the padding on the last segment of
> > a multipacket send not cleared (I integrated the part of your patch
> > relating to the pad calculation).
> 
> I ran some tests, and didn't see any cases where the padding wasn't zero. 
> The RMPP code doesn't touch the padding itself, and create_send should 
> allocate it zeroed.  Are you using an analyzer and seeing that it's not 
> zeroed?

Yes. I'm stating this from what I see on the IB "wire".

-- Hal



[openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
Hal Rosenstock wrote:
> I also see the padding on the last segment of
> a multipacket send not cleared (I integrated the part of your patch
> relating to the pad calculation).

I ran some tests, and didn't see any cases where the padding wasn't zero. 
The RMPP code doesn't touch the padding itself, and create_send should 
allocate it zeroed.  Are you using an analyzer and seeing that it's not zeroed?

- Sean


[openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
Hal Rosenstock wrote:
> In addition to passing the hdr_len and data_len to ib_create_send_mad: 
> rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
> offsetof(struct ib_rmpp_mad, data) + data_len);
> That's not the "real" payload length that would go into the packet; just
> one segment's worth of class header. Correct ?
> 
> If the above is correct, I'm not sure the actual payload lengths in the
> DATA packets are correct. I also see the padding on the last segment of
> a multipacket send not cleared (I integrated the part of your patch
> relating to the pad calculation).

Running grmpp and using madeye to check the packets, I ran 3 tests:

user-data size   payload in 1st segment   payload in last segment
1200             1320 (220x6)             124
50               54                       54
580              660 (220x3)              152

For multi-packet segments, I think that the payload in the 1st segment will 
always be a multiple of 220 bytes (256 - sizeof common MAD header - sizeof 
RMPP header).  For the tests that I ran, these appear to be correct.

Since grmpp uses vendor MADs, it can transfer 216 bytes of user-data per 
segment.  Using the 580 example, the data is segmented into 216 + 216 + 148 
bytes.  The payload in the last segment includes the extra 4 byte vendor 
specific header.  Likewise, 1200 = 216 x 5 + 120, with an extra 4 bytes 
added for the header.
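
The arithmetic above can be sketched in a few lines of C. This is only a model that reproduces the three observations, not code from the RMPP implementation; the helper names are hypothetical, and the 24-byte MAD header and 12-byte RMPP header are the sizes implied by 256 - 220.

```c
#include <stddef.h>

enum {
    MAD_SIZE      = 256,
    MAD_HDR_SIZE  = 24,  /* common MAD header */
    RMPP_HDR_SIZE = 12,  /* RMPP header */
    VENDOR_HDR    = 4,   /* vendor-specific header used by grmpp */
    SEG_DATA      = MAD_SIZE - MAD_HDR_SIZE - RMPP_HDR_SIZE, /* 220 */
    SEG_USER_DATA = SEG_DATA - VENDOR_HDR                    /* 216 */
};

/* Number of segments needed to carry user_len bytes of user data. */
static size_t rmpp_segments(size_t user_len)
{
    if (user_len == 0)
        return 1;
    return (user_len + SEG_USER_DATA - 1) / SEG_USER_DATA;
}

/* Payload length seen in the first segment of the send. */
static size_t first_seg_payload(size_t user_len)
{
    size_t segs = rmpp_segments(user_len);

    if (segs == 1)
        return user_len + VENDOR_HDR;
    return segs * SEG_DATA; /* a multiple of 220 for multi-segment sends */
}

/* Payload length seen in the last segment of the send. */
static size_t last_seg_payload(size_t user_len)
{
    size_t segs = rmpp_segments(user_len);

    return user_len - (segs - 1) * SEG_USER_DATA + VENDOR_HDR;
}
```

For 580 bytes this gives 3 segments, 660 in the first segment and 148 + 4 = 152 in the last, matching the table.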

I haven't looked at why the data isn't cleared yet, but will do so next. 
The memory allocations in create_send_mad look right to me...

- Sean


[openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
Hal Rosenstock wrote:
>>> In one test case, I see the first RMPP DATA segment sent and there is no
>>> response (ACK) from the receiver (this was due to a (receiver) test
>>> program issue). The transmitter retry depends on retries in the send_wr
>>> ud structure. Does it need to know/Is there a way to know that this
>>> failed (no ACK, etc.) when retries are exhausted or is this reliant on
>>> the receiver rerequesting or is the entire RMPP transaction treated like
>>> a UD send (e.g. unreliable) ?
>>
>> The RMPP send is treated as reliable, even if no response is expected.  If 
>> the send is not ACKed completely, the request will timeout, and a send 
>> failure is reported to the user.  If the send completes successfully, then 
>> the user knows that it was received by the remote side; although, there's no 
>> guarantee that it was processed by anyone.
>
> Is this on any (RMPP) transmits or just requests ? I will check to see
> if I can see this.

This is true for any RMPP send operation.
You can test for this by running a grmpp client, without having the grmpp 
server loaded on the destination node.  You should see a timeout error 
sending the MAD in the messages log file.

You can also see timeouts on both requests and replies if you run grmpp and 
increase the number of messages to a few thousand.

(e.g. insmod ib_grmpp.ko "slid=1" "dlid=2" "message_count=15000" 
"message_size=100" "responses=1")

- Sean


[openib-general] Re: RMPP

2005-05-04 Thread Hal Rosenstock
On Wed, 2005-05-04 at 13:43, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > Hi Sean,
> > 
> > In addition to passing the hdr_len and data_len to ib_create_send_mad: 
> > rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
> > offsetof(struct ib_rmpp_mad, data) + data_len);
> > That's not the "real" payload length that would go into the packet; just
> > one segment's worth of class header. Correct ?
> 
> There are two sizes needed by the RMPP code.  First, it needs the total 
> length of the data buffer, which includes any necessary padding.  This is 
> set in the scatter-gather entry for the work request.  Second, it needs to 
> know how much of the data buffer contains valid user data.

Got it.

> ib_create_send_mad returns the length of the valid user data in a 
> slightly encoded format.  It subtracts the size of the RMPP and common MAD 
> headers from the size (see below).  The intent is that the payload length 
> value is set to the correct value for MADs that are a single segment in 
> length.

Right. That might be a better way of expressing it than how I did :-)

> -----------------
> MAD header
> -----------------
> RMPP header
> -----------------
> SA/Vendor hdr
> -----------------
> User data
> -----------------
> Pad
> -----------------
> 
> sge length = size from MAD header to end of pad.
> payload = size from SA/Vendor hdr to end of user data.
> 
> > If the above is correct, I'm not sure the actual payload lengths in the
> > DATA packets are correct. I also see the padding on the last segment of
> > a multipacket send not cleared (I integrated the part of your patch
> > relating to the pad calculation).
> 
> I will run some tests and verify the payload values using madeye.  I'm not 
> sure why the padding isn't cleared.  It may be an indication that 
> create_send_mad isn't allocating the correct pad size.

Thanks.

> > In one test case, I see the first RMPP DATA segment sent and there is no
> > response (ACK) from the receiver (this was due to a (receiver) test
> > program issue). The transmitter retry depends on retries in the send_wr
> > ud structure. Does it need to know/Is there a way to know that this
> > failed (no ACK, etc.) when retries are exhausted or is this reliant on
> > the receiver rerequesting or is the entire RMPP transaction treated like
> > a UD send (e.g. unreliable) ?
> 
> The RMPP send is treated as reliable, even if no response is expected.  If 
> the send is not ACKed completely, the request will timeout, and a send 
> failure is reported to the user.  If the send completes successfully, then 
> the user knows that it was received by the remote side; although, there's no 
> guarantee that it was processed by anyone.

Is this on any (RMPP) transmits or just requests ? I will check to see
if I can see this.

> The code will retry a given segment the number of times specified by the 
> user.  If forward progress is made on the send, the retry count is reset.

for each subsequent segment. Makes sense.

> > Here's a summary of changes so far:
> > 1. ib_create_send_mad either needs an additional parameter (RMPP active
> > in current send packet) or the paylen_newwin needs to be set by the user
> > outside of this routine.
> 
> I have this on my short term to-do list.  I'm just not sure on the best 
> approach yet...

OK. I'll wait. I have my workaround right now.

> > 2. Some minor ib_mad.h commentary changes for more clarity on the
> > assumptions of the RMPP API.

Thanks.

-- Hal




[openib-general] Re: RMPP

2005-05-04 Thread Sean Hefty
Hal Rosenstock wrote:
> Hi Sean,
> 
> In addition to passing the hdr_len and data_len to ib_create_send_mad: 
> rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(hdr_len -
> offsetof(struct ib_rmpp_mad, data) + data_len);
> That's not the "real" payload length that would go into the packet; just
> one segment's worth of class header. Correct ?

There are two sizes needed by the RMPP code.  First, it needs the total 
length of the data buffer, which includes any necessary padding.  This is 
set in the scatter-gather entry for the work request.  Second, it needs to 
know how much of the data buffer contains valid user data.

ib_create_send_mad returns the length of the valid user data in a 
slightly encoded format.  It subtracts the size of the RMPP and common MAD 
headers from the size (see below).  The intent is that the payload length 
value is set to the correct value for MADs that are a single segment in length.

-----------------
MAD header
-----------------
RMPP header
-----------------
SA/Vendor hdr
-----------------
User data
-----------------
Pad
-----------------

sge length = size from MAD header to end of pad.
payload = size from SA/Vendor hdr to end of user data.
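
The two sizes can be written out as a small sketch. This is an illustration of the layout above rather than the ib_create_send_mad code: the helper names are hypothetical, RMPP_DATA_OFFSET plays the role of offsetof(struct ib_rmpp_mad, data), and the pad is passed in by the caller because the pad rule is not spelled out here.

```c
#include <stddef.h>

enum {
    MAD_HDR_SIZE  = 24, /* common MAD header */
    RMPP_HDR_SIZE = 12  /* RMPP header */
};

/* Offset of the area that follows the MAD and RMPP headers. */
#define RMPP_DATA_OFFSET (MAD_HDR_SIZE + RMPP_HDR_SIZE)

/*
 * hdr_len:  MAD header + RMPP header + class (SA/vendor) header.
 * data_len: bytes of valid user data.
 * Payload runs from the start of the SA/vendor header to the end of the
 * user data, i.e. hdr_len - offsetof(struct ib_rmpp_mad, data) + data_len.
 */
static size_t rmpp_payload_len(size_t hdr_len, size_t data_len)
{
    return hdr_len - RMPP_DATA_OFFSET + data_len;
}

/* SGE length runs from the start of the MAD header to the end of the pad. */
static size_t rmpp_sge_len(size_t hdr_len, size_t data_len, size_t pad)
{
    return hdr_len + data_len + pad;
}
```

For example, assuming a 4-byte vendor header (hdr_len = 40) and 50 bytes of user data, the payload is 54 bytes, and a pad of 166 bytes brings the SGE length to one full 256-byte MAD.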

> If the above is correct, I'm not sure the actual payload lengths in the
> DATA packets are correct. I also see the padding on the last segment of
> a multipacket send not cleared (I integrated the part of your patch
> relating to the pad calculation).

I will run some tests and verify the payload values using madeye.  I'm not 
sure why the padding isn't cleared.  It may be an indication that 
create_send_mad isn't allocating the correct pad size.

> In one test case, I see the first RMPP DATA segment sent and there is no
> response (ACK) from the receiver (this was due to a (receiver) test
> program issue). The transmitter retry depends on retries in the send_wr
> ud structure. Does it need to know/Is there a way to know that this
> failed (no ACK, etc.) when retries are exhausted or is this reliant on
> the receiver rerequesting or is the entire RMPP transaction treated like
> a UD send (e.g. unreliable) ?

The RMPP send is treated as reliable, even if no response is expected.  If 
the send is not ACKed completely, the request will time out, and a send 
failure is reported to the user.  If the send completes successfully, then 
the user knows that it was received by the remote side; although, there's no 
guarantee that it was processed by anyone.

The code will retry a given segment the number of times specified by the 
user.  If forward progress is made on the send, the retry count is reset.
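
The retry rule can be modelled with a short sketch. This is a simplified simulation, not the RMPP state machine: rmpp_send_outcome is a hypothetical helper, each event stands for one timer period, and it assumes a send fails on the first timeout that occurs after max_retries retries have already been used.

```c
#include <errno.h>

/*
 * acked[i] != 0: an ACK arrived that advanced the send by one segment.
 * acked[i] == 0: the timer expired with no progress.
 * Returns 0 when all segments are ACKed, -ETIMEDOUT when the retries for
 * the current segment are exhausted, -EAGAIN if the events ran out while
 * the send was still in flight.
 */
static int rmpp_send_outcome(const int *acked, int n_events,
                             int segments, int max_retries)
{
    int acked_segs = 0;
    int retries_left = max_retries;
    int i;

    for (i = 0; i < n_events && acked_segs < segments; i++) {
        if (acked[i]) {
            acked_segs++;               /* forward progress */
            retries_left = max_retries; /* so the retry count is reset */
        } else if (retries_left-- == 0) {
            return -ETIMEDOUT;          /* reported as a send failure */
        }
    }
    return acked_segs == segments ? 0 : -EAGAIN;
}
```

With max_retries = 2, a three-segment send that sees two timeouts before each of its later ACKs still completes, because every ACK resets the count; three timeouts in a row on one segment fail the send.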

> Here's a summary of changes so far:
> 1. ib_create_send_mad either needs an additional parameter (RMPP active
> in current send packet) or the paylen_newwin needs to be set by the user
> outside of this routine.

I have this on my short term to-do list.  I'm just not sure on the best 
approach yet...

> 2. Some minor ib_mad.h commentary changes for more clarity on the
> assumptions of the RMPP API.