Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-07-03 Thread Or Gerlitz
Sean Hefty wrote:
> Rimmer, Todd wrote:
>> The CM would open the CA, provide its async event callback routine and
>> perform a special register_cm() verbs call.  Of course most CM traffic
>> would occur on the GSI QP, so this open CA instance was only for this
>> purpose.  This special verb was only available in kernel space (avoiding
>> security issue of application stealing CM interface and because our CM
>> was in the kernel anyway).
> 
> Thanks for the info.  I'm considering this sort of approach.

OK, so you opt for a change that will have the whole solution running
within the IB stack core (hw driver / core / cm): the CM gets an async
event which makes it synthesize an RTU and act on it.

So we went down from CMA level handling to CM level handling, and it
would work for both user and kernel consumers. This comes at the price of
having to change the verbs access layer for the CM to register on QP
async events.

Again, with this solution too the ULP has to be aware of CQ
completions related to a QP for which the ESTABLISHED event was not yet
delivered on the associated CMA ID.

Sounds good. In fact our gen1 stack was using this solution as well,
relying on the VAPI driver feature of delivering affiliated async events
to all the kernel consumers (the async event ***handler*** was not
associated with a specific QP).

Or.



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-29 Thread Rimmer, Todd

> -----Original Message-----
> From: Michael S. Tsirkin
> Sent: Thursday, June 29, 2006 1:45 AM
>
> Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> > Subject: Re: design for communication established affiliated
> > asynchronous event handling
> >
> > >I suggest the following design: the CMA would replace the event handler
> > >provided with the qp_init_attr struct with a callback of its own and
> > >keep the original handler/context on a private structure.
> >
> > This is probably fine.  There is one further situation where the
> > connection needs to be established, beyond RTU and the communication
> > established async event.  Namely, if a receive completion is polled.
> > Since async events are, well, asynchronous, there's no guarantee that
> > the communication established event will be reported any time soon...
> 
> How about user taking this into account and not arming the CQ /
> not polling it until the established event?

If the ULP is properly designed, the asynchronous-ness of the event (or
RTU for that matter) should not be an issue.

Per the IBTA CM state machine, the passive side upon sending the REP
should move its endpoint (the QP and the ULPs state machine) state to
Ready to Receive.  QPs in RTR can have send WQEs posted to them, however
they will not be sent until the QP is moved to RTS.

This means the ULP while in RTR can perform its normal receive
completion handling and even build and post send requests in response to
such received messages.  Such sends will be queued until the QP later
moves to RTS.

Most ULPs have some sort of application level flow control.  This may be
simply RNR NAK or it could be a credit system (such as SRP) or an
additional application initialization protocol (such as SDP).  Hence the
active side will generally perform limited sends (typically one) to the
passive side until it gets a response from the passive side (which won't
happen until the QP is in RTS).  Hence for a good ULP protocol, there is
no risk of overflowing the send Q while waiting to move to RTS.

The only thing the passive side ULP should not do until in RTS is any
sort of "periodic status messages which don't require active side
acknowledgement".  Since the RTS state could be delayed, the ULP should
not risk overflowing its send Q with such messages.  Most of the
standard ULP protocols (SDP, etc) do not have such messages or they
require ULP level protocol negotiation before they are activated.

Hence if this is all properly handled, the passive side's RTU/Async
Event handling sequence will merely move the QP to RTS and notify the
ULP.  The ULP will likely do very limited work for this notification
(perhaps just a state transition) as all the real work should have been
done before sending the REP.

The movement to RTS will enable the QP to start processing its Send Q
and everything will be good.

Taking this approach keeps the CM/CMA and ULP simpler in design and
merely allows the RTS/RTU/Async Event handling to be another event in a
state machine.
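
As a rough illustration of the queuing behaviour described above, here is
a toy Python model. It is only a sketch: ToyQP, post_send and move_to_rts
are hypothetical names, not verbs calls, and it assumes (as this message
does) that posting to the Send Q in RTR is accepted and merely held back.
Another message in this thread cites C10-29 as requiring an error instead,
so treat that assumption with care.

```python
from collections import deque
from enum import Enum


class QPState(Enum):
    RTR = "ready-to-receive"
    RTS = "ready-to-send"


class ToyQP:
    """Toy QP: sends posted in RTR are held until the QP reaches RTS."""

    def __init__(self, send_queue_depth):
        self.state = QPState.RTR
        self.send_queue_depth = send_queue_depth
        self.pending = deque()  # posted but not yet transmitted
        self.wire = []          # messages that actually went out

    def post_send(self, msg):
        # A bounded Send Q is why unacknowledged periodic messages are risky
        # while the move to RTS is delayed.
        if len(self.pending) >= self.send_queue_depth:
            raise OverflowError("send queue full")
        self.pending.append(msg)
        if self.state is QPState.RTS:
            self._drain()

    def move_to_rts(self):
        # RTU / Communication Established handling ends up here.
        self.state = QPState.RTS
        self._drain()

    def _drain(self):
        while self.pending:
            self.wire.append(self.pending.popleft())
```

With a depth of 2, one reply plus one status message fit while in RTR, a
third post raises OverflowError, and move_to_rts() releases the queued
messages in order.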

Todd Rimmer




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-29 Thread Rimmer, Todd
> -----Original Message-----
> From: openib Sean Hefty
> Sent: Wednesday, June 28, 2006 7:24 PM
>
> Roland Dreier wrote:
> >>I suggest the following design: the CMA would replace the event handler
> >>provided with the qp_init_attr struct with a callback of its own and
> >>keep the original handler/context on a private structure.
>
> I should also point out that the proposed design will not work for
> userspace.  I'm hesitant to make this change until a solution for
> userspace can also be found, in the hope that a common fix can be shared.
> 
> - Sean

The approach we took in our proprietary stack was to provide a verbs
driver interface for the CM to register itself with the verbs driver. 

The CM would open the CA, provide its async event callback routine and
perform a special register_cm() verbs call.  Of course most CM traffic
would occur on the GSI QP, so this open CA instance was only for this
purpose.  This special verb was only available in kernel space (avoiding
security issue of application stealing CM interface and because our CM
was in the kernel anyway).

When the CA got an Async Event for a Communication Established event, it
would deliver it to both the CM (regardless of which QP it was for) and
to the open instance owning the QP.  All other async events were only
delivered to the appropriate open instance.

This put the handling in the kernel and at a low level where it would
not impact handling of other async events and avoided complications of
user vs kernel async event filters.

Depending on the design of APM, the CM might also be interested in APM
related Async Events (in our design the application had an opportunity
to select a new alternate path, so it was more appropriate to let the
ULP handle these events directly).
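
The delivery rule described above could be sketched roughly as follows.
This is a toy Python model, not the proprietary stack's code: ToyCA,
create_qp and raise_async_event are hypothetical names, and register_cm
here only mirrors the idea of the special verb mentioned above.

```python
class ToyCA:
    """Toy channel adapter that fans out async events."""

    def __init__(self):
        self.cm_callback = None
        self.qp_owner = {}  # QP number -> callback of the owning open instance

    def register_cm(self, callback):
        # Stand-in for the kernel-only registration call described above.
        self.cm_callback = callback

    def create_qp(self, qpn, owner_callback):
        self.qp_owner[qpn] = owner_callback

    def raise_async_event(self, qpn, event):
        # Communication Established goes to the CM for any QP ...
        if event == "COMM_EST" and self.cm_callback is not None:
            self.cm_callback(qpn, event)
        # ... and every event goes to the instance that owns the QP.
        self.qp_owner[qpn](qpn, event)
```

Registering a CM callback and one QP owner, then raising "COMM_EST" and
some other event on that QP, reaches the CM once and the owner twice.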

Todd Rimmer




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-29 Thread Sean Hefty
Rimmer, Todd wrote:
> The CM would open the CA, provide its async event callback routine and
> perform a special register_cm() verbs call.  Of course most CM traffic
> would occur on the GSI QP, so this open CA instance was only for this
> purpose.  This special verb was only available in kernel space (avoiding
> security issue of application stealing CM interface and because our CM
> was in the kernel anyway).

Thanks for the info.  I'm considering this sort of approach.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-28 Thread Sean Hefty
>How about user taking this into account and not arming the CQ /
>not polling it until the established event?

The CQ could be in use by other QPs.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-28 Thread Michael S. Tsirkin
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: Re: design for communication established affiliated asynchronous 
> event handling
> 
> >I suggest the following design: the CMA would replace the event handler
> >provided with the qp_init_attr struct with a callback of its own and
> >keep the original handler/context on a private structure.
> 
> This is probably fine.  There is one further situation where the
> connection needs to be established, beyond RTU and the communication
> established async event.  Namely, if a receive completion is polled.
> Since async events are, well, asynchronous, there's no guarantee that
> the communication established event will be reported any time soon...

How about user taking this into account and not arming the CQ /
not polling it until the established event?

-- 
MST




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-28 Thread Sean Hefty
Roland Dreier wrote:
>>I suggest the following design: the CMA would replace the event handler
>>provided with the qp_init_attr struct with a callback of its own and
>>keep the original handler/context on a private structure.
> 
> 
> This is probably fine.  There is one further situation where the
> connection needs to be established, beyond RTU and the communication
> established async event.  Namely, if a receive completion is polled.
> Since async events are, well, asynchronous, there's no guarantee that
> the communication established event will be reported any time soon...

This brings up a good point.  Even if a user gets a communication established 
event, the IB CM could have already timed out and failed the connection.  I 
don't think that we can do anything about this.

I should also point out that the proposed design will not work for userspace. 
I'm hesitant to make this change until a solution for userspace can also be 
found, in the hope that a common fix can be shared.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Rimmer, Todd wrote:
> CM - have a hook so the CM can get the Async Events for all CAs.  On
> getting the Async Event for the first packet received while in RTR
> (Communication established), the CM should treat this exactly like an
> RTU (with no private data).  The CM will need to cross reference the
> CA/QP this event was reported for to identify the applicable connection
> endpoint.  If you check the IBTA spec and the CM state machines you will
> see the CM is supposed to handle this event.  Also if the RTU does
> arrive later, the CM state machine also handles that correctly by
> discarding the RTU as if it was a duplicate.  Note: this is why
> applications should not depend on private data in the RTU.

The IB CM has this capability, and behaves as indicated.  The missing piece is 
for the RDMA CM to handle this situation.  I believe that Or's approach of 
replacing the user's QP handler with the CMA's will fix this.

> has completed its processing.  In general IB allows for this situation
> quite nicely.  The ULP can process the inbound data normally and queue
> it to the Send Q.  Putting data on a Send Q is permitted in RTR, but the

This is a good point, which indicates to me that nothing more is needed than 
handling the communication established event by the RDMA CM.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Rimmer, Todd
> -----Original Message-----
> From: Or Gerlitz; openib-general
> > In most cases, I would expect that the IB CM will eventually receive the
> > RTU, which will generate an event to the RDMA CM to transition the QP into
> > RTS.
>
> But we want an IB stack and set of ULPs which would work in production so
> they need to handle also irregular cases... eg when the RTU is lost over and
> over.

Agreed.  The missing RTU case must be handled for a few reasons:
1. The RTU could honestly be lost (GSI QPs are UD, they could overflow,
fabric could lose the packet, etc)
2. The RC send could beat the processing of the RTU (packets on wire may
be out of order if there are different SLs/VLs involved with GSI vs
application QP).  Also it's possible the CM is slower getting to its
queue of packets (such as when bombarded by many connections) while
application/ULP gets its RC send quickly. [I have observed this
situation in various real world stress tests].

This problem is quite simple to handle (I did it a few years ago in the
SilverStorm stack) and the IB spec completely covers this issue:

CM - have a hook so the CM can get the Async Events for all CAs.  On
getting the Async Event for the first packet received while in RTR
(Communication established), the CM should treat this exactly like an
RTU (with no private data).  The CM will need to cross reference the
CA/QP this event was reported for to identify the applicable connection
endpoint.  If you check the IBTA spec and the CM state machines you will
see the CM is supposed to handle this event.  Also if the RTU does
arrive later, the CM state machine also handles that correctly by
discarding the RTU as if it was a duplicate.  Note: this is why
applications should not depend on private data in the RTU.
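
A minimal sketch of that CM behaviour, as a toy Python model. The class
and method names are hypothetical and the state names are simplified; it
is not the SilverStorm or OpenIB CM code.

```python
class ToyPassiveCM:
    """Toy passive-side CM keyed by (CA, QP number)."""

    def __init__(self):
        self.endpoints = {}      # (ca, qpn) -> "REP_SENT" or "ESTABLISHED"
        self.notifications = []  # what the ULP would be told

    def rep_sent(self, ca, qpn):
        self.endpoints[(ca, qpn)] = "REP_SENT"

    def on_rtu(self, ca, qpn, private_data=b""):
        return self._establish(ca, qpn, "RTU", private_data)

    def on_comm_established(self, ca, qpn):
        # Treated exactly like an RTU that carries no private data.
        return self._establish(ca, qpn, "COMM_EST", b"")

    def _establish(self, ca, qpn, cause, private_data):
        key = (ca, qpn)  # cross reference the CA/QP to find the endpoint
        if self.endpoints.get(key) != "REP_SENT":
            return False  # unknown or already established: drop as duplicate
        self.endpoints[key] = "ESTABLISHED"
        self.notifications.append((key, cause, private_data))
        return True
```

If on_comm_established() arrives first, a later on_rtu() for the same
endpoint returns False and its private data never reaches the ULP, which
is the reason given above for not depending on RTU private data.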

ULPs - all ULPs should be written so they are fully ready to process
inbound data before they tell the CM to send the REP.  It is very likely
the ULP will get a CQ completion for the inbound RQ data before the CM
has completed its processing.  In general IB allows for this situation
quite nicely.  The ULP can process the inbound data normally and queue
it to the Send Q.  Putting data on a Send Q is permitted in RTR, but the
QP will not initiate sending until moved to RTS.  As such the ULP can
allow the CM RTU processing (which will race with the RQ data
completion) to do its normal thing and move the QP to RTS.

Todd Rimmer




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Hal Rosenstock wrote:
>>This moves the QP state to RTS, as opposed to the CEP state to connected.  So
>>I don't believe that it violates the spec.
> 
> 
> Isn't the CEP the QP (see p. 689 line 7) ? 

Hmm... I was viewing the CEP as moving through the states described in 12.9.5 
and 12.9.6.  (Idle, REQ sent, REP wait, etc.)  I see what you're saying now.

> It sounds like I may have been looking at the wrong state but
> nonetheless the CEP/QP states are defined there and this would be
> different from what is in the spec. I wasn't saying it couldn't be made
> to work though. I haven't looked at it enough to know. If it does work,
> maybe the spec should get updated to cover this option too.

What I'd like to find is a way that a user, upon receiving a message, can send
a response.  Today, a user cannot send the response until after they get a
connection established event from the IB CM, and then RDMA CM.  So, it sounds
like even the RDMA CM needs some sort of rdma_establish() call to finish
connecting a QP.

I don't think that iWarp would run into this issue.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Hal Rosenstock
On Fri, 2006-06-16 at 12:31, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > IMO, it would violate the CM state machine and the passive CM transition
> > specification in 12.9.7.2 and have the effect of circumventing the
> > retransmission of REP on lost RTU. Data can't fly until either the RTU
> > or the first data message is received from the other direction.
> 
> This moves the QP state to RTS, as opposed to the CEP state to connected.  So
> I don't believe that it violates the spec.

Isn't the CEP the QP (see p. 689 line 7) ? 

> A drawback to moving the QP to RTS is that the communication established
> event will not be generated.  This forces us to wait for the RTU to move the
> CEP to connected, or we need to do it upon receiving the first completion.

> The RDMA CM has no knowledge when the latter occurs, so would need user input.

It sounds like I may have been looking at the wrong state but
nonetheless the CEP/QP states are defined there and this would be
different from what is in the spec. I wasn't saying it couldn't be made
to work though. I haven't looked at it enough to know. If it does work,
maybe the spec should get updated to cover this option too.

-- Hal

> - Sean





Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Roland Dreier
>I suggest the following design: the CMA would replace the event handler
>provided with the qp_init_attr struct with a callback of its own and
>keep the original handler/context on a private structure.

This is probably fine.  There is one further situation where the
connection needs to be established, beyond RTU and the communication
established async event.  Namely, if a receive completion is polled.
Since async events are, well, asynchronous, there's no guarantee that
the communication established event will be reported any time soon...




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Hal Rosenstock wrote:
> IMO, it would violate the CM state machine and the passive CM transition
> specification in 12.9.7.2 and have the effect of circumventing the
> retransmission of REP on lost RTU. Data can't fly until either the RTU
> or the first data message is received from the other direction.

This moves the QP state to RTS, as opposed to the CEP state to connected.  So I 
don't believe that it violates the spec.

A drawback to moving the QP to RTS is that the communication established event 
will not be generated.  This forces us to wait for the RTU to move the CEP to 
connected, or we need to do it upon receiving the first completion.

The RDMA CM has no knowledge when the latter occurs, so would need user input.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
Or Gerlitz wrote:
> This is what i was suspecting, Sean can you confirm that? if it does
> not emulate RTU reception, then what does it do?

Both receiving an RTU and getting a connection established event move the 
connection into the established state.  They generate different events to the 
user of the IB CM because RTUs carry private data.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Hal Rosenstock
On Fri, 2006-06-16 at 11:15, James Lentini wrote:

[snip...]

> > As an alternative, I don't think that there's any reason why the QP 
> > can't be transitioned to RTS when the CM REP is sent.
> 
> I like this idea. It simplifies how ULPs handle this issue. Are there 
> any spec. compliance issues with this?

IMO, it would violate the CM state machine and the passive CM transition
specification in 12.9.7.2 and have the effect of circumventing the
retransmission of REP on lost RTU. Data can't fly until either the RTU
or the first data message is received from the other direction.

-- Hal





Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Sean Hefty
James Lentini wrote:
>>As an alternative, I don't think that there's any reason why the QP 
>>can't be transitioned to RTS when the CM REP is sent.
> 
> I like this idea. It simplifies how ULPs handle this issue. Are there 
> any spec. compliance issues with this?

There are no spec compliance issues that I can readily find.  I will make a
note to fix this, as well as handle the connection established event as Or
suggested, but it will be a couple of weeks before I get to this.  (I will be
attending the workshop next week.)

> If the passive side CM doesn't receive an RTU, the passive side CM 
> should retransmit the REP. At least that is how I read 12.9.8.6 
> "Timeouts and Retries" in the IBTA spec. I can't find where this 
> happens in the code. Did I miss it?

The MAD layer retries the CM messages, typically until the CM cancels the 
operation.

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread James Lentini


On Fri, 16 Jun 2006, Or Gerlitz wrote:

> On 6/15/06, James Lentini <[EMAIL PROTECTED]> wrote:
> > ib_cm_establish() doesn't emulate an RTU reception. It generates an
> > IB_CM_USER_ESTABLISHED event (not an IB_CM_RTU_RECEIVED event). The
> > CMA's cma_ib_handler() doesn't recognize a IB_CM_USER_ESTABLISHED
> > event. The QP's state will not be moved to RTS.
> 
> This is what i was suspecting, Sean can you confirm that? if it does
> not emulate RTU reception, then what does it do?
> 
> > Consumers don't actually have to queue the completions, they have to
> > defer posting sends (either in response to the recvs or otherwise)
> > until the QP moves to RTS. Could the implementations queue up the
> > requests for the consumers?
> 
> nope the CM/CMA are not in charge of the consumer CQ, so there is no 
> way for them to queue those completions and anyway, i think its 

I was referring to requests, not completions. In any event, I like
Sean's idea of moving the QP to RTS when a REP is sent better.

> wrong for lower layer to queue completions, this "race" exists by 
> IB's nature (since the RTU goes to QP1 and the data to the user's QP 
> and the two QPs are totally unrelated) so if you want to have 
> production with IB you need to handle this case in your code, as 
> others do.

Agreed.

> > Strictly speaking, IB requires an error to be generated (C10-29 in 
> > the IBTA spec. vol 1, page 456). Still, it would be nice if 
> > consumers didn't have to worry about this issue.
> 
> What do you mean by error, this async event happens all the time, 
> you can't error the establishment just b/c it happened. I don't have
> access now to the spec, so i can't say what i understand from the 
> section you have pointed to.

Again, I was referring to requests, not completions.




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread James Lentini


On Thu, 15 Jun 2006, Sean Hefty wrote:

> >The cma/verbs consumer can't just ignore the event since its qp state is
> >still RTR which means an attempt to tx replying the rx would fail.
> 
> In most cases, I would expect that the IB CM will eventually receive the RTU,
> which will generate an event to the RDMA CM to transition the QP into RTS.
> This is why I think that the event can safely be ignored.  It does however
> mean that a user cannot send on the QP until the user sees
> RDMA_CM_EVENT_ESTABLISHED.
> 
> >I suggest the following design: the CMA would replace the event handler
> >provided with the qp_init_attr struct with a callback of its own and
> >keep the original handler/context on a private structure.
> 
> This sounds like it would work.  I don't think that there are any events where
> the additional delay would matter.
> 
> As an alternative, I don't think that there's any reason why the QP 
> can't be transitioned to RTS when the CM REP is sent.

I like this idea. It simplifies how ULPs handle this issue. Are there 
any spec. compliance issues with this?

> A user just can't post to the send queue until either an 
> RDMA_CM_EVENT_ESTABLISHED, IB_EVENT_COMM_EST, or a completion occurs 
> on the QP.  (This doesn't change the fact that the IB CM still needs 
> to know that the connection has been established, or it risks 
> putting the connection into an error state if an RTU is never 
> received.)

If the passive side CM doesn't receive an RTU, the passive side CM 
should retransmit the REP. At least that is how I read 12.9.8.6 
"Timeouts and Retries" in the IBTA spec. I can't find where this 
happens in the code. Did I miss it?




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Or Gerlitz
On 6/16/06, Sean Hefty <[EMAIL PROTECTED]> wrote:
>>The cma/verbs consumer can't just ignore the event since its qp state is
>>still RTR which means an attempt to tx replying the rx would fail.

> In most cases, I would expect that the IB CM will eventually receive the RTU,
> which will generate an event to the RDMA CM to transition the QP into RTS.

But we want an IB stack and set of ULPs which would work in production so they
need to handle also irregular cases... eg when the RTU is lost over and over.

Or




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-16 Thread Or Gerlitz
On 6/15/06, James Lentini <[EMAIL PROTECTED]> wrote:
> ib_cm_establish() doesn't emulate an RTU reception. It generates an
> IB_CM_USER_ESTABLISHED event (not an IB_CM_RTU_RECEIVED event). The
> CMA's cma_ib_handler() doesn't recognize a IB_CM_USER_ESTABLISHED
> event. The QP's state will not be moved to RTS.

This is what I was suspecting. Sean, can you confirm that? If it does
not emulate RTU reception, then what does it do?

> Consumers don't actually have to queue the completions, they have to
> defer posting sends (either in response to the recvs or otherwise)
> until the QP moves to RTS. Could the implementations queue up the
> requests for the consumers?

Nope, the CM/CMA are not in charge of the consumer CQ, so there is no way for
them to queue those completions. Anyway, I think it's wrong for a lower layer
to queue completions. This "race" exists by IB's nature (since the RTU goes to
QP1 and the data to the user's QP, and the two QPs are totally unrelated), so
if you want to run in production with IB you need to handle this case in your
code, as others do.

> Strictly speaking, IB requires an error to be generated (C10-29 in the
> IBTA spec. vol 1, page 456). Still, it would be nice if consumers
> didn't have to worry about this issue.

What do you mean by error? This async event happens all the time; you can't
error the establishment just b/c it happened. I don't have access to the spec
right now, so I can't say what I understand from the section you have pointed
to.

Or.




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-15 Thread Sean Hefty
>The cma/verbs consumer can't just ignore the event since its qp state is
>still RTR which means an attempt to tx replying the rx would fail.

In most cases, I would expect that the IB CM will eventually receive the RTU,
which will generate an event to the RDMA CM to transition the QP into RTS.  This
is why I think that the event can safely be ignored.  It does however mean that
a user cannot send on the QP until the user sees RDMA_CM_EVENT_ESTABLISHED.

>I suggest the following design: the CMA would replace the event handler
>provided with the qp_init_attr struct with a callback of its own and
>keep the original handler/context on a private structure.

This sounds like it would work.  I don't think that there are any events where
the additional delay would matter.
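
For reference, the quoted handler-replacement design could be sketched like
this. It is a toy Python model only: ToyCMAId and the dict standing in for
qp_init_attr are hypothetical, and the establish callable stands in for
whatever call the CMA would make down to the IB CM.

```python
class ToyCMAId:
    """Toy CMA ID that interposes on the consumer's QP event handler."""

    def __init__(self, establish):
        self._establish = establish  # stand-in for the call down to the CM
        self._user_handler = None
        self._user_context = None

    def create_qp(self, qp_init_attr):
        # Keep the original handler/context on a private structure ...
        self._user_handler = qp_init_attr.get("event_handler")
        self._user_context = qp_init_attr.get("qp_context")
        # ... and hand the lower layer a callback of our own.
        wrapped = dict(qp_init_attr)
        wrapped["event_handler"] = self._qp_event
        wrapped["qp_context"] = self
        return wrapped

    def _qp_event(self, event, context):
        if event == "COMM_EST":
            self._establish()
        # Every event, including COMM_EST, is still passed up afterwards.
        if self._user_handler is not None:
            self._user_handler(event, self._user_context)
```

The consumer's handler still sees every event with its own context; the
only added cost is one extra call before it runs.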

As an alternative, I don't think that there's any reason why the QP can't be
transitioned to RTS when the CM REP is sent.  A user just can't post to the send
queue until either an RDMA_CM_EVENT_ESTABLISHED, IB_EVENT_COMM_EST, or a
completion occurs on the QP.  (This doesn't change the fact that the IB CM still
needs to know that the connection has been established, or it risks putting the
connection into an error state if an RTU is never received.)
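
On the consumer side, the "don't post until one of these has been seen"
rule might look roughly like the following. This is a toy Python sketch
with hypothetical names (SendGate, signal, send); post_send is any callable
the consumer supplies, and "COMPLETION" is just a label for a completion
seen on the QP.

```python
class SendGate:
    """Defers sends until the connection is known to be usable."""

    OPENING_SIGNALS = {
        "RDMA_CM_EVENT_ESTABLISHED",
        "IB_EVENT_COMM_EST",
        "COMPLETION",
    }

    def __init__(self, post_send):
        self._post_send = post_send
        self._open = False
        self._deferred = []

    def signal(self, name):
        # The first qualifying signal opens the gate and flushes, in order.
        if name in self.OPENING_SIGNALS and not self._open:
            self._open = True
            for msg in self._deferred:
                self._post_send(msg)
            self._deferred.clear()

    def send(self, msg):
        if self._open:
            self._post_send(msg)
        else:
            self._deferred.append(msg)
```

Sends made before any of the three signals are held back; the first signal
flushes them and later sends go straight through.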

- Sean




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-15 Thread James Lentini


On Thu, 15 Jun 2006, Or Gerlitz wrote:

> Sean Hefty wrote:
> > James Lentini wrote:
> >> The IBTA spec (volume 1, version 1.2) describes a communication 
> >> established affiliated asynchronous event.
> >> We've seen this event delivered to our NFS-RDMA server and aren't sure 
> >> what to do with it.
> 
> > This event is delivered to the verbs consumer, since it occurs on 
> > the QP.  It's expected that the consumer will call 
> > ib_cm_establish.  Although, I would guess that you can probably 
> > ignore the event, under the assumption that the RTU will 
> > eventually be received by the local CM.
> 
> Sean,
> 
> The cma/verbs consumer can't just ignore the event since its qp 
> state is still RTR which means an attempt to tx replying the rx 
> would fail.

Good point. 

> On the other hand it can't call ib_cm_establish since the CMA does 
> not expose an API for that, 

This is a problem.

> nor the CM can register a cb to get this event and emulate an RTU 
> reception since the CMA is the one to create the QP and the CMA 
> consumer providing the qp_init_attr along with event handler...
> 
> I suggest the following design: the CMA would replace the event 
> handler provided with the qp_init_attr struct with a callback of its 
> own and keep the original handler/context on a private structure.
> 
> On the delivery of IB_EVENT_COMM_EST event, the CMA would call down 
> the CM to emulate RTU reception (ib_cm_establish) and then call up 

ib_cm_establish() doesn't emulate an RTU reception. It generates an 
IB_CM_USER_ESTABLISHED event (not an IB_CM_RTU_RECEIVED event). The 
CMA's cma_ib_handler() doesn't recognize an IB_CM_USER_ESTABLISHED 
event, so the QP's state will not be moved to RTS.
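
A minimal sketch of what recognizing both events could look like, in plain 
C with stand-in types. The sketch_* and SK_* names are hypothetical and are 
not the real ib_cm.h or cma.c definitions; the only point is that both 
events lead to the same RTR -> RTS step, taken at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in names modeled on the thread; NOT the real ib_cm.h definitions. */
enum sketch_cm_event { SK_CM_RTU_RECEIVED, SK_CM_USER_ESTABLISHED, SK_CM_OTHER };
enum sketch_qp_state { SK_QPS_RTR, SK_QPS_RTS };

struct sketch_id {
    enum sketch_qp_state qp_state;
    int established_reported;  /* times ESTABLISHED was passed up to the consumer */
};

/* Hypothetical handler: returns true if the event was recognized. */
static bool sketch_cm_handler(struct sketch_id *id, enum sketch_cm_event ev)
{
    assert(id != NULL);
    switch (ev) {
    case SK_CM_RTU_RECEIVED:
    case SK_CM_USER_ESTABLISHED:
        /* Either event means the connection is up; move to RTS only once. */
        if (id->qp_state == SK_QPS_RTR) {
            id->qp_state = SK_QPS_RTS;
            id->established_reported++;
        }
        return true;
    default:
        return false;
    }
}
```

With this shape, a late RTU that arrives after the user-established event 
is still recognized but changes nothing.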

> the consumer original handler, typical CMA consumers would just 
> ignore this event, i think.
> 
> The CM should be able to allow ib_cm_established to be called in the 
> context over which the event handler is called (or jump the 
> treatment to higher context). The CM must also ignore the actual RTU 
> if it arrives later/in parallel to when ib_cm_establish was called.
> 
> By this design the verbs consumer is guaranteed to always get 
> RDMA_CM_EVENT_ESTABLISHED no matter if the RTU is just late or never 
> arrives 

The CMA's cma_ib_handler() needs to be modified for this to be true.

> but it still can get a CQ RX completion(s) before getting the CMA 
> established event; in that case it can queue these completion 
> elements for the short time window before the established event 
> arrives and then process them.

Consumers don't actually have to queue the completions; they only have 
to defer posting sends (either in response to the recvs or otherwise) 
until the QP moves to RTS. Could the implementations queue up the 
requests for the consumers?

Strictly speaking, IB requires an error to be generated (C10-29 in the 
IBTA spec. vol 1, page 456). Still, it would be nice if consumers 
didn't have to worry about this issue.
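
A rough sketch of deferring sends until the established event, written as 
self-contained C. The structure, the fixed-size arrays and the sketch_* 
names are assumptions made for illustration; a real consumer or library 
would post work requests to the QP rather than to an array.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_DEFERRED 8

/* Hypothetical per-connection state; not a real CMA or verbs structure. */
struct sketch_conn {
    int established;                   /* set once the established event is seen */
    int deferred[SK_MAX_DEFERRED];     /* work request ids waiting for RTS */
    size_t n_deferred;
    int posted[SK_MAX_DEFERRED * 2];   /* stand-in for the real send queue */
    size_t n_posted;
};

/* Post immediately if established, otherwise park the request.
 * Returns 0 on success, -1 if the relevant queue is full. */
static int sketch_send(struct sketch_conn *c, int wr_id)
{
    assert(c != NULL);
    if (c->established) {
        if (c->n_posted == SK_MAX_DEFERRED * 2)
            return -1;
        c->posted[c->n_posted++] = wr_id;
        return 0;
    }
    if (c->n_deferred == SK_MAX_DEFERRED)
        return -1;
    c->deferred[c->n_deferred++] = wr_id;
    return 0;
}

/* On the established event: mark the connection up and flush the
 * parked requests in their original order. */
static void sketch_on_established(struct sketch_conn *c)
{
    assert(c != NULL);
    c->established = 1;
    for (size_t i = 0; i < c->n_deferred; i++)
        c->posted[c->n_posted++] = c->deferred[i];
    c->n_deferred = 0;
}
```

Receive completions can be processed as they arrive; only the sends they 
trigger wait in the deferred list.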

> A design similar to that was implemented at the Voltaire gen1 stack 
> and it works in production with iSER target and VIBNAL (CFS Lustre 
> NAL for voltaire gen1 ib) server side.
> 
> Does anyone know on what context (hard_irq, soft_irq, thread) are 
> the event handlers being called?
> 
> Or.




Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-15 Thread Or Gerlitz
Or Gerlitz wrote:
> I suggest the following design: the CMA would replace the event handler 
> provided with the qp_init_attr struct with a callback of its own and 
> keep the original handler/context on a private structure.
> 
> On the delivery of IB_EVENT_COMM_EST event, the CMA would call down the 
> CM to emulate RTU reception (ib_cm_establish) and then call up the 
> consumer original handler, typical CMA consumers would just ignore this 
> event, i think.

and on other qp affiliated events the CMA would just call up the 
consumer callback. This proxy-ing of qp events can help us down the road 
to add support for path migration in the CMA.

Or.





Re: [openib-general] design for communication established affiliated asynchronous event handling

2006-06-15 Thread Or Gerlitz
Sean Hefty wrote:
> James Lentini wrote:
>> The IBTA spec (volume 1, version 1.2) describes a communication 
>> established affiliated asynchronous event.
>> We've seen this event delivered to our NFS-RDMA server and aren't sure 
>> what to do with it.

> This event is delivered to the verbs consumer, since it occurs on the QP.
> It's expected that the consumer will call ib_cm_establish.  Although, I
> would guess that you can probably ignore the event, under the assumption
> that the RTU will eventually be received by the local CM.

Sean,

The cma/verbs consumer can't just ignore the event, since its QP state is 
still RTR, which means an attempt to transmit a reply to the received 
message would fail.

On the other hand, it can't call ib_cm_establish since the CMA does not 
expose an API for that, nor can the CM register a callback to get this 
event and emulate an RTU reception, since the CMA is the one that creates 
the QP and the CMA consumer provides the qp_init_attr along with the 
event handler...

I suggest the following design: the CMA would replace the event handler 
provided with the qp_init_attr struct with a callback of its own and 
keep the original handler/context on a private structure.

On delivery of the IB_EVENT_COMM_EST event, the CMA would call down to the 
CM to emulate RTU reception (ib_cm_establish) and then call up the 
consumer's original handler; typical CMA consumers would just ignore this 
event, I think.
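
A minimal sketch of this handler proxying, in self-contained C. All names 
here (sketch_handler, sketch_cma_priv, sketch_cma_wrap and so on) are 
hypothetical stand-ins rather than the real ib_verbs or CMA definitions, 
and the call to ib_cm_establish is represented by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in event codes; NOT the real enum ib_event_type. */
enum sketch_event { SK_EVENT_COMM_EST, SK_EVENT_PATH_MIG, SK_EVENT_QP_FATAL };

typedef void (*sketch_handler)(enum sketch_event ev, void *context);

/* Private structure keeping the consumer's original handler and context. */
struct sketch_cma_priv {
    sketch_handler user_handler;
    void *user_context;
    int establish_calls;  /* counts where ib_cm_establish would be called */
};

/* Handler the CMA installs in place of the consumer's. */
static void sketch_cma_qp_event(enum sketch_event ev, void *context)
{
    struct sketch_cma_priv *priv = context;

    assert(priv != NULL);
    if (ev == SK_EVENT_COMM_EST)
        priv->establish_calls++;       /* call down to the CM first */
    if (priv->user_handler)            /* then call up, for every event */
        priv->user_handler(ev, priv->user_context);
}

/* At QP creation: save the consumer's handler/context and substitute ours. */
static void sketch_cma_wrap(struct sketch_cma_priv *priv,
                            sketch_handler *handler, void **context)
{
    assert(priv != NULL && handler != NULL && context != NULL);
    priv->user_handler = *handler;
    priv->user_context = *context;
    priv->establish_calls = 0;
    *handler = sketch_cma_qp_event;
    *context = priv;
}

/* Example consumer handler: counts events through an int * context. */
static void sketch_consumer_handler(enum sketch_event ev, void *context)
{
    (void)ev;
    (*(int *)context)++;
}
```

The consumer still sees every affiliated event, so the same hook could 
later carry other per-QP handling without changing the consumer's code.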

The CM should allow ib_cm_establish to be called in the context in which 
the event handler is called (or defer the handling to a higher-level 
context). The CM must also ignore the actual RTU if it arrives later than, 
or in parallel with, the call to ib_cm_establish.
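
One way to picture the "ignore the later RTU" requirement is a single 
guarded state transition that only the first caller wins. The sketch below 
uses userspace C11 atomics and hypothetical names purely for illustration; 
kernel code would use its own locking or atomic primitives, and the real CM 
state machine has more states than the two shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Two stand-in states; the real CM state machine is larger. */
enum { SK_REP_SENT = 0, SK_ESTABLISHED = 1 };

struct sketch_cm_id {
    atomic_int state;
};

/* Called from both paths (RTU arrival and the establish call).
 * Returns true only for the caller that performs the
 * REP_SENT -> ESTABLISHED transition; the loser must do nothing. */
static bool sketch_try_establish(struct sketch_cm_id *id)
{
    int expected = SK_REP_SENT;

    assert(id != NULL);
    return atomic_compare_exchange_strong(&id->state, &expected,
                                          SK_ESTABLISHED);
}
```

Whichever path loses the race sees a false return and can drop its event 
instead of reporting a second establishment.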

With this design the verbs consumer is guaranteed to always get 
RDMA_CM_EVENT_ESTABLISHED, whether the RTU is just late or never 
arrives, but it can still get CQ RX completion(s) before getting the 
CMA established event; in that case it can queue these completion 
elements for the short time window before the established event arrives 
and then process them.

A similar design was implemented in the Voltaire gen1 stack, and 
it works in production with the iSER target and the VIBNAL (CFS Lustre NAL 
for Voltaire gen1 IB) server side.

Does anyone know in what context (hard_irq, soft_irq, thread) the 
event handlers are called?

Or.







