Re: Handling busy responses from the SA
Mike,

On Wed, Jun 16, 2010 at 3:57 PM, Mike Heinz michael.he...@qlogic.com wrote:
> Hal,
>
> But if the original trap had retries > 0, wouldn't resending the trap be what the issuer intended?

I suppose as there's nothing in the IBA spec that precludes using busy on TrapRepresses, although I'd be hard pressed to rationalize using that, particularly for SMP traps.

-- Hal

> I guess I'm confused why treating BUSY as similar to simply never getting a response at all is a bad thing. In my mind, receiving a BUSY response is like getting a busy signal when you call someone on the phone - a sign you need to wait a bit then try again. Similarly, if I call someone and never get an answer my strategy is going to be to wait, then try again.
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenst...@gmail.com]
> Sent: Tuesday, June 08, 2010 8:16 PM
> To: Mike Heinz
> Cc: Hefty, Sean; linux-rdma@vger.kernel.org
> Subject: Re: Handling busy responses from the SA
>
> Mike,
>
> I'm referring to the receipt of the TrapRepress with busy status. Wouldn't your patch cause the original Trap to be resent when retries > 0? TrapRepress is essentially a response to Trap and classified as such by ib_response_mad. Your proposed patch treats a busy as a timeout and can cause retry of the original sent Trap.
>
> -- Hal

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Handling busy responses from the SA
To be honest, we haven't been able to think of a case where a sender would use retries on a trap or a busy on a repress either, but I don't think it would hurt to omit represses from the busy handling.

Would that be acceptable to everyone? To alter the patch to allow BUSY trap repress MADs to pass through?

-----Original Message-----
From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Hal Rosenstock
Sent: Thursday, June 17, 2010 9:30 AM
To: Mike Heinz
Cc: Hefty, Sean; linux-rdma@vger.kernel.org; Todd Rimmer
Subject: Re: Handling busy responses from the SA

Mike,

On Wed, Jun 16, 2010 at 3:57 PM, Mike Heinz michael.he...@qlogic.com wrote:
> Hal,
>
> But if the original trap had retries > 0, wouldn't resending the trap be what the issuer intended?

I suppose as there's nothing in the IBA spec that precludes using busy on TrapRepresses, although I'd be hard pressed to rationalize using that, particularly for SMP traps.

-- Hal

> I guess I'm confused why treating BUSY as similar to simply never getting a response at all is a bad thing. In my mind, receiving a BUSY response is like getting a busy signal when you call someone on the phone - a sign you need to wait a bit then try again. Similarly, if I call someone and never get an answer my strategy is going to be to wait, then try again.
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenst...@gmail.com]
> Sent: Tuesday, June 08, 2010 8:16 PM
> To: Mike Heinz
> Cc: Hefty, Sean; linux-rdma@vger.kernel.org
> Subject: Re: Handling busy responses from the SA
>
> Mike,
>
> I'm referring to the receipt of the TrapRepress with busy status. Wouldn't your patch cause the original Trap to be resent when retries > 0? TrapRepress is essentially a response to Trap and classified as such by ib_response_mad. Your proposed patch treats a busy as a timeout and can cause retry of the original sent Trap.
>
> -- Hal
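For reference, omitting represses from the busy handling could be sketched roughly as follows. This is Python used purely for illustration (the real code is kernel C), not the patch itself. The constant values are assumed to mirror the definitions in the Linux ib_mad.h header, is_response() is a simplified stand-in for ib_response_mad, and treat_busy_as_timeout() is a hypothetical name.

```python
# Assumed to mirror the Linux <rdma/ib_mad.h> values; verify before relying on them.
IB_MGMT_METHOD_TRAP_REPRESS = 0x07
IB_MGMT_METHOD_RESP = 0x80          # response bit in the method field
IB_MGMT_MAD_STATUS_BUSY = 0x0001    # busy bit in the MAD status field


def is_response(method):
    """Simplified model of ib_response_mad: response bit set, or a TrapRepress."""
    return bool(method & IB_MGMT_METHOD_RESP) or method == IB_MGMT_METHOD_TRAP_REPRESS


def treat_busy_as_timeout(method, status):
    """Hypothetical predicate: should the MAD layer swallow this BUSY and retry?

    TrapRepress is excluded, so a busy repress passes through to the client
    instead of triggering a resend of the original Trap.
    """
    if method == IB_MGMT_METHOD_TRAP_REPRESS:
        return False
    return is_response(method) and bool(status & IB_MGMT_MAD_STATUS_BUSY)
```

With this predicate a busy GetResp (method 0x81) would be retried like a timeout, while a busy TrapRepress would be delivered unchanged.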
RE: Handling busy responses from the SA
Hal,

But if the original trap had retries > 0, wouldn't resending the trap be what the issuer intended?

I guess I'm confused why treating BUSY as similar to simply never getting a response at all is a bad thing. In my mind, receiving a BUSY response is like getting a busy signal when you call someone on the phone - a sign you need to wait a bit then try again. Similarly, if I call someone and never get an answer my strategy is going to be to wait, then try again.

-----Original Message-----
From: Hal Rosenstock [mailto:hal.rosenst...@gmail.com]
Sent: Tuesday, June 08, 2010 8:16 PM
To: Mike Heinz
Cc: Hefty, Sean; linux-rdma@vger.kernel.org
Subject: Re: Handling busy responses from the SA

Mike,

I'm referring to the receipt of the TrapRepress with busy status. Wouldn't your patch cause the original Trap to be resent when retries > 0? TrapRepress is essentially a response to Trap and classified as such by ib_response_mad. Your proposed patch treats a busy as a timeout and can cause retry of the original sent Trap.

-- Hal
Re: Handling busy responses from the SA
Mike,

On Mon, Jun 7, 2010 at 12:00 PM, Mike Heinz michael.he...@qlogic.com wrote:
> Hal said:
>> Should a busy be retried at all at the mad layer? Is a special (longer) timeout policy for busy needed? Also, should this be done for all MADs classified by ib_response_mad (e.g. trap represses)?
>
> Hal,
>
> The idea of processing BUSY responses in the MAD layer is to treat BUSY responses like timeouts - which are currently handled by the MAD layer.
>
> Right now there is an issue where various apps and ULPs either treat BUSY as a cause to immediately retry or as a permanent error. This doesn't seem to affect users of the OpenSM so much because (as I understand it) the OpenSM seems to discard requests when it gets too busy - but for other SA/SMs, it can cause a major packet storm or, worse, a simple loss of connectivity where MPI jobs or kernel ULPs simply assume the SA is broken because they got a BUSY reply.
>
> By treating the BUSY reply as a timeout, we're actually simplifying matters by fitting into existing practice.

Understood. Timing these out makes sense to me but still does not preclude the client from potentially handling this if the retries fail.

> As for needing a longer timeout - in our old proprietary stack, QLogic did have a longer timeout for retrying busy replies than for normal timeouts

How much longer? What are the two timeouts used?

> - but we should try to get this in now so we can get some relief before we begin the long term discussion of the best way to handle this issue overall.

All I was getting at here was: does retrying when busy work? If not, why retry at all at the MAD layer (regardless of retries requested) and perhaps use a longer timeout for this. If it does work, maybe the timeout on the subsequent retries should be extended.

I think my two other comments on details are relevant to an updated patch.

-- Hal
RE: Handling busy responses from the SA
> As for needing a longer timeout - in our old proprietary stack, QLogic did have a longer timeout for retrying busy replies than for normal timeouts - but we should try to get this in now so we can get some relief before we begin the long term discussion of the best way to handle this issue overall.

Because applications may handle BUSY replies differently, we shouldn't simply start hiding them from the user. I would much rather agree on the longer term plan, so that the ABI can reflect the proper semantics. I don't see any issue with changing the current behavior for kernel clients, however.

- Sean
RE: Handling busy responses from the SA
Sean said,
> Because applications may handle BUSY replies differently, we shouldn't simply start hiding them from the user.

Sean - remember that this patch will still return a BUSY status to the caller: if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.

Hal said,
> All I was getting at here was: does retrying when busy work? If not, why retry at all at the MAD layer (regardless of retries requested) and perhaps use a longer timeout for this. If it does work, maybe the timeout on the subsequent retries should be extended.

Personally, I think it's been extremely helpful - we've been using busy status to tell compute nodes to slow down since our old proprietary stack and we've seen a significant improvement in overall traffic congestion when we added this patch to OFED clusters using our SM. In addition, use of the BUSY return code simplifies debugging traffic congestion problems (since it allows you to immediately differentiate between SA overload and other traffic issues) and it paves the way for more sophisticated back-off strategies in the future.

As to that, and your question, our old stack used two different timeout values specified by the client. One value was for actual timeouts and one for busy responses. In the case of busy responses, we added a randomization factor to spread out the traffic.

The issue with adapting that to the Linux-RDMA stack is that it's an API change. What I would suggest, personally, is something like this:

1. Take either the timeout passed by the caller OR a predefined constant, whichever is larger. I would suggest setting the predefined constant to something moderate, say 2 seconds.
2. Add a randomization factor - say between -250 and +250 ms?
3. Update the packet timeout with this new value.
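The three steps suggested above can be sketched as follows. This is an illustrative Python model only, not kernel code: busy_retry_timeout_ms() and both constants are hypothetical names, with the 2 second floor and the +/-250 ms jitter taken from the suggestion itself.

```python
import random

BUSY_TIMEOUT_FLOOR_MS = 2000  # hypothetical name for the "moderate" 2 second constant
BUSY_JITTER_MS = 250          # hypothetical name for the +/- 250 ms randomization range


def busy_retry_timeout_ms(caller_timeout_ms, rng=random):
    """Compute the timeout to apply before retrying a request that got BUSY."""
    # Step 1: the larger of the caller's timeout and the predefined constant.
    base = max(caller_timeout_ms, BUSY_TIMEOUT_FLOOR_MS)
    # Step 2: a randomization factor so that many nodes do not retry in lockstep.
    jitter = rng.randint(-BUSY_JITTER_MS, BUSY_JITTER_MS)
    # Step 3: this value would replace the packet's timeout for the retry.
    return base + jitter
```

Because the jitter is applied after the floor, a caller timeout of 500 ms yields a retry timeout somewhere between 1750 and 2250 ms, and the existing send API does not need to change.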
RE: Handling busy responses from the SA
Anyone know why my messages are being appended with interesting garbage? -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Mike Heinz Sent: Tuesday, June 08, 2010 11:49 AM To: Hal Rosenstock Cc: linux-rdma@vger.kernel.org Subject: RE: Handling busy responses from the SA N�r��y���b�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j ��f���h���z��w��� ���j:+v���w�j�m zZ+�ݢj��!�i
RE: Handling busy responses from the SA
> Anyone know why my messages are being appended with interesting garbage?

I get that too. I first noticed it a couple of weeks ago. It eventually went back to the normal 'To unsubscribe from this list' message.
RE: Handling busy responses from the SA
> Sean - remember that this patch will still return a BUSY status to the caller, if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.

It looks like it only returns the BUSY response if that matches with the last retry, otherwise, the BUSY response is dropped. It also looks like it applies to all MADs, including vendor specific ones, and not just those from the SA.

- Sean
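The behaviour being described can be modelled in a few lines. This is a simplified sketch of the semantics as discussed in this thread, not the patch itself: send_with_retries() and the outcome strings are hypothetical, and each list element stands for the result of one transmission attempt.

```python
def send_with_retries(outcomes, retries):
    """Return the status the caller finally sees.

    outcomes: sequence of 'ok', 'busy' or 'timeout', one per attempt.
    retries:  number of retries requested by the caller (0 = single attempt).
    """
    last = "timeout"  # if nothing ever comes back, the caller sees a timeout
    for _, outcome in zip(range(retries + 1), outcomes):
        if outcome == "ok":
            return "ok"
        # A BUSY is handled like a timeout: remember it and try again.
        last = outcome
    # Retries exhausted: only the status of the final attempt is reported.
    return last
```

With retries set to zero, a BUSY reply is returned unchanged. With one retry, a BUSY followed by a timeout is reported as a timeout, so the earlier BUSY is dropped, which is the case raised above.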
RE: Handling busy responses from the SA
Right. Effectively this is similar to the I/O resolution timeout policy laid out in the spec.

-----Original Message-----
From: Hefty, Sean [mailto:sean.he...@intel.com]
Sent: Tuesday, June 08, 2010 12:27 PM
To: Mike Heinz; Hal Rosenstock
Cc: linux-rdma@vger.kernel.org
Subject: RE: Handling busy responses from the SA

> Sean - remember that this patch will still return a BUSY status to the caller, if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.

It looks like it only returns the BUSY response if that matches with the last retry, otherwise, the BUSY response is dropped. It also looks like it applies to all MADs, including vendor specific ones, and not just those from the SA.

- Sean
RE: Handling busy responses from the SA
Sean - Is there a case where we would ever want to treat BUSY responses differently from timeouts?

-----Original Message-----
From: Hefty, Sean [mailto:sean.he...@intel.com]
Sent: Tuesday, June 08, 2010 12:27 PM
To: Mike Heinz; Hal Rosenstock
Cc: linux-rdma@vger.kernel.org
Subject: RE: Handling busy responses from the SA

> Sean - remember that this patch will still return a BUSY status to the caller, if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.

It looks like it only returns the BUSY response if that matches with the last retry, otherwise, the BUSY response is dropped. It also looks like it applies to all MADs, including vendor specific ones, and not just those from the SA.

- Sean
RE: Handling busy responses from the SA
> Is there a case where we would ever want to treat BUSY responses differently from timeouts?

I doubt it for a single MAD, but I can't say what people may have implemented.

The main difference I can think of is that a busy response requires a retry, whereas a timeout does not. This affects the retry policy when multiple MADs are outstanding. E.g. if there are 10 requests outstanding and the first times out, we may only resend the first request and increase the timeouts of the other 9. If the 10 requests all receive a busy, then they must all be retried.

To me, it looks like it makes more sense to never send busy, except maybe when receive buffer space is fully consumed, but implement a more intelligent timeout/retry mechanism on the sender side. The SA almost needs some sort of MRA-like message.

- Sean
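To make the difference concrete, here is a small illustrative model of the two cases with 10 outstanding requests. It is a sketch under stated assumptions, not an existing MAD-layer policy: on_timeout(), on_busy() and the doubling factor are hypothetical.

```python
def on_timeout(timeouts_ms, timed_out_id, factor=2):
    """One request timed out: resend only it and stretch the others' timeouts.

    The doubling factor is an arbitrary assumption for this example.
    """
    resend = [timed_out_id]
    new_timeouts = {
        rid: (t if rid == timed_out_id else t * factor)
        for rid, t in timeouts_ms.items()
    }
    return resend, new_timeouts


def on_busy(timeouts_ms, busy_ids):
    """Every request answered with BUSY was explicitly rejected, so all are resent."""
    return sorted(busy_ids), dict(timeouts_ms)


# Ten outstanding requests, each with a 1000 ms timeout.
outstanding = {rid: 1000 for rid in range(10)}
resend_after_timeout, stretched = on_timeout(outstanding, timed_out_id=0)
resend_after_busy, _ = on_busy(outstanding, busy_ids=outstanding.keys())
```

In the timeout case only request 0 is resent and the other nine simply wait longer. In the busy case all ten go back on the wire, which is why a busy reply can generate more traffic than silently dropping the request.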
Re: Handling busy responses from the SA
On Tue, Jun 8, 2010 at 12:27 PM, Hefty, Sean sean.he...@intel.com wrote:
>> Sean - remember that this patch will still return a BUSY status to the caller, if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.
>
> It looks like it only returns the BUSY response if that matches with the last retry, otherwise, the BUSY response is dropped. It also looks like it applies to all MADs, including vendor specific ones, and not just those from the SA.

Per the proposed patch, it currently includes trap represses (as determined by ib_response_mad). Shouldn't busy be ignored for that case? I don't think that would be used but it seems safer to me.

-- Hal

> - Sean
RE: Handling busy responses from the SA
Hal,

I may be confused - but I thought the spec said there was no valid response to a trap repress. I interpreted o14-3.a4, "The SMA shall not send any message in response to a valid SubnTrapRepress() message", to mean that the SMA isn't allowed to respond with a BUSY status for a trap repress.

-----Original Message-----
From: Hal Rosenstock [mailto:hal.rosenst...@gmail.com]
Sent: Tuesday, June 08, 2010 3:09 PM
To: Hefty, Sean
Cc: Mike Heinz; linux-rdma@vger.kernel.org
Subject: Re: Handling busy responses from the SA

On Tue, Jun 8, 2010 at 12:27 PM, Hefty, Sean sean.he...@intel.com wrote:
>> Sean - remember that this patch will still return a BUSY status to the caller, if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.
>
> It looks like it only returns the BUSY response if that matches with the last retry, otherwise, the BUSY response is dropped. It also looks like it applies to all MADs, including vendor specific ones, and not just those from the SA.

Per the proposed patch, it currently includes trap represses (as determined by ib_response_mad). Shouldn't busy be ignored for that case? I don't think that would be used but it seems safer to me.

-- Hal

> - Sean
Re: Handling busy responses from the SA
> Is there a case where we would ever want to treat BUSY responses differently from timeouts?

If there isn't then it's silly for the SA to ever send a BUSY response.

- R.
--
Roland Dreier rola...@cisco.com || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
Re: Handling busy responses from the SA
Mike,

On Tue, Jun 8, 2010 at 3:59 PM, Mike Heinz michael.he...@qlogic.com wrote:
> Hal,
>
> I may be confused - but I thought the spec said there was no valid response to a trap repress. I interpreted o14-3.a4, "The SMA shall not send any message in response to a valid SubnTrapRepress() message", to mean that the SMA isn't allowed to respond with a BUSY status for a trap repress.

I'm referring to the receipt of the TrapRepress with busy status. Wouldn't your patch cause the original Trap to be resent when retries > 0? TrapRepress is essentially a response to Trap and classified as such by ib_response_mad. Your proposed patch treats a busy as a timeout and can cause retry of the original sent Trap.

-- Hal

> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenst...@gmail.com]
> Sent: Tuesday, June 08, 2010 3:09 PM
> To: Hefty, Sean
> Cc: Mike Heinz; linux-rdma@vger.kernel.org
> Subject: Re: Handling busy responses from the SA
>
> On Tue, Jun 8, 2010 at 12:27 PM, Hefty, Sean sean.he...@intel.com wrote:
>>> Sean - remember that this patch will still return a BUSY status to the caller, if retries are exhausted and the last return code was BUSY, then that's what the caller will get. Thus, code which sets retries to zero will not be affected by this patch at all.
>>
>> It looks like it only returns the BUSY response if that matches with the last retry, otherwise, the BUSY response is dropped. It also looks like it applies to all MADs, including vendor specific ones, and not just those from the SA.
>
> Per the proposed patch, it currently includes trap represses (as determined by ib_response_mad). Shouldn't busy be ignored for that case? I don't think that would be used but it seems safer to me.
>
> -- Hal
>
>> - Sean
RE: Handling busy responses from the SA
Hal said:
> Should a busy be retried at all at the mad layer? Is a special (longer) timeout policy for busy needed? Also, should this be done for all MADs classified by ib_response_mad (e.g. trap represses)?

Hal,

The idea of processing BUSY responses in the MAD layer is to treat BUSY responses like timeouts - which are currently handled by the MAD layer.

Right now there is an issue where various apps and ULPs either treat BUSY as a cause to immediately retry or as a permanent error. This doesn't seem to affect users of the OpenSM so much because (as I understand it) the OpenSM seems to discard requests when it gets too busy - but for other SA/SMs, it can cause a major packet storm or, worse, a simple loss of connectivity where MPI jobs or kernel ULPs simply assume the SA is broken because they got a BUSY reply.

By treating the BUSY reply as a timeout, we're actually simplifying matters by fitting into existing practice.

As for needing a longer timeout - in our old proprietary stack, QLogic did have a longer timeout for retrying busy replies than for normal timeouts - but we should try to get this in now so we can get some relief before we begin the long term discussion of the best way to handle this issue overall.