I never got a response to our last discussions of this issue; to remind everyone, this is a patch for handling BUSY responses from the SA in the same manner time outs are handled. The purpose of this change is to prevent poorly written applications from overloading the SM, or failing when they receive a BUSY response.
To recap where we were, the problem is that no existing OFED module or user application currently handles a BUSY response from the SA at all. For the most part, they treat a return code of IB_MGMT_MAD_STATUS_BUSY as requiring an immediate resend of the query or, worse, as a fatal error. I have a proposed patch which causes the BUSY response to be handled within the infinband core modules in the same way timeouts are handled. (i.e., BUSY responses are treated as if no response was received). I have a version of this patch which meets the objections about handling TRAP REPRESS, but which does not affect any existing APIs and does not require changes to any existing kernel modules or applications. As far as I can tell, this patch has neither been accepted nor rejected at this point. In addition to the patch itself, there has been significant discussion about changing the APIs to allow the application to explicitly specify how to handle BUSY. Reviewing the code, I tried to get an idea of the impact that applications to set the number of busy retries and busy timeout values would have. At the kernel level, two data structures (ib_mad_send_wr and ib_mad_send_wr_private) and three key functions are affected. The functions are mad.c/ib_mad_complete_recv(), mad.c/ib_post_send_mad() and sa_query.c/send_mad(). At the very minimum, the latter two functions would need to be patched to copy the new fields while ib_mad_complete_recv() would be patched in a manner similar to my current patch, but using the busy retries and busy timeout values. Beyond that, setting the new parameters will require patching the agent portion of ib_mad, the cm, mlx4 mthca drivers and srp. Other modules will be indirectly affected, if we choose to expose the busy values through the sa query interface (ib_sa_path_rec_get(), etcetera). Beyond those, user_mad.c/ib_umad_read() and user_mad.c/ib_umad_write() would need to be altered in a similar manner to allow the new fields to be passed into user space, requiring changes to the ib_user_mad structure (which would change the ABI) and affects the ibsim, qlvnic tools, the opensm and, I think, adding new calls to libibmad. Obviously, once that it done it will trigger a cascade of changes to user space tools such as saquery, ibdiagnet and so on. Rather than chewing off all that at once, what I would suggest is that we simply add the new fields to struct ib_mad_send_wr and struct ib_mad_send_wr_private in kernel space, and have and have send_mad() and ib_post_send_mad() check the values in ib_mad_send_wr and, if they are zero, set the values in ib_mad_send_wr_private to match the existing retries and timeout_ms fields. This would allow existing code to work without modification while laying the ground work for the broader change. Once these changes were made, it would be possible to add support for explicitly setting BUSY behavior to the ulps on a case by case basis although it should be noted that adding the new fields to the user space interfaces will trigger another new ABI revision - so I would suggest leaving that change for a major update to OFED. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html