I never got a response to our last discussions of this issue; to remind 
everyone, this is a patch for handling BUSY responses from the SA in the same 
manner time outs are handled. The purpose of this change is to prevent poorly 
written applications from overloading the SM, or failing when they receive a 
BUSY response.

To recap where we were, the problem is that no existing OFED module or user 
application currently handles a BUSY response from the SA at all. For the most 
part, they treat a return code of IB_MGMT_MAD_STATUS_BUSY as requiring an 
immediate resend of the query or, worse, as a fatal error.

I have a proposed patch which causes the BUSY response to be handled within the 
infinband core modules in the same way timeouts are handled. (i.e., BUSY 
responses are treated as if no response was received). I have a version of this 
patch which meets the objections about handling TRAP REPRESS, but which does 
not affect any existing APIs and does not require changes to any existing 
kernel modules or applications. As far as I can tell, this patch has neither 
been accepted nor rejected at this point.

In addition to the patch itself, there has been significant discussion about 
changing the APIs to allow the application to explicitly specify how to handle 
BUSY. 

Reviewing the code, I tried to get an idea of the impact that applications to 
set the number of busy retries and busy timeout values would have.

At the kernel level, two data structures (ib_mad_send_wr and 
ib_mad_send_wr_private) and three key functions are affected. The functions are 
mad.c/ib_mad_complete_recv(), mad.c/ib_post_send_mad() and 
sa_query.c/send_mad(). At the very minimum, the latter two functions would need 
to be patched to copy the new fields while ib_mad_complete_recv() would be 
patched in a manner similar to my current patch, but using the busy retries and 
busy timeout values.

Beyond that, setting the new parameters will require patching the agent portion 
of ib_mad, the cm, mlx4 mthca drivers and srp. Other modules will be indirectly 
affected, if we choose to expose the busy values through the sa query interface 
(ib_sa_path_rec_get(), etcetera).
 
Beyond those, user_mad.c/ib_umad_read() and user_mad.c/ib_umad_write() would 
need to be altered in a similar manner to allow the new fields to be passed 
into user space, requiring changes to the ib_user_mad structure (which would 
change the ABI) and affects the ibsim, qlvnic tools, the opensm and, I think, 
adding new calls to libibmad.


Obviously, once that it done it will trigger a cascade of changes to user space 
tools such as saquery, ibdiagnet and so on.

Rather than chewing off all that at once, what I would suggest is that we 
simply add the new fields to struct ib_mad_send_wr and struct 
ib_mad_send_wr_private in kernel space, and have and have send_mad() and 
ib_post_send_mad() check the values in ib_mad_send_wr and, if they are zero, 
set the values in ib_mad_send_wr_private to match the existing retries and 
timeout_ms fields. This would allow existing code to work without modification 
while laying the ground work for the broader change. 

Once these changes were made, it would be possible to add support for 
explicitly setting BUSY behavior to the ulps on a case by case basis although 
it should be noted that adding the new fields to the user space interfaces will 
trigger another new ABI revision - so I would suggest leaving that change for a 
major update to OFED.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to