Hello all.
To Jeff:
I assumed that no replies meant the patch was OK.
Thank you for your comments. I have fixed these issues; the updated patch is below.
Jeff Squyres wrote:
On Dec 15, 2009, at 8:56 PM, Jeff Squyres wrote:
Hmm. I'm a little disappointed that this was applied without answering my
questions first...
http://www.open-mpi.org/community/lists/devel/2009/12/7187.php
WRONG. You *did* answer -- somehow my mail client ate it (I see the reply in
the web archives, but not in my local mail client -- #$@!$@!#$!!!!).
My bad... :-(
Could you add some of your explanations as comments in the code? The rationale
here is that if I had those questions while reading your patch, someone else
(including me, months from now) will likely have the same questions while
reading the code.
Another minor quibble in a help message:
+[SRQ doesn't found]
+The srq doesn't found.
+Below is some information about the host that raised the error:
+
+ Local host: %s
+ Local device: %s
It's not correct grammar and is fairly unhelpful to the user -- please change
to:
[SRQ not found]
Open MPI tried to access a shared receive queue (SRQ) that was not found. This
should not happen, and is a fatal error. Your MPI job will now abort.
Local host: %s
Local device: %s
Also:
+ - When the number of not used receive buffers will decreased to 8
+ the IBV_EVENT_SRQ_LIMIT_REACHED event will be signaled and the number
+ of receive buffers that we can pre-post will be increased.
I don't think users know what IBV_EVENT_... is. Perhaps it should read:
+ - When the number of unused shared receive buffers reaches 8, more
+ buffers will be posted.
(how many more buffers will be posted, BTW?)
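For anyone reading this thread later, here is a rough, single-threaded C sketch of the resizing behavior that the help text and the header comment describe. It is only a model under stated assumptions, not the openib BTL code: the struct, the srq_model_* function names, the doubling growth step, and the one-shot re-arming of the limit are all hypothetical simplifications, and no verbs calls or threads are involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one SRQ. Field names echo the patch,
 * but this is NOT the Open MPI structure. */
struct srq_model {
    int  rd_num;               /* full size of the SRQ (upper bound)        */
    int  rd_curr_num;          /* buffers we currently allow to be posted   */
    int  rd_low;               /* low watermark that triggers a limit event */
    int  posted;               /* unused receive buffers currently posted   */
    bool limit_armed;          /* model of "limit is set on the device"     */
    bool srq_limit_event_flag; /* true while we still want limit events     */
};

static void srq_model_init(struct srq_model *s, int rd_num, int rd_init,
                           int rd_low)
{
    assert(rd_init > 0 && rd_init <= rd_num);
    assert(rd_low >= 0 && rd_low < rd_init);
    s->rd_num = rd_num;
    s->rd_curr_num = rd_init;
    s->rd_low = rd_low;
    s->posted = rd_init;
    /* Limit events are only useful while there is still room to grow. */
    s->srq_limit_event_flag = (rd_init < rd_num);
    s->limit_armed = s->srq_limit_event_flag;
}

/* Stand-in for handling a limit event. ASSUMPTION: the allowance doubles,
 * capped at rd_num; the real growth policy may differ. */
static void srq_model_on_limit_event(struct srq_model *s)
{
    int grown = s->rd_curr_num * 2;
    s->rd_curr_num = (grown > s->rd_num) ? s->rd_num : grown;
    /* Once the allowance reaches rd_num, no further events are wanted. */
    s->srq_limit_event_flag = (s->rd_curr_num < s->rd_num);
}

/* An incoming message consumes one posted buffer. ASSUMPTION: the limit is
 * one-shot, so it is disarmed as soon as the event fires. */
static void srq_model_consume(struct srq_model *s)
{
    if (s->posted > 0) {
        s->posted--;
    }
    if (s->limit_armed && s->posted <= s->rd_low) {
        s->limit_armed = false;
        srq_model_on_limit_event(s);
    }
}

/* Refill up to the current allowance, and only after posting re-arm the
 * limit if events are still wanted. Returns the number of buffers posted. */
static int srq_model_post_receives(struct srq_model *s)
{
    int added = s->rd_curr_num - s->posted;
    if (added < 0) {
        added = 0;
    }
    s->posted += added;
    if (s->srq_limit_event_flag) {
        s->limit_armed = true;
    }
    return added;
}
```

With the made-up values rd_num = 64, an initial allowance of 16, and a low watermark of 4, the model grows the allowance to 32 and then to 64, after which the flag is off and the limit is never armed again.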
Index: ompi/mca/btl/openib/help-mpi-btl-openib.txt
===================================================================
--- ompi/mca/btl/openib/help-mpi-btl-openib.txt (revision 22318)
+++ ompi/mca/btl/openib/help-mpi-btl-openib.txt (working copy)
@@ -168,9 +168,9 @@
You may need to consult with your system administrator to get this
problem fixed.
#
-[SRQ doesn't found]
-The srq doesn't found.
-Below is some information about the host that raised the error:
+[SRQ not found]
+Open MPI tried to access a shared receive queue (SRQ) that was not found.
+This should not happen, and is a fatal error. Your MPI job will now abort.
Local host: %s
Local device: %s
@@ -411,9 +411,8 @@
- A sender will not send to a peer unless it has less than 32
outstanding sends to that peer.
- 32 receive buffers will be preposted.
- - When the number of not used receive buffers will decreased to 8
- the IBV_EVENT_SRQ_LIMIT_REACHED event will be signaled and the number
- of receive buffers that we can pre-post will be increased.
+ - When the number of unused shared receive buffers reaches 8, more
+ buffers (32 in this case) will be posted.
Local host: %s
Bad queue specification: %s
Index: ompi/mca/btl/openib/btl_openib.h
===================================================================
--- ompi/mca/btl/openib/btl_openib.h (revision 22318)
+++ ompi/mca/btl/openib/btl_openib.h (working copy)
@@ -381,6 +381,15 @@
/** The flag points if we want to get the
IBV_EVENT_SRQ_LIMIT_REACHED events for dynamically resizing SRQ */
bool srq_limit_event_flag;
+ /**< Unlike the "--mca enable_srq_resize" parameter, which says whether we want
+ to start with a small number of pre-posted receive buffers (rd_curr_num) and
+ increase it as needed (up to rd_num, the full size of the SRQ), the
+ "srq_limit_event_flag" says whether we want a limit event from the device when
+ the defined SRQ limit is reached (a signal to the main thread). The flag is
+ turned off once rd_curr_num has been increased up to rd_num.
+ To avoid lock/unlock operations in the critical path, the asynchronous thread
+ only turns srq_limit_event_flag on. This way receive buffers are posted in the main
+ thread only, and only after posting do we set (if srq_limit_event_flag is true) the limit for the IBV_EVENT_SRQ_LIMIT_REACHED event. */
}; typedef struct mca_btl_openib_module_srq_qp_t mca_btl_openib_module_srq_qp_t;
struct mca_btl_openib_module_qp_t {