Hello all.
To Jeff:
   I thought that if there are no replies it means OK.
Thank you for your comments, I fixed it, you can see the patch below.
Jeff Squyres wrote:
On Dec 15, 2009, at 8:56 PM, Jeff Squyres wrote:

Hmm.  I'm a little disappointed that this was applied without answering my 
questions first...

   http://www.open-mpi.org/community/lists/devel/2009/12/7187.php

WRONG.  You *did* answer -- somehow my mail client ate it (I see the reply in 
the web archives, but not in my local mail client -- #$@!$@!#$!!!!).

My bad...  :-(

Could you add some of your explanations as comments in the code?  The rationale 
here is that if I had those questions while reading your patch, someone else 
(including me, months from now) will likely have the same questions while 
reading the code.

Another minor quibble in a help message:

+[SRQ doesn't found]
+The srq doesn't found.
+Below is some information about the host that raised the error:
+
+    Local host:   %s
+    Local device: %s

It's not correct grammar and is fairly unhelpful to the user -- please change 
to:

[SRQ not found]
Open MPI tried to access a shared receive queue (SRQ) that was not found.  This 
should not happen, and is a fatal error.  Your MPI job will now abort.

    Local host:   %s
    Local device: %s

Also:

+  - When the number of not used receive buffers will decreased to 8
+    the IBV_EVENT_SRQ_LIMIT_REACHED event will be signaled and the number
+    of receive buffers that we can pre-post will be increased.

I don't think users know what IBV_EVENT_... is.  Perhaps it should read:

+  - When the number of unused shared receive buffers reaches 8, more
+    buffers will be posted.

(how many more buffers will be posted, BTW?)




Index: ompi/mca/btl/openib/help-mpi-btl-openib.txt
===================================================================
--- ompi/mca/btl/openib/help-mpi-btl-openib.txt (revision 22318)
+++ ompi/mca/btl/openib/help-mpi-btl-openib.txt (working copy)
@@ -168,9 +168,9 @@
 You may need to consult with your system administrator to get this
 problem fixed.
 #
-[SRQ doesn't found]
-The srq doesn't found.
-Below is some information about the host that raised the error:
+[SRQ not found]
+Open MPI tried to access a shared receive queue (SRQ) that was not found.
+This should not happen, and is a fatal error.  Your MPI job will now abort.

     Local host:   %s
     Local device: %s
@@ -411,9 +411,8 @@
   - A sender will not send to a peer unless it has less than 32
     outstanding sends to that peer.
   - 32 receive buffers will be preposted.
-  - When the number of not used receive buffers will decreased to 8
-    the IBV_EVENT_SRQ_LIMIT_REACHED event will be signaled and the number
-    of receive buffers that we can pre-post will be increased.
+  - When the number of unused shared receive buffers reaches 8, more
+    buffers (32 in this case) will be posted.

   Local host: %s
   Bad queue specification: %s
Index: ompi/mca/btl/openib/btl_openib.h
===================================================================
--- ompi/mca/btl/openib/btl_openib.h    (revision 22318)
+++ ompi/mca/btl/openib/btl_openib.h    (working copy)
@@ -381,6 +381,15 @@
     /** The flag points if we want to get the 
          IBV_EVENT_SRQ_LIMIT_REACHED events for dynamically resizing SRQ */
     bool srq_limit_event_flag;
+    /**< In difference of the "--mca enable_srq_resize" parameter that says, 
if we want(or no)
+         to start with small num of pre-posted receive buffers (rd_curr_num) 
and to increase this number by needs
+         (the max of this value is rd_num – the whole size of SRQ), the 
"srq_limit_event_flag" says if we want to get limit event
+         from device if the defined srq limit was reached (signal to the main 
thread) and we put off this flag if the rd_curr_num
+         was increased up to rd_num.
+         In order to prevent lock/unlock operation in the critical path we 
prefer only put-on
+         the srq_limit_event_flag in asynchronous thread, because in this way 
we post receive buffers
+         in the main thread only and only after posting we set (if 
srq_limit_event_flag is true)
+         the limit for IBV_EVENT_SRQ_LIMIT_REACHED event. */
 }; typedef struct mca_btl_openib_module_srq_qp_t 
mca_btl_openib_module_srq_qp_t;

 struct mca_btl_openib_module_qp_t {

Reply via email to