Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-07 Thread Don Kerr

George,

Were you suggesting that the proposed new parameter 
"max_rdma_single_rget" be set by the individual BTLs, similar to 
"btl_eager_limit"?  Seems to me that is the better approach if I am 
to move forward with this.


-DON

On 10/06/09 11:14, Don Kerr wrote:
I agree there is probably a larger issue here, and yes, this is somewhat 
specific, but whereas OB1 appears to have multiple protocols depending 
on the capabilities of the BTLs, I would not characterize this as an 
IB-centric problem -- maybe an OB1 RDMA problem. There is a clear benefit 
from modifying this specific case. Do you think it's not worth making 
incremental improvements while also attacking a potentially bigger issue?


-DON

On 10/06/09 10:52, George Bosilca wrote:

Don,

This seems a very IB-centric problem (and solution) going up into the 
PML. Moreover, I noticed that, independent of the BTL, we have some 
problems with multi-rail performance. As an example, on a cluster 
with 3 GB cards we get the same performance whether I enable 2 or 3. 
I haven't had time to look into the details, but this might be a more 
general problem.


  george.

On Oct 6, 2009, at 09:51 , Don Kerr wrote:



I intend to make the change suggested in this ticket to the trunk.  
The change does not impact the single rail case (tested with the openib 
btl) and does improve the dual rail case. Since it involves performance 
and I am adding an OB1 MCA parameter, I just wanted to check whether 
anyone was interested or had an issue with it before I committed the change.


-DON
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] trac #2034 : single rail openib btl shows better bandwidth than dual rail (12k< x < 128k)

2009-10-07 Thread George Bosilca

Don,

The problem is that a particular BTL doesn't have knowledge of the 
other selected BTLs, so allowing the BTLs to set this limit is not 
as easy as it sounds. However, in the case where two identical BTLs 
are selected and they are the only ones, this is clearly the better 
approach.


If this parameter is set at the PML level, I can't imagine how we 
would figure out the correct value, since it depends on the BTLs.


I see this as a pretty strong restriction. How do we know we have set 
a value that makes sense?


  george.

On Oct 7, 2009, at 10:19 , Don Kerr wrote:


George,

Were you suggesting that the proposed new parameter 
"max_rdma_single_rget" be set by the individual BTLs, similar to 
"btl_eager_limit"?  Seems to me that is the better approach if I 
am to move forward with this.


-DON





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r22066

2009-10-07 Thread George Bosilca

Jeff,

I'm intrigued by this commit. Here is how I read the patch:

1. In most of the cases you added one opal_output on top of the 
already existing orte_show_help.

2. In some other cases you replaced meaningful messages such as
   "mca_common_sm_mmap_init: orte_rml.recv failed from %d with errno=%d\n"
   with
   "Receive was less than posted size"

I don't think this makes things clearer or cleaner ...

  george.

On Oct 7, 2009, at 14:58 , jsquy...@osl.iu.edu wrote:


Author: jsquyres
Date: 2009-10-07 14:58:58 EDT (Wed, 07 Oct 2009)
New Revision: 22066
URL: https://svn.open-mpi.org/trac/ompi/changeset/22066

Log:
Fix CID 1384.  Also remove some opal_output(0,...)'s in favor of
ORTE_ERROR_LOG.

Text files modified:
  trunk/ompi/mca/common/sm/common_sm_mmap.c | 22 ++++++++++++----------
  1 files changed, 12 insertions(+), 10 deletions(-)

Modified: trunk/ompi/mca/common/sm/common_sm_mmap.c
==============================================================================

--- trunk/ompi/mca/common/sm/common_sm_mmap.c   (original)
+++ trunk/ompi/mca/common/sm/common_sm_mmap.c   2009-10-07 14:58:58 EDT (Wed, 07 Oct 2009)

@@ -50,6 +50,7 @@
#include "orte/util/name_fns.h"
#include "orte/util/show_help.h"
#include "orte/runtime/orte_globals.h"
+#include "orte/mca/errmgr/errmgr.h"

#include "ompi/constants.h"
#include "ompi/proc/proc.h"
@@ -223,15 +224,19 @@
/* process initializing the file */
fd = open(file_name, O_CREAT|O_RDWR, 0600);
if (fd < 0) {
+int err = errno;
+ORTE_ERROR_LOG(OMPI_ERR_IN_ERRNO);
             orte_show_help("help-mpi-common-sm.txt", "sys call fail", 1,
                            orte_process_info.nodename,
                            "open(2)", file_name,
-                           strerror(errno), errno);
+                           strerror(err), err);
} else if (ftruncate(fd, size) != 0) {
+int err = errno;
+ORTE_ERROR_LOG(OMPI_ERR_IN_ERRNO);
             orte_show_help("help-mpi-common-sm.txt", "sys call fail", 1,
                            orte_process_info.nodename,
                            "ftruncate(2)", "",
-                           strerror(errno), errno);
+                           strerror(err), err);
close(fd);
unlink(file_name);
fd = -1;
@@ -263,6 +268,7 @@
rc = orte_rml.send(&(procs[p]->proc_name), iov, 3,
   OMPI_RML_TAG_SM_BACK_FILE_CREATED, 0);
         if (rc < (ssize_t) (iov[0].iov_len + iov[1].iov_len + iov[2].iov_len)) {
+            ORTE_ERROR_LOG(OMPI_ERR_COMM_FAILURE);
             opal_output(0, "mca_common_sm_mmap_init: "
                         "orte_rml.send failed to %lu with errno=%d, ret=%d, iov_len sum=%d\n",
                         (unsigned long)p, errno,
@@ -312,11 +318,8 @@

OMPI_RML_TAG_SM_BACK_FILE_CREATED, 0);

opal_progress_event_users_decrement();
if (rc < 0) {
-            opal_output(0, "mca_common_sm_mmap_init: "
-                        "orte_rml.recv failed from %d with errno=%d\n",
-                        0, errno);
-            munmap(map, size);
-            /* fd wasn't opened here; no need to close/reset */
+            ORTE_ERROR_LOG(OMPI_ERR_RECV_LESS_THAN_POSTED);
+            /* fd/map wasn't opened here; no need to close/reset */

goto out;
}

@@ -328,9 +331,8 @@
/* If not, put it on the pending list and try again */
rml_msg = OBJ_NEW(pending_rml_msg_t);
if (NULL == rml_msg) {
-            opal_output(0, "mca_common_sm_mmap_init: failed to create pending rml message");
-            munmap(map, size);
-            /* fd wasn't opened here; no need to close/reset */
+            ORTE_ERROR_LOG(OMPI_ERR_OUT_OF_RESOURCE);
+            /* fd/map wasn't opened here; no need to close/reset */

goto out;
}
memcpy(rml_msg->file_name, filename_to_send,
___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r22066

2009-10-07 Thread Jeff Squyres

Thanks for the sanity check.

I don't know what I was thinking for the first 2 -- those are clearly 
extraneous.  For the 3rd one, I meant to remove the opal_output() and have 
it just be the ORTE_ERROR_LOG (the existing opal_output() doesn't 
really add much extra information).


I'll fix.


On Oct 7, 2009, at 3:58 PM, George Bosilca wrote:


Jeff,

I'm intrigued by this commit. Here is how I read the patch:

1. In most of the cases you added one opal_output on top of the
already existing orte_show_help

2. In some other cases you replaced meaningful messages such as
  "mca_common_sm_mmap_init: orte_rml.recv failed from %d with
errno=%d\n"
   with
  "Receive was less than posted size"

I don't think this makes things more clear or cleaner ...

   george.





--
Jeff Squyres
j