[OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Brice Goglin
For some reason, the MX btl sets btl_bandwidth in megabits/s instead of
megabytes/s. So we get crazy btl_weights in case of heterogeneous
multirail. And --mca btl_mx_bandwidth  cannot work around the
problem (it probably doesn't help because it's overridden by the runtime
link width detection anyway?).

Signed-off-by: Brice Goglin 

Index: ompi/mca/btl/mx/btl_mx_component.c
===
--- ompi/mca/btl/mx/btl_mx_component.c  (revision 23711)
+++ ompi/mca/btl/mx/btl_mx_component.c  (working copy)
@@ -159,7 +159,7 @@
  MCA_BTL_FLAGS_PUT |
  MCA_BTL_FLAGS_SEND |
  MCA_BTL_FLAGS_RDMA_MATCHED);
-mca_btl_mx_module.super.btl_bandwidth = 2000;
+mca_btl_mx_module.super.btl_bandwidth = 250;
 mca_btl_mx_module.super.btl_latency = 5;
 mca_btl_base_param_register(&mca_btl_mx_component.super.btl_version,
 &mca_btl_mx_module.super);
@@ -357,7 +357,7 @@
 mx_btl->mx_endpoint = mx_endpoint;
 mx_btl->mx_endpoint_addr = mx_endpoint_addr;

-mx_btl->super.btl_bandwidth = 2000;  /* whatever */
+mx_btl->super.btl_bandwidth = 250;  /* whatever */
 mx_btl->super.btl_latency = 10;
 #if defined(MX_HAS_NET_TYPE)
 {
@@ -370,11 +370,11 @@
 } else {
 if( MX_SPEED_2G == value ) {
 mx_unique_network_id |= 0xaa00;
-mx_btl->super.btl_bandwidth = 2000;
+mx_btl->super.btl_bandwidth = 250;
 mx_btl->super.btl_latency = 5;
 } else if( MX_SPEED_10G == value ) {
 mx_unique_network_id |= 0xbb00;
-mx_btl->super.btl_bandwidth = 10000;
+mx_btl->super.btl_bandwidth = 1250;
 mx_btl->super.btl_latency = 3;
 } else {
 mx_unique_network_id |= 0xcc00;
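
To see why mixed units matter, here is a minimal sketch in C. It assumes, as a simplification, that each rail's weight is its share of the total advertised bandwidth. compute_weights is a hypothetical helper for illustration, not a function from the Open MPI tree, and the 800 in the note below is the OpenIB default quoted later in this thread.

```c
#include <stddef.h>

/* Hypothetical helper (not Open MPI API): give each rail a weight equal to
 * its share of the total advertised bandwidth. The unit is irrelevant as
 * long as every entry in bandwidth[] uses the same one. */
void compute_weights(const unsigned int *bandwidth, size_t n, double *weight)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += bandwidth[i];
    }
    for (i = 0; i < n; i++) {
        /* Fall back to an even split if nothing advertises a bandwidth. */
        weight[i] = (total > 0.0) ? bandwidth[i] / total : 1.0 / (double)n;
    }
}
```

With one rail advertising 10000 (a 10G MX link expressed in megabits/s) next to one advertising 800, the first rail gets a weight of about 0.93. With 1250 (the same link in megabytes/s) it gets about 0.61. The ratio only means something if every BTL uses the same unit.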



Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Jeff Squyres
Thanks; committed in r23712.

Can you file CMRs for 1.4 and 1.5?  Thanks.


On Sep 3, 2010, at 3:53 AM, Brice Goglin wrote:

> For some reason, the MX btl sets btl_bandwidth in megabits/s instead of
> megabytes/s. So we get crazy btl_weights in case of heterogeneous
> multirail. And --mca btl_mx_bandwidth  cannot work around the
> problem (it probably doesn't help because it's overriden by the runtime
> link width detection anyway?).
> 
> Signed-off-by: Brice Goglin 
> 
> Index: ompi/mca/btl/mx/btl_mx_component.c
> ===
> --- ompi/mca/btl/mx/btl_mx_component.c(révision 23711)
> +++ ompi/mca/btl/mx/btl_mx_component.c(copie de travail)
> @@ -159,7 +159,7 @@
>  MCA_BTL_FLAGS_PUT |
>  MCA_BTL_FLAGS_SEND |
>  MCA_BTL_FLAGS_RDMA_MATCHED);
> -mca_btl_mx_module.super.btl_bandwidth = 2000;
> +mca_btl_mx_module.super.btl_bandwidth = 250;
> mca_btl_mx_module.super.btl_latency = 5;
> mca_btl_base_param_register(&mca_btl_mx_component.super.btl_version,
> &mca_btl_mx_module.super);
> @@ -357,7 +357,7 @@
> mx_btl->mx_endpoint = mx_endpoint;
> mx_btl->mx_endpoint_addr = mx_endpoint_addr;
> 
> -mx_btl->super.btl_bandwidth = 2000;  /* whatever */
> +mx_btl->super.btl_bandwidth = 250;  /* whatever */
> mx_btl->super.btl_latency = 10;
> #if defined(MX_HAS_NET_TYPE)
> {
> @@ -370,11 +370,11 @@
> } else {
> if( MX_SPEED_2G == value ) {
> mx_unique_network_id |= 0xaa00;
> -mx_btl->super.btl_bandwidth = 2000;
> +mx_btl->super.btl_bandwidth = 250;
> mx_btl->super.btl_latency = 5;
> } else if( MX_SPEED_10G == value ) {
> mx_unique_network_id |= 0xbb00;
> -mx_btl->super.btl_bandwidth = 10000;
> +mx_btl->super.btl_bandwidth = 1250;
> mx_btl->super.btl_latency = 3;
> } else {
> mx_unique_network_id |= 0xcc00;
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] OMPI 1.5 twitter notification plugin probably broken by switch to OAUTH

2010-09-03 Thread Jeff Squyres
On Sep 1, 2010, at 7:15 AM, Chris Samuel wrote:

> Looking at the code for the Twitter notifier in OMPI 1.5
> and seeing its use of HTTP basic authentication I would
> suggest that it may be non-functional due to Twitters
> switch to purely OAUTH based authentication for their API.

Oy; I got that notice from Twitter, too.

I'm afraid I don't know much about OAuth -- would anyone be interested in 
submitting a patch to make the twitter notifier use OAuth?





Re: [OMPI devel] openib btl - fatal errors don't abort the job

2010-09-03 Thread Jeff Squyres
On Sep 1, 2010, at 4:47 PM, Steve Wise wrote:

> I was wondering what the logic is behind allowing an MPI job to continue in 
> the presence of a fatal qp error?

It's a feature...?

> Note the "will try to continue" sentence:
> 
> --
> The OpenFabrics stack has reported a network error event.  Open MPI
> will try to continue, but your job may end up failing.
> 
>  Local host:escher
>  MPI process PID:   19136
>  Error number:  1 (IBV_EVENT_QP_FATAL)
> 
> This error may indicate connectivity problems within the fabric;
> please contact your system administrator.
> --
> 
> Due to other bugs I'm chasing, I get these sorts of errors, and I notice the 
> job just hangs even though the connections have been disconnected, the qps 
> flushed, and all pending WRs completed with status == FLUSH.

Would it be better to make it a fatal error?  (I'm thinking probably "yes")

Feel free to submit a patch...





Re: [OMPI devel] 1.4.3rc1 over MX

2010-09-03 Thread Jeff Squyres
On Sep 1, 2010, at 9:10 AM, Scott Atchley wrote:

> I posted a patch for this on the ticket.

Will someone be committing this to SVN?

I re-opened the ticket because just posting a patch to the ticket doesn't 
actually fix anything.  :-)





Re: [OMPI devel] 1.5rc5 over MX

2010-09-03 Thread Jeff Squyres
Ditto for the v1.5 patch -- it wasn't committed anywhere and no CMR was filed, 
so I re-opened the ticket.

Plus you mentioned a 2us (!) latency increase.  Doesn't that need attention, 
too?


On Sep 1, 2010, at 9:09 AM, Scott Atchley wrote:

> Jeff,
> 
> I posted a patch on the ticket.
> 
> Scott
> 
> On Aug 27, 2010, at 3:08 PM, Scott Atchley wrote:
> 
>> Jeff,
>> 
>> Sure, I need to register to file the tickets.
>> 
>> I have not had a chance yet. I will try to look at them first thing next 
>> week.
>> 
>> Scott
>> 
>> On Aug 27, 2010, at 2:41 PM, Jeff Squyres wrote:
>> 
>>> Scott --
>>> 
>>> Can you file tickets for this against 1.4 and 1.5?  These should probably 
>>> be blockers.
>>> 
>>> Have you been able to track these down any further, perchance?
>>> 
>>> 
>>> On Aug 26, 2010, at 10:38 AM, Scott Atchley wrote:
>>> 
>>>> Hi all,
>>>>
>>>> Testing 1.5rc5 over MX with the same setup as 1.4.3rc1 (RHEL 5.4 and MX
>>>> 1.2.12).
>>>>
>>>> This version also dies during init due to the memory manager if I do not
>>>> specify which pml to use. If I specify pml ob1 or pml cm, the tests start
>>>> but die with segfaults:
>>>>
>>>>   131072  320   166.86   749.15
>>>> [rain15:14939] *** Process received signal ***
>>>> [rain15:14939] Signal: Segmentation fault (11)
>>>> [rain15:14939] Signal code: Address not mapped (1)
>>>> [rain15:14939] Failing at address: 0x3b20
>>>>
>>>> Again, configuring without the memory manager or setting
>>>> OMPI_MCA_memory_ptmalloc2_disable=1 before calling mpirun work.
>>>>
>>>> Similar latency issues with the BTl and not with the MTL.
>>>>
>>>> Scott
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 






Re: [OMPI devel] 1.4.3rc1 over MX

2010-09-03 Thread Scott Atchley
On Sep 3, 2010, at 8:19 AM, Jeff Squyres wrote:

> On Sep 1, 2010, at 9:10 AM, Scott Atchley wrote:
> 
>> I posted a patch for this on the ticket.
> 
> Will someone be committing this to SVN?
> 
> I re-opened the ticket because just posting a patch to the ticket doesn't 
> actually fix anything.  :-)

We should probably set me up with commit privileges.

Scott


Re: [OMPI devel] 1.5rc5 over MX

2010-09-03 Thread Scott Atchley
Shouldn't the regression be a separate ticket since it is unrelated?

Scott

On Sep 3, 2010, at 8:20 AM, Jeff Squyres wrote:

> Ditto for the v1.5 patch -- it wasn't committed anywhere and no CMR was 
> filed, so I re-opened the ticket.
> 
> Plus you mentioned a 2us (!) latency increase.  Doesn't that need attention, 
> too?
> 
> 
> On Sep 1, 2010, at 9:09 AM, Scott Atchley wrote:
> 
>> Jeff,
>> 
>> I posted a patch on the ticket.
>> 
>> Scott
>> 
>> On Aug 27, 2010, at 3:08 PM, Scott Atchley wrote:
>> 
>>> Jeff,
>>> 
>>> Sure, I need to register to file the tickets.
>>> 
>>> I have not had a chance yet. I will try to look at them first thing next 
>>> week.
>>> 
>>> Scott
>>> 
>>> On Aug 27, 2010, at 2:41 PM, Jeff Squyres wrote:
>>> 
>>>> Scott --
>>>>
>>>> Can you file tickets for this against 1.4 and 1.5?  These should probably
>>>> be blockers.
>>>>
>>>> Have you been able to track these down any further, perchance?
>>>>
>>>>
>>>> On Aug 26, 2010, at 10:38 AM, Scott Atchley wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Testing 1.5rc5 over MX with the same setup as 1.4.3rc1 (RHEL 5.4 and MX
>>>>> 1.2.12).
>>>>>
>>>>> This version also dies during init due to the memory manager if I do not
>>>>> specify which pml to use. If I specify pml ob1 or pml cm, the tests start
>>>>> but die with segfaults:
>>>>>
>>>>>   131072  320   166.86   749.15
>>>>> [rain15:14939] *** Process received signal ***
>>>>> [rain15:14939] Signal: Segmentation fault (11)
>>>>> [rain15:14939] Signal code: Address not mapped (1)
>>>>> [rain15:14939] Failing at address: 0x3b20
>>>>>
>>>>> Again, configuring without the memory manager or setting
>>>>> OMPI_MCA_memory_ptmalloc2_disable=1 before calling mpirun work.
>>>>>
>>>>> Similar latency issues with the BTl and not with the MTL.
>>>>>
>>>>> Scott
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 




Re: [OMPI devel] 1.4.3rc1 over MX

2010-09-03 Thread Abhishek Kulkarni



On Fri, 3 Sep 2010, Jeff Squyres wrote:

> On Sep 1, 2010, at 9:10 AM, Scott Atchley wrote:
>
>> I posted a patch for this on the ticket.
>
> Will someone be committing this to SVN?

Done. Filed the CMRs to get this moved to 1.4.3 and 1.5.

> I re-opened the ticket because just posting a patch to the ticket doesn't
> actually fix anything.  :-)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread George Bosilca
Jeff,

I think you will have to revert this patch, as btl_bandwidth __IS__ supposed
to be in Mbs and not MBs. We usually talk about networks in Mbs (there is a
pattern: Ethernet 1G/10G, Myricom 10G). In addition, the original design of
the multi-rail support was based on this assumption, and the multi-rail
handling code deals with these values (at that level I don't think the unit
really matters, but at least it needs consistent values from all BTLs).

However, going over the existing BTLs I can see that some BTLs do not correctly 
set this value:

BTL      Bandwidth        Auto-detect  Status
Elan     2000             NO           Correct
GM       250              NO           Doubtful
MX       2000/10000       YES (Mbs)    Correct (before the patch)
OFUD     800              NO           Doubtful
OpenIB   2000/4000/8000   YES (Mbs)    Correct (multiplied by the active_width)
Portals  1000             NO           Doubtful
SCTP     100              NO           Conservative value (correct)
Self     100              XXX          Correct (doesn't matter anyway)
SM       9000             NO           Correct
TCP      100              NO           Conservative value (correct)
UDAPL    225              NO           Incorrect

Some of these BTL values do not make sense in either Mbs or MBs. Here is the
list of such BTLs: OFUD, Portals, UDAPL. If the corresponding developers can
provide the default bandwidth (in Mbs), I will update their values.

For SCTP and TCP I don't know how to detect it reliably in a portable way, so I
expect to leave them set to this very conservative value. Moreover, the TCP BTL
is only used for multi-rail if the available high performance network allows
it, so it doesn't really matter.

  george.

On Sep 3, 2010, at 08:03 , Jeff Squyres wrote:

> Thanks; committed in r23712.
> 
> Can you file CMRs for 1.4 and 1.5?  Thanks.
> 
> 
> On Sep 3, 2010, at 3:53 AM, Brice Goglin wrote:
> 
>> For some reason, the MX btl sets btl_bandwidth in megabits/s instead of
>> megabytes/s. So we get crazy btl_weights in case of heterogeneous
>> multirail. And --mca btl_mx_bandwidth  cannot work around the
>> problem (it probably doesn't help because it's overriden by the runtime
>> link width detection anyway?).
>> 
>> Signed-off-by: Brice Goglin 
>> 
>> Index: ompi/mca/btl/mx/btl_mx_component.c
>> ===
>> --- ompi/mca/btl/mx/btl_mx_component.c   (révision 23711)
>> +++ ompi/mca/btl/mx/btl_mx_component.c   (copie de travail)
>> @@ -159,7 +159,7 @@
>> MCA_BTL_FLAGS_PUT |
>> MCA_BTL_FLAGS_SEND |
>> MCA_BTL_FLAGS_RDMA_MATCHED);
>> -mca_btl_mx_module.super.btl_bandwidth = 2000;
>> +mca_btl_mx_module.super.btl_bandwidth = 250;
>>mca_btl_mx_module.super.btl_latency = 5;
>>mca_btl_base_param_register(&mca_btl_mx_component.super.btl_version,
>>&mca_btl_mx_module.super);
>> @@ -357,7 +357,7 @@
>>mx_btl->mx_endpoint = mx_endpoint;
>>mx_btl->mx_endpoint_addr = mx_endpoint_addr;
>> 
>> -mx_btl->super.btl_bandwidth = 2000;  /* whatever */
>> +mx_btl->super.btl_bandwidth = 250;  /* whatever */
>>mx_btl->super.btl_latency = 10;
>> #if defined(MX_HAS_NET_TYPE)
>>{
>> @@ -370,11 +370,11 @@
>>} else {
>>if( MX_SPEED_2G == value ) {
>>mx_unique_network_id |= 0xaa00;
>> -mx_btl->super.btl_bandwidth = 2000;
>> +mx_btl->super.btl_bandwidth = 250;
>>mx_btl->super.btl_latency = 5;
>>} else if( MX_SPEED_10G == value ) {
>>mx_unique_network_id |= 0xbb00;
>> -mx_btl->super.btl_bandwidth = 10000;
>> +mx_btl->super.btl_bandwidth = 1250;
>>mx_btl->super.btl_latency = 3;
>>} else {
>>mx_unique_network_id |= 0xcc00;
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Jeff Squyres
On Sep 3, 2010, at 9:38 AM, George Bosilca wrote:

> I think you will have to revert this patch as the btl_bandwidth __IS__ 
> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs 
> (there is a pattern in Ethernet 1G/10G, Myricom 10G).

This is why I shouldn't commit patches for others, and why I'm glad I pushed 
Scott to commit the other fixes himself...

I'll revert; you, Scott, and Brice figure out what you want to do.

> In addition the original design of the multi-rail was based on this 
> assumption, and the multi-rail handling code deal with these values (at that 
> level I don't think it really matters, but at least it needs consistent 
> values from all BTLs).
> 
> However, going over the existing BTLs I can see that some BTLs do not 
> correctly set this value:
> 
> BTL      Bandwidth        Auto-detect  Status
> Elan     2000             NO           Correct
> GM       250              NO           Doubtful
> MX       2000/10000       YES (Mbs)    Correct (before the patch)
> OFUD     800              NO           Doubtful
> OpenIB   2000/4000/8000   YES (Mbs)    Correct (multiplied by the active_width)
> Portals  1000             NO           Doubtful
> SCTP     100              NO           Conservative value (correct)
> Self     100              XXX          Correct (doesn't matter anyway)
> SM       9000             NO           Correct
> TCP      100              NO           Conservative value (correct)
> UDAPL    225              NO           Incorrect
> 
> Some of these BTL values do not make sense, neither in Mbs or MBs. Here is a 
> list of such BTLs: OFUD, Portals, UDAPL. If the corresponding developers can 
> provide the default bandwidth (in Mbs) I will update their values.

OFUD should be just like OpenFabrics.  But I doubt anyone cares.  Should we 
remove it?

UDAPL intentionally hides that kind of stuff; I don't know if it's possible to 
get it.  Rolf/Terry?

> For SCTP, TCP I don't know how to detect it reliably in a portable way, so I 
> expect to let them set to this very conservative value. Moreover, the BTL TCP 
> is only used for multi-rail if the available high performance network allows 
> it, so it doesn't really matter.

Some servers have 1GB and 10GB TCP, though...

It might be worth having even a Linux-specific way to auto-detect, just for 
this use case (which is becoming more common -- 1GB LOM and a 10GB non-iWARP 
NIC).





Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Brice Goglin
Le 03/09/2010 15:38, George Bosilca a écrit :
> Jeff,
>
> I think you will have to revert this patch as the btl_bandwidth __IS__ 
> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs 
> (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the 
> original design of the multi-rail was based on this assumption, and the 
> multi-rail handling code deal with these values (at that level I don't think 
> it really matters, but at least it needs consistent values from all BTLs).
>
> However, going over the existing BTLs I can see that some BTLs do not 
> correctly set this value:
>
> BTL      Bandwidth        Auto-detect  Status
> Elan     2000             NO           Correct
>   

2000 looks strange to me. Last time I played with Elan4, bandwidth was
900MB/s or so.

> GM       250              NO           Doubtful
> MX       2000/10000       YES (Mbs)    Correct (before the patch)
> OFUD     800              NO           Doubtful
> OpenIB   2000/4000/8000   YES (Mbs)    Correct (multiplied by the active_width)
>   

I found the problem when using both MX and OpenIB at the same time, so
they can't be both wrong or both correct. IB was reporting 800, not
2000/4000/8000. Maybe because auto-detect didn't work and the default is
wrong:
btl_openib_mca.c:527:mca_btl_openib_module.super.btl_bandwidth = 800;

Brice



Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread George Bosilca

On Sep 3, 2010, at 09:50 , Brice Goglin wrote:

> Le 03/09/2010 15:38, George Bosilca a écrit :
>> Jeff,
>> 
>> I think you will have to revert this patch as the btl_bandwidth __IS__ 
>> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs 
>> (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the 
>> original design of the multi-rail was based on this assumption, and the 
>> multi-rail handling code deal with these values (at that level I don't think 
>> it really matters, but at least it needs consistent values from all BTLs).
>> 
>> However, going over the existing BTLs I can see that some BTLs do not 
>> correctly set this value:
>> 
>> BTL      Bandwidth        Auto-detect  Status
>> Elan     2000             NO           Correct
>> 
> 
> 2000 looks strange to me. Last time I played with Elan4, bandwidth was
> 900MB/s or so.

Lucky you ;) The 2000 was the bandwidth of the last Elan device we had.

> 
>> GM       250              NO           Doubtful
>> MX       2000/10000       YES (Mbs)    Correct (before the patch)
>> OFUD     800              NO           Doubtful
>> OpenIB   2000/4000/8000   YES (Mbs)    Correct (multiplied by the active_width)
>> 
> 
> I found the problem when using both MX and OpenIB at the same time, so
> they can't be both wrong or both correct. IB was reporting 800, not
> 2000/4000/8000. Maybe because auto-detect didn't work and the default is
> wrong:
> btl_openib_mca.c:527:mca_btl_openib_module.super.btl_bandwidth = 800;

It appears that OpenIB only auto-detects the bandwidth if the value is
explicitly set to zero via the MCA parameters. As a last resort, as for the
other devices, you can set it manually. Use something like
btl_openib_bandwidth_%dev_name% to set the bandwidth per device.

  george.
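
A minimal sketch in C of the behaviour described above, assuming that a configured value of zero is the only trigger for auto-detection. effective_bandwidth and its parameters are hypothetical names for illustration; the real logic lives in the openib BTL and may differ.

```c
/* Hypothetical helper (not Open MPI API): a non-zero configured value always
 * wins, so a non-zero built-in default hides whatever the device reports. */
unsigned int effective_bandwidth(unsigned int configured, unsigned int detected)
{
    return (0 == configured) ? detected : configured;
}
```

Under that assumption, a default of 800 with a detected 8000 yields 800, which would match what Brice observed, while an explicit 0 yields the detected 8000.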


> 
> Brice
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.5rc5 over MX

2010-09-03 Thread Jeff Squyres
Yes, probably so.

On Sep 3, 2010, at 8:53 AM, Scott Atchley wrote:

> Shouldn't the regression be a separate ticket since it is unrelated?
> 
> Scott
> 
> On Sep 3, 2010, at 8:20 AM, Jeff Squyres wrote:
> 
>> Ditto for the v1.5 patch -- it wasn't committed anywhere and no CMR was 
>> filed, so I re-opened the ticket.
>> 
>> Plus you mentioned a 2us (!) latency increase.  Doesn't that need attention, 
>> too?
>> 
>> 
>> On Sep 1, 2010, at 9:09 AM, Scott Atchley wrote:
>> 
>>> Jeff,
>>> 
>>> I posted a patch on the ticket.
>>> 
>>> Scott
>>> 
>>> On Aug 27, 2010, at 3:08 PM, Scott Atchley wrote:
>>> 
>>>> Jeff,
>>>>
>>>> Sure, I need to register to file the tickets.
>>>>
>>>> I have not had a chance yet. I will try to look at them first thing next
>>>> week.
>>>>
>>>> Scott
>>>>
>>>> On Aug 27, 2010, at 2:41 PM, Jeff Squyres wrote:
>>>>
>>>>> Scott --
>>>>>
>>>>> Can you file tickets for this against 1.4 and 1.5?  These should probably
>>>>> be blockers.
>>>>>
>>>>> Have you been able to track these down any further, perchance?
>>>>>
>>>>>
>>>>> On Aug 26, 2010, at 10:38 AM, Scott Atchley wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Testing 1.5rc5 over MX with the same setup as 1.4.3rc1 (RHEL 5.4 and MX
>>>>>> 1.2.12).
>>>>>>
>>>>>> This version also dies during init due to the memory manager if I do not
>>>>>> specify which pml to use. If I specify pml ob1 or pml cm, the tests
>>>>>> start but die with segfaults:
>>>>>>
>>>>>>  131072  320   166.86   749.15
>>>>>> [rain15:14939] *** Process received signal ***
>>>>>> [rain15:14939] Signal: Segmentation fault (11)
>>>>>> [rain15:14939] Signal code: Address not mapped (1)
>>>>>> [rain15:14939] Failing at address: 0x3b20
>>>>>>
>>>>>> Again, configuring without the memory manager or setting
>>>>>> OMPI_MCA_memory_ptmalloc2_disable=1 before calling mpirun work.
>>>>>>
>>>>>> Similar latency issues with the BTl and not with the MTL.
>>>>>>
>>>>>> Scott
>>>>>> ___
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>
>>>>>
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
> 






Re: [OMPI devel] 1.5rc5 has been posted

2010-09-03 Thread Larry Baker
Using MPI-2 (Gropp et al.) says MPI_SIZEOF() only supports numeric  
intrinsic data types.  So, I patched OpenMPI 1.4.2 to remove the  
declarations of the undefined Logical and Character specific  
procedures in ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh:



  output_197 MPI_Sizeof ${rank} CH "character${dim}"
  output_197 MPI_Sizeof ${rank} L "logical${dim}"


I also changed all the dummy array declarations in the INTERFACE  
declarations to use assumed-shape arrays, which is the correct Fortran  
90 method to declare the rank and extent of any actual array arguments.


I simplified both ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh and  
ompi/mpi/f90/scripts/mpi_sizeof.f90.sh.  In mpi-f90-interfaces.h.sh, I  
defined an array, array_dims, with the DIMENSION declarations, then  
replaced all the copies of dim= throughout the code with a reference  
to array_dims by rank:



array_dims[0]=''
array_dims[1]=', dimension(:)'
array_dims[2]=', dimension(:,:)'
array_dims[3]=', dimension(:,:,:)'
array_dims[4]=', dimension(:,:,:,:)'
array_dims[5]=', dimension(:,:,:,:,:)'
array_dims[6]=', dimension(:,:,:,:,:,:)'
array_dims[7]=', dimension(:,:,:,:,:,:,:)'

for rank in $allranks
do
  dim=${array_dims[${rank}]}


In mpi_sizeof.f90.sh, I copied the method to enumerate rank 0 with all  
the other ranks from the code in mpi-f90-interfaces.h.sh:



allranks="0 $ranks"

for rank in $allranks
do
  case "$rank" in  0)  dim=''  ;  esac
  case "$rank" in  1)  dim=', dimension(:)'  ;  esac
  case "$rank" in  2)  dim=', dimension(:,:)'  ;  esac
  case "$rank" in  3)  dim=', dimension(:,:,:)'  ;  esac
  case "$rank" in  4)  dim=', dimension(:,:,:,:)'  ;  esac
  case "$rank" in  5)  dim=', dimension(:,:,:,:,:)'  ;  esac
  case "$rank" in  6)  dim=', dimension(:,:,:,:,:,:)'  ;  esac
  case "$rank" in  7)  dim=', dimension(:,:,:,:,:,:,:)'  ;  esac


Here's the patch I used for OpenMPI 1.4.2:

# Remove declarations of Logical and Character specific procedures from
# Generic Subroutine MPI_SIZEOF and fix dummy arrays to be assumed-shape
mv openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh{,.original}
sed -e $'34{p;
s/^.*$/array_dims[0]=\'\'/;p;
s/^.*$/array_dims[1]=\', dimension(:)\'/;p;
s/^.*$/array_dims[2]=\', dimension(:,:)\'/;p;
s/^.*$/array_dims[3]=\', dimension(:,:,:)\'/;p;
s/^.*$/array_dims[4]=\', dimension(:,:,:,:)\'/;p;
s/^.*$/array_dims[5]=\', dimension(:,:,:,:,:)\'/;p;
s/^.*$/array_dims[6]=\', dimension(:,:,:,:,:,:)\'/;p;
s/^.*$/array_dims[7]=\', dimension(:,:,:,:,:,:,:)\'/;p;
s/^.*$//;}' \
-e '/case "$rank" in  [0-6])  dim=/d' \
-e '/case "$rank" in  7)  dim=.*$/s//dim=${array_dims[${rank}]}/' \
-e '7129,7130d' \
openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh.original \
>openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh
chmod +x openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh
mv openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh{,.original}
sed -e '25,84d' \
-e '85s/^.*$/allranks="0 $ranks"/' \
-e '87s/\$ranks/$allranks/' \
-e $'88{p;s/^.*$/  case "$rank" in  0)  dim=\'\'  ;  esac/;}' \
-e $'89,95{s/dim=\'/dim=\', dimension(/;s/1,/:,/g;s/\*\'/:)\'/;}' \
-e '97,110d' \
-e '118s/, dimension(\${dim})/${dim}/' \
-e '133s/, dimension(\${dim})/${dim}/' \
-e '148s/, dimension(\${dim})/${dim}/' \
openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh.original \
>openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh
chmod +x openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Sep 1, 2010, at 5:09 PM, Larry Baker wrote:

OpenMPI 1.4.x and 1.5x fail to link a program that calls Subroutine  
MPI_SIZEOF using the PGI 10.3 compilers:



$ cat junk.f90
 Use MPI
 Implicit None
 Integer var, size, err
 Call MPI_SIZEOF( var, size, err )
 Write (*,*) 'Size of Integer var is ', size, ' bytes.'
 Stop
 End

$ /opt/pgi/linux86-64/current/openmpi/bin/mpif90 -o junk junk.f90
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof1dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof4dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof3dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof4dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof2dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof2dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof3dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof1dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof0dl_'
/opt/pgi/linux86-64/10.

Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-03 Thread Bogdan Costescu
On Fri, Sep 3, 2010 at 3:47 PM, Jeff Squyres  wrote:

> It might be worth having even a Linux-specific way to auto-detect, just for 
> this use case (which is becoming more common -- 1GB LOM and a 10GB non-iWARP 
> NIC).

The file:

/sys/class/net/ethX/speed

should contain the current speed and is readable by any user; if it
contains 65535 there is no link, so the speed is not defined. The
information should also be available through ethtool, but for root
only, which is not so useful in this case. The file might not always
exist, e.g. when /sys is not mounted, when using an older kernel, or when
the driver doesn't expose this info, but from what I understand this
is just a best effort to locate a realistic value.

Cheers,
Bogdan
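
A minimal sketch in C of the best-effort lookup described above. read_link_speed_mbps is a hypothetical helper, not part of Open MPI. It assumes the sysfs value is in megabits/s, treats 65535 as "no link" as stated above, and also treats any non-positive value or any read failure as unknown.

```c
#include <stdio.h>

/* Hypothetical helper: best-effort read of /sys/class/net/<ifname>/speed.
 * Returns the link speed (assumed to be in Mb/s), or -1 if it cannot be
 * determined, in which case the caller keeps its conservative default. */
int read_link_speed_mbps(const char *ifname)
{
    char path[256];
    FILE *fp;
    int speed = -1;
    int len;

    len = snprintf(path, sizeof(path), "/sys/class/net/%s/speed", ifname);
    if (len < 0 || len >= (int)sizeof(path)) {
        return -1;              /* interface name too long for the buffer */
    }
    fp = fopen(path, "r");
    if (NULL == fp) {
        return -1;              /* /sys not mounted, old kernel, no such device */
    }
    if (1 != fscanf(fp, "%d", &speed)) {
        speed = -1;             /* driver does not expose a readable value */
    }
    fclose(fp);
    if (speed <= 0 || 65535 == speed) {
        return -1;              /* no link, so the speed is undefined */
    }
    return speed;
}
```

A caller would use the result only when it is positive and fall back to the existing conservative value otherwise.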