Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Terry Dontje
Could the issue have anything to do with how OMPI implements lazy 
connections with IBCM?  Does setting the MCA parameter 
mpi_preconnect_all to 1 change things?
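
One way to test that suggestion, reusing the command from Doron's report below
(the parameter name is as Terry gives it; some builds spell it
mpi_preconnect_mpi instead):

mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
       -mca mpi_preconnect_all 1 imb/src/IMB-MPI1 gather -npmin 64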


--td

On 01/16/2011 04:12 AM, Doron Shoham wrote:

Hi,

The gather hangs only with the linear_sync algorithm but works with the
basic_linear and binomial algorithms.
The gather algorithm is chosen dynamically depending on block size and
communicator size.
So, in the beginning, the binomial algorithm is chosen (the communicator
size is larger than 60).
When the message size increases, the linear_sync algorithm is chosen
(with small_segment_size).
When debugging on the cluster I saw that the linear_sync function is
called in an endless loop with a segment size of 1024.
This explains why the hang occurs in the middle of the run.
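
For readers not familiar with the tuned gather code, the selection Doron
describes boils down to a fixed decision rule roughly like the sketch below.
Only the 60-rank communicator threshold and the 1024-byte segment size come
from his description; the other thresholds, names, and the exact meaning of
"block size" are illustrative assumptions, not the 1.4.3 source:

/* Sketch of a fixed gather decision rule (illustrative, not OMPI code).
 * block_size is taken here as the per-rank message size in bytes. */
#include <stdio.h>
#include <stddef.h>

static const char *gather_choice_sketch(size_t block_size, int comm_size)
{
    const size_t large_block  = 92160;  /* assumed threshold */
    const size_t medium_block = 6000;   /* assumed threshold */
    const size_t small_block  = 1024;   /* assumed threshold */
    const int    large_comm   = 60;     /* from Doron's description */
    const int    small_comm   = 10;     /* assumed threshold */

    if (block_size > large_block)
        return "linear_sync, large segments";
    if (block_size > medium_block)
        return "linear_sync, 1024-byte segments";  /* the case that hangs */
    if (comm_size > large_comm ||
        (comm_size > small_comm && block_size < small_block))
        return "binomial";
    return "basic_linear";
}

int main(void)
{
    /* 64 ranks, growing per-rank message sizes as in the IMB run */
    size_t sizes[] = { 64, 1024, 8192, 131072 };
    for (int i = 0; i < 4; i++)
        printf("%7zu bytes/rank -> %s\n",
               sizes[i], gather_choice_sketch(sizes[i], 64));
    return 0;
}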

I still don't understand why RDMACM solves it or what causes
linear_sync to hang.

Again, in 1.5 it doesn't hang (maybe the timing is different?).
I'm still trying to understand what the differences are in those areas
between 1.4.3 and 1.5.


BTW,
Choosing RDMACM fixes hangs and performance issues in all collective operations.

Thanks,
Doron


On Thu, Jan 13, 2011 at 9:44 PM, Shamis, Pavel  wrote:

RDMACM creates the same QPs with the same tunings as OOB, so I don't see how 
the CPC could affect performance.
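
Side note: the CPC can also be selected explicitly rather than through the
priority parameter used later in this thread; a minimal sketch using the
btl_openib_cpc_include parameter (rest of the command line as in Doron's runs):

mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
       -mca btl_openib_cpc_include rdmacm imb/src/IMB-MPI1 gather -npmin 64

Substituting oob for rdmacm forces the out-of-band CPC for comparison.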

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:


+1 on what Pasha said -- if using rdmacm fixes the problem, then there's 
something else nefarious going on...

You might want to run padb on your hangs to see where all the processes are 
stuck and whether anything obvious jumps out.  I'd be surprised if there's a bug 
in the oob cpc; it's been around for a long, long time; it should be pretty 
stable.
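
A typical invocation for that (flags from memory -- check padb --help; you may
also need --config-option rmgr=orte for Open MPI jobs):

padb --all --stack-trace --tree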

Do we create QPs differently between oob and rdmacm, such that perhaps they are 
"better" (maybe better routed, or using a different SL, or ...) when created 
via rdmacm?


On Jan 12, 2011, at 12:12 PM, Shamis, Pavel wrote:


RDMACM or OOB cannot affect the performance of this benchmark, since they are 
not involved in communication. So I'm not sure that the performance changes 
that you see are related to connection manager changes.
About oob - I'm not aware of any hang issues there; the code is very, very old, 
and we have not touched it for a long time.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
Email: sham...@ornl.gov





On Jan 12, 2011, at 8:45 AM, Doron Shoham wrote:


Hi,

For the first problem, I can see that when using rdmacm instead of oob as the
openib connection manager, I get much better performance results (and no hangs!).

mpirun -display-map -np 64 -machinefile voltairenodes -mca btl
sm,self,openib -mca btl_openib_connect_rdmacm_priority 100
imb/src/IMB-MPI1 gather -npmin 64


#bytes  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
      0          1000         0.04         0.05         0.05
      1          1000        19.64        19.69        19.67
      2          1000        19.97        20.02        19.99
      4          1000        21.86        21.96        21.89
      8          1000        22.87        22.94        22.90
     16          1000        24.71        24.80        24.76
     32          1000        27.23        27.32        27.27
     64          1000        30.96        31.06        31.01
    128          1000        36.96        37.08        37.02
    256          1000        42.64        42.79        42.72
    512          1000        60.32        60.59        60.46
   1024          1000        82.44        82.74        82.59
   2048          1000       497.66       499.62       498.70
   4096          1000       684.15       686.47       685.33
   8192           519       544.07       546.68       545.85
  16384           519       653.20       656.23       655.27
  32768           519       704.48       707.55       706.60
  65536           519       918.00       922.12       920.86
 131072           320      2414.08      2422.17      2418.20
 262144           160      4198.25      4227.58      4213.19
 524288            80      7333.04      7503.99      7438.18
1048576            40     13692.60     14150.20     13948.75
2097152            20     30377.34     32679.15     31779.86
4194304            10     61416.70     71012.50     68380.04

How can the oob CPC cause the hang? Isn't it only used to bring up the connection?
Does the oob have any part once the connections are made?

Thanks,
Doron

On Tue, Jan 11, 2011 at 2:58 PM, Doron Shoham  wrote:

Hi

All machines in the setup are iDataPlex nodes with Nehalem CPUs, 12 cores per
node, and 24 GB of memory.



Problem 1 – OMPI 1.4.3 hangs in gather:



I'm trying to run the IMB gather operation with OMPI 1.4.3 (vanilla).

It happens when np >= 64 and the message size exceeds 4k:

mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib 
imb/src-1.4.2/IMB-MPI1 gather -npmin 64



voltairenodes consists of 64 machines.



#

# Benchmarking Gather

# #processes = 64

#---

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Jeff Squyres
IBCM is broken and disabled (has been for a long time).

Did you mean RDMACM?


On Jan 18, 2011, at 6:22 AM, Terry Dontje wrote:

> Could the issue have anything to do with how OMPI implements lazy 
> connections with IBCM?  Does setting the MCA parameter mpi_preconnect_all to 
> 1 change things?
> 
> --td
> 
> On 01/16/2011 04:12 AM, Doron Shoham wrote:
>> Hi,
>> 
>> The gather hangs only with the linear_sync algorithm but works with the
>> basic_linear and binomial algorithms.
>> The gather algorithm is chosen dynamically depending on block size and
>> communicator size.
>> So, in the beginning, the binomial algorithm is chosen (the communicator
>> size is larger than 60).
>> When the message size increases, the linear_sync algorithm is chosen
>> (with small_segment_size).
>> When debugging on the cluster I saw that the linear_sync function is
>> called in an endless loop with a segment size of 1024.
>> This explains why the hang occurs in the middle of the run.
>> 
>> I still don't understand why RDMACM solves it or what causes
>> linear_sync to hang.
>> 
>> Again, in 1.5 it doesn't hang (maybe the timing is different?).
>> I'm still trying to understand what the differences are in those areas
>> between 1.4.3 and 1.5.
>> 
>> 
>> BTW,
>> Choosing RDMACM fixes hangs and performance issues in all collective 
>> operations.
>> 
>> Thanks,
>> Doron
>> 
>> 
>> On Thu, Jan 13, 2011 at 9:44 PM, Shamis, Pavel 
>> 
>>  wrote:
>> 
>>> RDMACM creates the same QPs with the same tunings as OOB, so I don't see 
>>> how the CPC could affect performance.
>>> 
>>> Pavel (Pasha) Shamis
>>> ---
>>> Application Performance Tools Group
>>> Computer Science and Math Division
>>> Oak Ridge National Laboratory
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:
>>> 
>>> 
 +1 on what Pasha said -- if using rdmacm fixes the problem, then there's 
 something else nefarious going on...
 
 You might want to run padb on your hangs to see where all the 
 processes are stuck and whether anything obvious jumps out.  I'd be surprised 
 if there's a bug in the oob cpc; it's been around for a long, long time; 
 it should be pretty stable.
 
 Do we create QP's differently between oob and rdmacm, such that perhaps 
 they are "better" (maybe better routed, or using a different SL, or ...) 
 when created via rdmacm?
 
 
 On Jan 12, 2011, at 12:12 PM, Shamis, Pavel wrote:
 
 
> RDMACM or OOB cannot affect the performance of this benchmark, since they 
> are not involved in communication. So I'm not sure that the performance 
> changes that you see are related to connection manager changes.
> About oob - I'm not aware of any hang issues there; the code is very, very 
> old, and we have not touched it for a long time.
> 
> Regards,
> 
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> Email: 
> sham...@ornl.gov
> 
> 
> 
> 
> 
> 
> On Jan 12, 2011, at 8:45 AM, Doron Shoham wrote:
> 
> 
>> Hi,
>> 
>> For the first problem, I can see that when using rdmacm instead of oob as the
>> openib connection manager, I get much better performance results (and no hangs!).
>> 
>> mpirun -display-map -np 64 -machinefile voltairenodes -mca btl
>> sm,self,openib -mca btl_openib_connect_rdmacm_priority 100
>> imb/src/IMB-MPI1 gather -npmin 64
>> 
>> 
>> #bytes  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>       0          1000         0.04         0.05         0.05
>>       1          1000        19.64        19.69        19.67
>>       2          1000        19.97        20.02        19.99
>>       4          1000        21.86        21.96        21.89
>>       8          1000        22.87        22.94        22.90
>>      16          1000        24.71        24.80        24.76
>>      32          1000        27.23        27.32        27.27
>>      64          1000        30.96        31.06        31.01
>>     128          1000        36.96        37.08        37.02
>>     256          1000        42.64        42.79        42.72
>>     512          1000        60.32        60.59        60.46
>>    1024          1000        82.44        82.74        82.59
>>    2048          1000       497.66       499.62       498.70
>>    4096          1000       684.15       686.47       685.33
>>    8192           519       544.07       546.68       545.85
>>   16384           519       653.20       656.23       655.27
>>   32768           519       704.48       707.55       706.60
>>   65536           519       918.00       922.12       920.86
>>  131072           320      2414.08      2422.17      2418.20
>>  262144           160      4198.25      4227.58      4213.19
>>  524288            80      7333.04      7503.99      7438.18
>> 1048576            40     136

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Jeff Squyres
+1.  I'm afraid I don't know offhand why there would be such differences.  I'm 
thinking that you'll need to dive a little deeper to figure it out; sorry.  :-(
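
One concrete way to start that dive -- parameter names from memory of the
1.4-era openib BTL, so treat them as hints -- is to compare what receive-queue/QP
setup the two CPCs end up with and which CPC actually gets selected:

ompi_info --param btl openib | grep -E 'receive_queues|cpc'
mpirun -np 2 -mca btl openib,self -mca btl_base_verbose 10 ./hello 2>&1 | grep -i cpc

The first command shows the configured receive-queue/QP specification; the
second (./hello stands in for any small MPI test program) shows the CPC each
process selected at run time.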


On Jan 16, 2011, at 10:54 AM, Shamis, Pavel wrote:

> Well, then I would suspect the rdmacm vs. oob QP configuration. They are supposed 
> to be the same, but probably there is some bug there; if the rdmacm QP tuning 
> somehow differs from oob, that is a potential cause of the performance 
> differences that you see.
> 
> Regards,
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> 
> 
> 
> 
> 
> 
> On Jan 16, 2011, at 4:12 AM, Doron Shoham wrote:
> 
>> Hi,
>> 
>> The gather hangs only with the linear_sync algorithm but works with the
>> basic_linear and binomial algorithms.
>> The gather algorithm is chosen dynamically depending on block size and
>> communicator size.
>> So, in the beginning, the binomial algorithm is chosen (the communicator
>> size is larger than 60).
>> When the message size increases, the linear_sync algorithm is chosen
>> (with small_segment_size).
>> When debugging on the cluster I saw that the linear_sync function is
>> called in an endless loop with a segment size of 1024.
>> This explains why the hang occurs in the middle of the run.
>> 
>> I still don't understand why RDMACM solves it or what causes
>> linear_sync to hang.
>> 
>> Again, in 1.5 it doesn't hang (maybe the timing is different?).
>> I'm still trying to understand what the differences are in those areas
>> between 1.4.3 and 1.5.
>> 
>> 
>> BTW,
>> Choosing RDMACM fixes hangs and performance issues in all collective 
>> operations.
>> 
>> Thanks,
>> Doron
>> 
>> 
>> On Thu, Jan 13, 2011 at 9:44 PM, Shamis, Pavel  wrote:
>>> RDMACM creates the same QPs with the same tunings as OOB, so I don't see 
>>> how the CPC could affect performance.
>>> 
>>> Pavel (Pasha) Shamis
>>> ---
>>> Application Performance Tools Group
>>> Computer Science and Math Division
>>> Oak Ridge National Laboratory
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:
>>> 
 +1 on what Pasha said -- if using rdmacm fixes the problem, then there's 
 something else nefarious going on...
 
 You might want to run padb on your hangs to see where all the 
 processes are stuck and whether anything obvious jumps out.  I'd be surprised 
 if there's a bug in the oob cpc; it's been around for a long, long time; 
 it should be pretty stable.
 
 Do we create QP's differently between oob and rdmacm, such that perhaps 
 they are "better" (maybe better routed, or using a different SL, or ...) 
 when created via rdmacm?
 
 
 On Jan 12, 2011, at 12:12 PM, Shamis, Pavel wrote:
 
> RDMACM or OOB cannot affect the performance of this benchmark, since they 
> are not involved in communication. So I'm not sure that the performance 
> changes that you see are related to connection manager changes.
> About oob - I'm not aware of any hang issues there; the code is very, very 
> old, and we have not touched it for a long time.
> 
> Regards,
> 
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> Email: sham...@ornl.gov
> 
> 
> 
> 
> 
> On Jan 12, 2011, at 8:45 AM, Doron Shoham wrote:
> 
>> Hi,
>> 
>> For the first problem, I can see that when using rdmacm instead of oob as the
>> openib connection manager, I get much better performance results (and no hangs!).
>> 
>> mpirun -display-map -np 64 -machinefile voltairenodes -mca btl
>> sm,self,openib -mca btl_openib_connect_rdmacm_priority 100
>> imb/src/IMB-MPI1 gather -npmin 64
>> 
>> 
>> #bytes  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>       0          1000         0.04         0.05         0.05
>>       1          1000        19.64        19.69        19.67
>>       2          1000        19.97        20.02        19.99
>>       4          1000        21.86        21.96        21.89
>>       8          1000        22.87        22.94        22.90
>>      16          1000        24.71        24.80        24.76
>>      32          1000        27.23        27.32        27.27
>>      64          1000        30.96        31.06        31.01
>>     128          1000        36.96        37.08        37.02
>>     256          1000        42.64        42.79        42.72
>>     512          1000        60.32        60.59        60.46
>>    1024          1000        82.44        82.74        82.59
>>    2048          1000       497.66       499.62       498.70
>>    4096          1000       684.15       686.47       685.33
>>    8192           519       544.07       546.68       545.85
>>   16384           519       653.20       656.23       655.27
>>   32768           519       704.48       707.55       706.60
>> 
>>

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-18 Thread Terry Dontje
On 01/18/2011 07:48 AM, Jeff Squyres wrote:
> IBCM is broken and disabled (has been for a long time).
>
> Did you mean RDMACM?
>
>
No, I think I meant the OMPI oob.

sorry,

-- 
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk

2011-01-18 Thread Jeff Squyres
Hmm.  That looks like a merge gone bad; I'm not sure what happened there.  It 
could well be an artifact of traversing from 1.5 to 1.4, or something like 
that.  

I would not re-remove these files.


On Jan 17, 2011, at 11:11 AM, Pascal Deveze wrote:

> Jeff,
> 
> You removed the following files 
> (https://bitbucket.org/devezep/new-romio-for-openmpi/changeset/9b8f70de722d). 
> I see that they are in the trunk. Shall I remove them again?
> 
> HACKING
> config/Makefile.options
> config/libltdl-preopen-error.diff
> config/lt224-icc.diff
> config/mca_acinclude.m4
> config/mca_configure.ac
> config/mca_make_configure.pl
> config/ompi_check_libfca.m4
> config/ompi_ensure_contains_optflags.m4
> config/ompi_ext.m4
> config/ompi_microsoft.m4
> config/ompi_setup_component_package.m4
> config/ompi_strip_optflags.m4
> contrib/check_unnecessary_headers.sh
> contrib/code_counter.pl
> contrib/copyright.pl
> contrib/dist/find-copyrights.pl
> contrib/dist/gkcommit.pl
> contrib/dist/linux/README
> contrib/dist/linux/README.ompi-spec-generator
> contrib/dist/linux/buildrpm.sh
> contrib/dist/linux/buildswitcherrpm.sh
> contrib/dist/linux/ompi-spec-generator.py
> contrib/dist/linux/openmpi-switcher-modulefile.spec
> contrib/dist/linux/openmpi-switcher-modulefile.tcl
> contrib/dist/make-authors.pl
> contrib/dist/make_tarball
> contrib/find_occurence.pl
> contrib/find_offenders.pl
> contrib/fix_headers.pl
> contrib/fix_indent.pl
> contrib/gen_stats.pl
> contrib/generate_file_list.pl
> contrib/header_replacement.sh
> contrib/headers.txt
> contrib/hg/build-hgignore.pl
> contrib/hg/set-hg-share-perms.csh
> contrib/nightly/build_sample_config.txt
> contrib/nightly/build_tarball.pl
> contrib/nightly/build_tests.pl
> contrib/nightly/check_devel_headers.pl
> contrib/nightly/create_tarball.sh
> contrib/nightly/illegal_symbols_report.pl
> contrib/nightly/ompi_cronjob.sh
> contrib/nightly/unimplemented_report.sh
> contrib/ompi_cplusplus.sed
> contrib/ompi_cplusplus.sh
> contrib/ompi_cplusplus.txt
> contrib/platform/cisco/ebuild/hlfr
> contrib/platform/cisco/ebuild/hlfr.conf
> 
> 
> Pascal Deveze wrote:
>> The bitbucket tree (https://bitbucket.org/devezep/new-romio-for-openmpi) has 
>> just been updated with the open-mpi trunk.
>> 
>> 
>> I have made three patches:
>> 
>> hg out
>> comparing with ssh://h...@bitbucket.org/devezep/new-romio-for-openmpi
>> searching for changes
>> changeset:   25:3e677102a125
>> user:Pascal Deveze 
>> date:Mon Jan 17 13:40:10 2011 +0100
>> summary: Remove all files
>> 
>> changeset:   26:e3989f46f83a
>> user:Pascal Deveze 
>> date:Mon Jan 17 14:46:48 2011 +0100
>> summary: Import from http://svn.open-mpi.org/svn/ompi/trunki (r24256)
>> 
>> changeset:   27:97f54ec8a575
>> tag: tip
>> user:Pascal Deveze 
>> date:Mon Jan 17 16:14:06 2011 +0100
>> summary: New Romio
>> 
>> I have tested the result and the ROMIO tests are OK.
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk

2011-01-18 Thread Jeff Squyres
IMHO, it's (much) easier to get an SVN checkout of the tree you're trying to 
sync with and then follow the procedures on that wiki page for SVN + Mercurial 
interaction.  This allows two things:

1. You can easily stay up-to-date with SVN changes, even on release branches.
2. You can easily/directly commit back to SVN when ready (no need for 
additional patch files, etc.).

It looks like you manually updated the hg repo to be v1.5, so I guess we can go 
from there (i.e., I'll review and send feedback).  But in the future, you might 
want to try the above procedures, instead.
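
For readers without the wiki handy, the overlay workflow being referred to is
roughly the following (illustrative commands only, not the wiki text itself; the
.hgignore details in particular are from memory):

svn co http://svn.open-mpi.org/svn/ompi/trunk ompi-trunk
cd ompi-trunk
hg init .
printf 'syntax: glob\n.svn\n' > .hgignore
hg add . && hg commit -m 'baseline from SVN trunk'
# hack and commit locally with hg; run "svn up" followed by an "hg commit"
# periodically to track upstream, and "svn commit" from this same tree when ready.

The point is that a single working copy is both an hg repo and an SVN checkout,
which is what makes committing straight back to SVN possible.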

That being said, I think we also need to update the SVN trunk to be the same 
version as what is going into v1.5 (trunk will eventually branch to become 
v1.7).  Can you make an hg tree for trunk+new ROMIO?  I'm sorry to ask for 
this, but we do need a way to go forward... :-\  Hopefully, it should be 
straightforward to apply the new ROMIO to the trunk (especially since you've 
already done the work to apply it to v1.4 and v1.5!).

It's unfortunate that there's such a divergence between trunk and the v1.5 
branch right now, but you *might* only have to adapt the build system stuff, 
per the email I sent to devel the other day (since not much else has changed in 
the IO MCA framework for a long time).

FWIW, most people usually develop on the trunk and then port to a release 
branch.  It's *usually* easier this way.  Per above, there's a bit of 
divergence between the trunk and v1.5 right now, making this a little harder 
than it should be, but it's not too bad (especially for the IO MCA framework, 
as noted above).  That being said, it's probably not too hard to port from the 
v1.5 branch to the trunk, either.



On Jan 17, 2011, at 11:00 AM, Pascal Deveze wrote:

> The bitbucket tree (https://bitbucket.org/devezep/new-romio-for-openmpi) has 
> just been updated with the open-mpi trunk.
> 
> 
> I have made three patches:
> 
> hg out
> comparing with ssh://h...@bitbucket.org/devezep/new-romio-for-openmpi
> searching for changes
> changeset:   25:3e677102a125
> user:Pascal Deveze 
> date:Mon Jan 17 13:40:10 2011 +0100
> summary: Remove all files
> 
> changeset:   26:e3989f46f83a
> user:Pascal Deveze 
> date:Mon Jan 17 14:46:48 2011 +0100
> summary: Import from http://svn.open-mpi.org/svn/ompi/trunki (r24256)
> 
> changeset:   27:97f54ec8a575
> tag: tip
> user:Pascal Deveze 
> date:Mon Jan 17 16:14:06 2011 +0100
> summary: New Romio
> 
> I have tested the result and the ROMIO tests are OK.
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Bring the latest ROMIO version from MPICH2-1.3 into the trunk

2011-01-18 Thread Jeff Squyres
On Jan 18, 2011, at 9:31 AM, Jeff Squyres wrote:

> It looks like you manually updated the hg repo to be v1.5, so I guess we can 
> go from there (i.e., I'll review and send feedback).  But in the future, you 
> might want to try the above procedures, instead.

Wrong -- you updated it to the SVN trunk; my bad.

Ok, let's try this one out and when ready, move it to the v1.5 branch.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] New ROMIO

2011-01-18 Thread Jeff Squyres
Pascal's HG tree looks good to me.  

https://bitbucket.org/devezep/new-romio-for-openmpi

I think it should be committed back to the SVN trunk for more widespread 
testing (e.g., on non-NFS filesystems -- I don't have any parallel 
filesystems).

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] Forgot to mention on the call...

2011-01-18 Thread Jeff Squyres
As previously discussed, Ralph is stepping back from being the v1.4 GK because 
of other time commitments.

I was reminded that I was the backup 1.4 GK, so I'll be stepping in.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/