Re: [OMPI devel] OMPI alltoall memory footprint

2006-11-28 Thread Li-Ta Lo
On Mon, 2006-11-27 at 17:21 -0800, Matt Leininger wrote:
> On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
> > Has anyone tested OMPI's alltoall at > 2000 MPI tasks?  I'm seeing each
> > MPI task eat up > 1GB of memory (just for OMPI - not the app).  
> 
>   I gathered some more data using the alltoall benchmark in mpiBench.
> mpiBench is pretty smart about how large its buffers are.  I set it to
> use <= 100MB.
> 
>   num nodes   num MPI tasks   system mem   mpibench buffer mem
>   128         1024            1.0 GB       65 MB
>   160         1280            1.2 GB       82 MB
>   192         1536            1.4 GB       98 MB
>   224         1792            1.6 GB       57 MB
>   256         2048            1.6-1.8 GB   < 100 MB
> 
> The 256 node run was killed by the OOM for using too much memory.  For
> all these tests the OMPI alltoall is using 1 GB or more of system
> memory.  I know LANL is looking into optimized alltoall, but is anyone
> looking into the scalability of the memory footprint?
> 

I am the one who is looking into those collective communications. Which
mca/coll are you using for alltoall? Does the OOM killer kick in when
calling other collective routines? If it is a problem caused by SM 
files, all collectives should be affected.

Ollie




Re: [OMPI devel] OMPI alltoall memory footprint

2006-11-28 Thread Li-Ta Lo
On Tue, 2006-11-28 at 09:28 -0800, Matt Leininger wrote:
> On Tue, 2006-11-28 at 10:00 -0700, Li-Ta Lo wrote:
> > On Mon, 2006-11-27 at 17:21 -0800, Matt Leininger wrote:
> > > On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
> > > > Has anyone tested OMPI's alltoall at > 2000 MPI tasks?  I'm seeing each
> > > > MPI task eat up > 1GB of memory (just for OMPI - not the app).  
> > > 
> > >   I gathered some more data using the alltoall benchmark in mpiBench.
> > > mpiBench is pretty smart about how large its buffers are.  I set it to
> > > use <= 100MB.
> > > 
> > >   num nodes   num MPI tasks   system mem   mpibench buffer mem
> > >   128         1024            1.0 GB       65 MB
> > >   160         1280            1.2 GB       82 MB
> > >   192         1536            1.4 GB       98 MB
> > >   224         1792            1.6 GB       57 MB
> > >   256         2048            1.6-1.8 GB   < 100 MB
> > > 
> > > The 256 node run was killed by the OOM for using too much memory.  For
> > > all these tests the OMPI alltoall is using 1 GB or more of system
> > > memory.  I know LANL is looking into optimized alltoall, but is anyone
> > > looking into the scalability of the memory footprint?
> > > 
> > 
> > I am the one who is looking into those collective communications. Which
> > mca/coll are you using for alltoall? 
> 
>The ompi_info output had some mca/coll information in it.   I'm not
> sure which mca/coll parameter you are interested in.
> 

Could you try "mpirun -mca coll basic mpibench"?

Ollie




Re: [OMPI devel] OMPI alltoall memory footprint

2006-11-28 Thread Li-Ta Lo
On Mon, 2006-11-27 at 17:21 -0800, Matt Leininger wrote:
> On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
> > Has anyone tested OMPI's alltoall at > 2000 MPI tasks?  I'm seeing each
> > MPI task eat up > 1GB of memory (just for OMPI - not the app).  
> 
>   I gathered some more data using the alltoall benchmark in mpiBench.
> mpiBench is pretty smart about how large its buffers are.  I set it to
> use <= 100MB.
> 
>   num nodes   num MPI tasks   system mem   mpibench buffer mem
>   128         1024            1.0 GB       65 MB
>   160         1280            1.2 GB       82 MB
>   192         1536            1.4 GB       98 MB
>   224         1792            1.6 GB       57 MB
>   256         2048            1.6-1.8 GB   < 100 MB
> 
> The 256 node run was killed by the OOM for using too much memory.  For
> all these tests the OMPI alltoall is using 1 GB or more of system
> memory.  I know LANL is looking into optimized alltoall, but is anyone
> looking into the scalability of the memory footprint?
> 


Can you "cat /proc/pid/smaps" when running the MPI job?

Ollie




Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-19 Thread Li-Ta Lo
On Fri, 2007-01-19 at 13:25 -0700, Greg Watson wrote:
> Bluesteel is a 64bit bproc machine. I configured with:
> 
> ./configure --with-devel-headers --disable-shared --enable-static
> 
> When I attempt to run an MPI program:
> 
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Not available in  
> file ras_bjs.c at line 247
>  
> --
> The bproc PLS component was not able to launch all the processes on  
> the remote
> nodes and therefore cannot continue.
> 
> On node 0 the process pid was -2 and errno was set to 11.
> 

Could it be a shared library issue?

Ollie

> For reference, we tried to launch ./x
>  
> --
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Error in file  
> pls_bproc.c at line 943
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Error in file  
> pls_bproc.c at line 1141
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Error in file  
> rmgr_urm.c at line 460
> [bluesteel.lanl.gov:28663] mpirun: spawn failed with errno=-1
> [n0:28664] OOB: Connection to HNP lost
> 
> Output from ompi_info:
> 
>  Open MPI: 1.2b3
> Open MPI SVN revision: r13112
>  Open RTE: 1.2b3
> Open RTE SVN revision: r13112
>  OPAL: 1.2b3
> OPAL SVN revision: r13112
>Prefix: /users/gwatson/ompi_1.2b3
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: gwatson
> Configured on: Fri Jan 19 12:52:21 MST 2007
>Configure host: bluesteel.lanl.gov
>  Built by: gwatson
>  Built on: Fri Jan 19 13:07:21 MST 2007
>Built host: bluesteel.lanl.gov
>C bindings: yes
>  C++ bindings: yes
>Fortran77 bindings: yes (all)
>Fortran90 bindings: yes
> Fortran90 bindings size: small
>C compiler: gcc
>   C compiler absolute: /usr/bin/gcc
>  C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
>Fortran77 compiler: gfortran
>Fortran77 compiler abs: /usr/bin/gfortran
>Fortran90 compiler: gfortran
>Fortran90 compiler abs: /usr/bin/gfortran
>   C profiling: yes
> C++ profiling: yes
>   Fortran77 profiling: yes
>   Fortran90 profiling: yes
>C++ exceptions: no
>Thread support: posix (mpi: no, progress: no)
>Internal debug support: no
>   MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>   libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2)
>MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component  
> v1.2)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component  
> v1.2)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>  MCA coll: basic (MCA v1.0, API v1.0, Component v1.2)
>  MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
>  MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
>  MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2)
>MCA io: romio (MCA v1.0, API v1.0, Component v1.2)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
>   MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
>   MCA pml: dr (MCA v1.0, API v1.0, Component v1.2)
>   MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
>   MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
>MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
>MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
>   MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2)
>   MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
>   MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>  MCA topo: unity (MCA v1.0, API v1.0, Component v1.2)
>   MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2)
>   MCA osc: rdma (MCA v1.0, API v1.0, Component v1.2)
>MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
>MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2)
>MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2)
>MCA errmgr: bproc (MCA v1.0, API v1.3, Component v1.2)
>   MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
>   MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2)
>   MCA gpr: replica (MCA v1.0, API v1.0, Compon

Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-19 Thread Li-Ta Lo
On Fri, 2007-01-19 at 14:42 -0700, Greg Watson wrote:
> 
> The libraries required by the program are:
> 
> $ ldd x
>  librt.so.1 => /lib64/tls/librt.so.1 (0x2abc1000)
>  libbproc.so.4 => /usr/lib64/libbproc.so.4 (0x2acdb000)
>  libdl.so.2 => /lib64/libdl.so.2 (0x2ade2000)
>  libnsl.so.1 => /lib64/libnsl.so.1 (0x2aee5000)
>  libutil.so.1 => /lib64/libutil.so.1 (0x2affc000)
>  libm.so.6 => /lib64/tls/libm.so.6 (0x2b10)
>  libpthread.so.0 => /lib64/tls/libpthread.so.0  
> (0x2b286000)
>  libc.so.6 => /lib64/tls/libc.so.6 (0x2b39b000)
>  /lib64/ld-linux-x86-64.so.2 (0x2aaab000)
> 
> These all appear to be available on the nodes.
> 

I tried a recent (today/yesterday?) svn trunk. It works but it is
very slow (I am using tcp now).

Ollie




Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-03 Thread Li-Ta Lo
On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote:

> 
> 2. I'm not sure what you mean by mapping MPI processes to "physical"
> processes, but I assume you mean how do we assign MPI ranks to processes on
> specific nodes. You will find that done in the orte/mca/rmaps framework. We
> currently only have one component in that framework - the round-robin
> implementation - that maps either by slot or by node, as indicated by the
> user. That code is fairly heavily commented, so you hopefully can understand
> what it is doing.
> 

How does this work in a multi-core environment? Might the optimal mapping be
to put processes on every other "slot" on a two-core system?
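
(For concreteness, here is a toy sketch of the by-slot versus by-node round-robin
placement I am asking about; the node and slot counts are made up and this is not
the rmaps code.)

/* Toy illustration of round-robin mapping by slot versus by node.
 * With 2 nodes, 2 slots per node and 4 ranks:
 *   by slot: ranks 0,1 -> node 0; ranks 2,3 -> node 1
 *   by node: ranks 0,2 -> node 0; ranks 1,3 -> node 1 */
#include <stdio.h>

int main(void)
{
    const int nnodes = 2, slots_per_node = 2;
    const int nranks = nnodes * slots_per_node;

    for (int rank = 0; rank < nranks; rank++) {
        int by_slot = rank / slots_per_node; /* fill a node before moving on */
        int by_node = rank % nnodes;         /* cycle over the nodes first */
        printf("rank %d: by-slot -> node %d, by-node -> node %d\n",
               rank, by_slot, by_node);
    }
    return 0;
}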

Ollie




Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-03 Thread Li-Ta Lo
On Tue, 2007-04-03 at 12:33 -0600, Ralph H Castain wrote:
> 
> 
> On 4/3/07 9:32 AM, "Li-Ta Lo"  wrote:
> 
> > On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote:
> > 
> >> 
> >> 2. I'm not sure what you mean by mapping MPI processes to "physical"
> >> processes, but I assume you mean how do we assign MPI ranks to processes on
> >> specific nodes. You will find that done in the orte/mca/rmaps framework. We
> >> currently only have one component in that framework - the round-robin
> >> implementation - that maps either by slot or by node, as indicated by the
> >> user. That code is fairly heavily commented, so you hopefully can 
> >> understand
> >> what it is doing.
> >> 
> > 
> > How does this work in a multi-core environment? Might the optimal mapping be
> > to put processes on every other "slot" on a two-core system?
> 
> Well, that's a good question. At the moment, the only environments where we
> encounter multiple cores treat each core as a separate "slot" when they
> assign resources. We don't currently provide an option that says "map by
> two", so the only way to do what you describe would be to manually specify
> the mapping, slot by slot.
> 

I also don't understand how paffinity works in this case. When orted
launches N processes on a node, does it have control over how those
processes are started and mapped to cores/processors? Or does the O.S.
put each process on whatever core it picks, with the paffinity module
then trying to "pin" the process to whatever core the O.S. picked?
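
(As an aside, here is a minimal Linux-only sketch of the "O.S. places it, then
pin it" model I have in mind; it is an illustration with an assumed two-core
node, not the opal paffinity component.)

/* Pin the calling process to a single core after it has been launched. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t mask;
    int core = (int)(getpid() % 2);   /* assumed 2-core node */

    CPU_ZERO(&mask);
    CPU_SET(core, &mask);

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pid %ld pinned to core %d\n", (long)getpid(), core);
    return 0;
}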

> Not very pretty.
> 
> If someone cares to suggest some alternative notation/option for requesting
> that kind of mapping flexibility, I'm certainly willing to implement it (it
> would be rather trivial to do "map by N", but might be more complicated if
> you want other things).
> 

What is the current syntax of the config file/command line? Could we do
something like the array slicing found in scripting languages, e.g. [0:N:2]?
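
(To make the suggestion concrete, a small sketch of parsing such a slice into
(start, end, stride); the bracket syntax is only a proposal, nothing mpirun
accepts today.)

/* Parse a proposed "[start:end:stride]" slot selector,
 * e.g. "[0:8:2]" selects slots 0, 2, 4, 6. */
#include <stdio.h>

static int parse_slice(const char *spec, int *start, int *end, int *stride)
{
    return sscanf(spec, "[%d:%d:%d]", start, end, stride) == 3;
}

int main(void)
{
    int start, end, stride;

    if (parse_slice("[0:8:2]", &start, &end, &stride)) {
        printf("selected slots:");
        for (int s = start; s < end; s += stride)
            printf(" %d", s);
        printf("\n");   /* prints: selected slots: 0 2 4 6 */
    }
    return 0;
}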

Ollie




Re: [OMPI devel] Collectives interface change

2007-08-13 Thread Li-Ta Lo
On Thu, 2007-08-09 at 14:49 -0600, Brian Barrett wrote:
> Hi all -
> 
> There was significant discussion this week at the collectives meeting  
> about improving the selection logic for collective components.  While  
> we'd like the automated collectives selection logic laid out in the  
> Collv2 document, it was decided that as a first step, we would allow  
> more than one + basic components to be used for a given communicator.
> 
> This mandated the change of a couple of things in the collectives  
> interface, namely how collectives module data is found (passed into a  
> function, rather than a static pointer on the component) and a bit of  
> the initialization sequence.
> 
> The revised interface and the rest of the code is available in an svn  
> temp branch:
> 
>  https://svn.open-mpi.org/svn/ompi/tmp/bwb-coll-select
> 
> Thus far, most of the components in common use have been updated.   
> The notable exception is the tuned collectives routine, which Ollie  
> is updating in the near future.
> 
> If you have any comments on the changes, please let me know.  If not,  
> the changes will move to the trunk once Ollie has finished  
> updating the tuned component.
> 


Done. 


Ollie




Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?

2007-08-28 Thread Li-Ta Lo
On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch.  We are trying to run a job
> with np=128 within a single node.  We are seeing the following error:
> 
> "SM failed to send message due to shortage of shared memory."
> 
> We then increased the allowable maximum size of the shared segment to
> 2 Gigabytes-1, which is the maximum allowed for a 32-bit application.  We
> used the mca parameter to increase it as shown here.
> 
> -mca mpool_sm_max_size 2147483647
> 
> This allowed the program to run to completion.  Therefore, we would
> like to increase the default maximum from 512Mbytes to 2G-1 Gigabytes.
> Does anyone have an objection to this change?  Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
> 


There is a serious problem with the 1.2 branch: it does not allocate
any SM area for each process at the beginning. SM areas are allocated
on demand, and if some processes are more aggressive than others, that
causes starvation. This problem is fixed in the trunk by assigning at
least one SM area to each process. I think this is what you saw
(starvation), and an increase of the max size may not be necessary.
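
Roughly, the difference is between handing out the shared segment purely on
demand and reserving a minimum chunk per process up front. A toy model of the
two policies follows (not the mpool_sm code; the pool and process counts are
made up).

/* Why pure on-demand allocation can starve quiet processes: one
 * "aggressive" process drains a fixed pool before the others ask.
 * Reserving one chunk per process up front avoids total starvation. */
#include <stdio.h>

#define NPROCS      4
#define POOL_CHUNKS 8

int main(void)
{
    int free_chunks = POOL_CHUNKS;
    int owned[NPROCS] = {0};
    int p;

    /* on demand: process 0 keeps asking and drains the whole pool */
    while (free_chunks > 0) {
        owned[0]++;
        free_chunks--;
    }
    for (p = 0; p < NPROCS; p++)
        printf("on-demand: proc %d owns %d chunks\n", p, owned[p]);

    /* reserved: every process is guaranteed one chunk before any
     * further on-demand growth is allowed */
    free_chunks = POOL_CHUNKS;
    for (p = 0; p < NPROCS; p++) {
        owned[p] = 1;
        free_chunks--;
    }
    owned[0] += free_chunks;   /* the aggressive process takes the rest */
    for (p = 0; p < NPROCS; p++)
        printf("reserved:  proc %d owns %d chunks\n", p, owned[p]);
    return 0;
}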

Ollie




Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?

2007-08-28 Thread Li-Ta Lo
On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
> On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
> 
> > On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> >> We are running into a problem when running on one of our larger SMPs
> >> using the latest Open MPI v1.2 branch.  We are trying to run a job
> >> with np=128 within a single node.  We are seeing the following error:
> >>
> >> "SM failed to send message due to shortage of shared memory."
> >>
> >> We then increased the allowable maximum size of the shared segment to
> >> 2Gigabytes-1 which is the maximum allowed on 32-bit application.  We
> >> used the mca parameter to increase it as shown here.
> >>
> >> -mca mpool_sm_max_size 2147483647
> >>
> >> This allowed the program to run to completion.  Therefore, we would
> >> like to increase the default maximum from 512Mbytes to 2G-1  
> >> Gigabytes.
> >> Does anyone have an objection to this change?  Soon we are going to
> >> have larger CPU counts and would like to increase the odds that  
> >> things
> >> work "out of the box" on these large SMPs.
> >>
> >
> >
> > There is a serious problem with the 1.2 branch, it does not allocate
> > any SM area for each process at the beginning. SM areas are allocated
> > on demand and if some of the processes are more aggressive than the
> > others, it will cause starvation. This problem is fixed in the trunk
> > by assign at least one SM area for each process. I think this is what
> > you saw (starvation) and an increase of max size may not be necessary.
> 
> Although I'm pretty sure this is fixed in the v1.2 branch already.
> 

It should never happen with the new code. The only way we can get that
message is when MCA_BTL_SM_FIFO_WRITE returns rc != OMPI_SUCCESS, but
the new MCA_BTL_SM_FIFO_WRITE always returns rc = OMPI_SUCCESS:

#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer, my_smp_rank, peer_smp_rank, hdr, rc) \
do { \
    ompi_fifo_t* fifo; \
    fifo = &(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]); \
 \
    /* thread lock */ \
    if (opal_using_threads()) \
        opal_atomic_lock(fifo->head_lock); \
    /* post fragment */ \
    while (ompi_fifo_write_to_head(hdr, fifo, \
                                   mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS) \
        opal_progress(); \
    MCA_BTL_SM_SIGNAL_PEER(endpoint_peer); \
    rc = OMPI_SUCCESS; \
    if (opal_using_threads()) \
        opal_atomic_unlock(fifo->head_lock); \
} while(0)
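
To see why rc can never be anything else, here is a self-contained toy that
mirrors the macro's structure (the FIFO, the lock, and progress are simplified
stand-ins, not the real ompi_fifo_t implementation): the write is simply
retried, with progress calls in between, until it lands, and only then is rc
set.

#include <stdio.h>

#define TOY_SUCCESS 0
#define TOY_FULL    1

static int fifo_slots_free = 0;          /* pretend the FIFO starts full */

static int fifo_write_to_head(int hdr)   /* stand-in for ompi_fifo_write_to_head */
{
    (void)hdr;                           /* payload is irrelevant in this toy */
    if (fifo_slots_free == 0)
        return TOY_FULL;
    fifo_slots_free--;
    return TOY_SUCCESS;
}

static void progress(void)               /* stand-in for opal_progress */
{
    fifo_slots_free++;                   /* the peer drains one entry */
}

int main(void)
{
    int rc;

    /* same shape as MCA_BTL_SM_FIFO_WRITE: spin until the write lands,
     * then report success unconditionally */
    while (fifo_write_to_head(42) != TOY_SUCCESS)
        progress();
    rc = TOY_SUCCESS;

    printf("rc = %d (always success, as in the macro)\n", rc);
    return 0;
}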

Rolf, are you using the very latest 1.2 branch?

Ollie




Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Li-Ta Lo
On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core 
> system.  It'll print out the following and then hang:
> Target duration (seconds): 10.00
> # of messages sent in that time:  589207
> Microseconds per message: 16.972
> 


I know almost nothing about Fortran, but the stack dump tells me it got
a NULL pointer dereference when accessing the "me" variable in the
do-while loop. How can this happen?

[ollie@exponential ~]$ mpirun -np 2 a.out 100
[exponential:22145] *** Process received signal ***
[exponential:22145] Signal: Segmentation fault (11)
[exponential:22145] Signal code: Address not mapped (1)
[exponential:22145] Failing at address: (nil)
[exponential:22145] [ 0] [0xb7f2a440]
[exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
[exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
[exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0)
[0x4e75ef70]
[exponential:22145] [ 4] a.out [0x8048aa1]
[exponential:22145] *** End of error message ***

call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
 $   MPI_COMM_WORLD,ier)
 804909e:   8b 45 d4mov0xffd4(%ebp),%eax
 80490a1:   83 c0 01add$0x1,%eax

It is compiled with g77/g90.

Ollie




Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?

2007-08-30 Thread Li-Ta Lo
On Thu, 2007-08-30 at 10:26 -0400, rolf.vandeva...@sun.com wrote:
> Li-Ta Lo wrote:
> 
> >On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
> >  
> >
> >>On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
> >>
> >>
> >>
> >>>On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> >>>  
> >>>
> >>>>We are running into a problem when running on one of our larger SMPs
> >>>>using the latest Open MPI v1.2 branch.  We are trying to run a job
> >>>>with np=128 within a single node.  We are seeing the following error:
> >>>>
> >>>>"SM failed to send message due to shortage of shared memory."
> >>>>
> >>>>We then increased the allowable maximum size of the shared segment to
> >>>>2Gigabytes-1 which is the maximum allowed on 32-bit application.  We
> >>>>used the mca parameter to increase it as shown here.
> >>>>
> >>>>-mca mpool_sm_max_size 2147483647
> >>>>
> >>>>This allowed the program to run to completion.  Therefore, we would
> >>>>like to increase the default maximum from 512Mbytes to 2G-1  
> >>>>Gigabytes.
> >>>>Does anyone have an objection to this change?  Soon we are going to
> >>>>have larger CPU counts and would like to increase the odds that  
> >>>>things
> >>>>work "out of the box" on these large SMPs.
> >>>>
> >>>>
> >>>>
> >>>There is a serious problem with the 1.2 branch, it does not allocate
> >>>any SM area for each process at the beginning. SM areas are allocated
> >>>on demand and if some of the processes are more aggressive than the
> >>>others, it will cause starvation. This problem is fixed in the trunk
> >>>by assign at least one SM area for each process. I think this is what
> >>>you saw (starvation) and an increase of max size may not be necessary.
> >>>  
> >>>
> >>Although I'm pretty sure this is fixed in the v1.2 branch already.
> >>
> >>
> >>
> >
> >It should never happen for the new code. The only way we can get the
> >message is when MCA_BTL_SM_FIFO_WRITE return rc != OMPI_SUCCESS, but
> >the new MCA_BTL_SM_FIFO_WRITE always return rc = OMPI_SUCCESS
> >
> >#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer,
> >my_smp_rank,peer_smp_rank,hdr,rc) \
> >do { \
> >ompi_fifo_t* fifo; \
> >fifo=&(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]); \
> > \
> >/* thread lock */ \
> >if(opal_using_threads()) \
> >opal_atomic_lock(fifo->head_lock); \
> >/* post fragment */ \
> >while(ompi_fifo_write_to_head(hdr, fifo, \
> >mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS) \
> >opal_progress(); \
> >MCA_BTL_SM_SIGNAL_PEER(endpoint_peer); \
> >rc=OMPI_SUCCESS; \
> >if(opal_using_threads()) \
> >opal_atomic_unlock(fifo->head_lock); \
> >} while(0)
> >
> >Rolf, are you using the really last 1.2 branch?
> >
> >Ollie
> >
> >  
> >
> Thanks for all the input.  It turns out I was originally *not* using
> the latest 1.2 branch.  So, we redid the tests with the latest 1.2.
> And, I am happy to report that we no longer get the "SM failed to
> send message due to shortage of shared memory" error.  However,
> now the program hangs.  So, it looks like we traded one problem for
> another.
> 

Can I see your test code?

Ollie




Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Li-Ta Lo
On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> hmmm, interesting since my version doesn't abort at all.
> 


Some problem with the Fortran compiler/language binding? My C translation
doesn't have any problem.

[ollie@exponential ~]$ mpirun -np 4 a.out 10
Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
198.684707

Ollie

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    double duration = 10, endtime;
    long nmsgs = 1;
    int keep_going = 1, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size == 1) {
        fprintf(stderr, "Need at least 2 processes\n");
    } else if (rank == 0) {
        /* rank 0 drives the ring for the requested number of seconds */
        if (argc > 1)
            duration = strtod(argv[1], NULL);
        endtime = MPI_Wtime() + duration;

        do {
            MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);
            nmsgs += 1;
        } while (MPI_Wtime() < endtime);

        /* tell the ring to shut down */
        keep_going = 0;
        MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);

        fprintf(stderr, "Target duration (seconds): %f, #of msgs: %ld, usec per msg: %f\n",
                duration, nmsgs, 1.0e6 * duration / nmsgs);
    } else {
        /* every other rank receives from its left neighbor and, unless it
         * is the last rank, forwards the message to its right neighbor */
        do {
            MPI_Recv(&keep_going, 1, MPI_INT, rank - 1, 0x11, MPI_COMM_WORLD, &status);

            if (rank == (size - 1))
                continue;

            MPI_Send(&keep_going, 1, MPI_INT, rank + 1, 0x11, MPI_COMM_WORLD);
        } while (keep_going);
    }

    MPI_Finalize();

    return 0;
}


Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Li-Ta Lo
On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
> Li-Ta Lo wrote:
> 
> >On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> >  
> >
> >>hmmm, interesting since my version doesn't abort at all.
> >>
> >>
> >>
> >
> >
> >Some problem with fortran compiler/language binding? My C translation 
> >doesn't have any problem.
> >
> >[ollie@exponential ~]$ mpirun -np 4 a.out 10
> >Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
> >198.684707
> >
> >  
> >
> Did you oversubscribe?  I found np=10 on an 8-core system clogged things 
> up sufficiently.
> 


Yeah, I used np=10 on a 2-processor, 2-hyper-thread system (4 hardware threads total).

Ollie





Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Li-Ta Lo
On Thu, 2007-08-30 at 12:45 -0400, terry.don...@sun.com wrote:
> Li-Ta Lo wrote:
> 
> >On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
> >  
> >
> >>Li-Ta Lo wrote:
> >>
> >>
> >>
> >>>On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> >>> 
> >>>
> >>>  
> >>>
> >>>>hmmm, interesting since my version doesn't abort at all.
> >>>>
> >>>>   
> >>>>
> >>>>
> >>>>
> >>>Some problem with fortran compiler/language binding? My C translation 
> >>>doesn't have any problem.
> >>>
> >>>[ollie@exponential ~]$ mpirun -np 4 a.out 10
> >>>Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
> >>>198.684707
> >>>
> >>> 
> >>>
> >>>  
> >>>
> >>Did you oversubscribe?  I found np=10 on a 8 core system clogged things 
> >>up sufficiently.
> >>
> >>
> >>
> >
> >
> >Yea, I used np 10 on a 2 proc, 2 hyper-thread system (total 4 threads).
> >
> >  
> >
> Is this using Linux?
> 


Yes.

Ollie




Re: [OMPI devel] Any info regarding sm availible?

2007-10-03 Thread Li-Ta Lo
On Wed, 2007-10-03 at 12:43 +0200, Torje Henriksen wrote:
> Hi everyone,
> 
> 
> I'm a student at the University of Tromso, and I'm trying to 
> modify the shared memory component in the bit transfer layer 
> (ompi/mca/btl/sm), and also the queues that this 
> component uses.
> 
> I was wondering if you could point me to any information regarding these 
> components. I got the source code of course, but other than that.
> 
> I would also like to ask some more or less specific questions about these 
> and probably other parts of Open MPI. Is this the right place for such 
> questions?
> 
> Thanks for your time, open mpi seems like a very nice project, and it's 
> fun to be able to mess around with it :)
> 
> 

I have some Dia diagrams showing the data structures used by the sm btl.
Do you want them?

Ollie




Re: [OMPI devel] Moving fragments in btl sm

2007-11-08 Thread Li-Ta Lo
On Thu, 2007-11-08 at 13:38 +0100, Torje Henriksen wrote:
> Hi,
> 
> I have a question that I shouldn't need to ask, but I'm 
> kind of lost in the code.
> 
> The btl sm component is using the circular buffers to write and read 
> fragments (sending and receiving).
> 
> In the write_to_head and read_from_tail I can only see pointers being set, 
> no data being moved. So where does the actual data movement/copying take 
> place? I'm thinking maybe a callback function exists somewhere :)
> 
> 
> Thank you for your help now and earlier.
> 

You are right. The "real thing" happens in mca_btl_sm_component_progress().
The PML/BML calls btl_register() to register a callback function that is
invoked when a frag is received. In the event loop, the progress() function
is called periodically to check whether any new frag has arrived. It is
complicated a little by the fact that transmitting each "data" frag involves
a round trip in which two "frags" are exchanged: the send side sends the
"data" frag with header type SEND to the receiver; the receiver calls the
callback function to handle the frag and sends back an ACK frag; upon
receiving the ACK frag, the send side calls des_cbfunc() to tell the upper
layer that the sending of this frag is complete.

BTW, it looks like it is still list append/remove in the PML/BML layer.
I don't know when/where the real "copying" happens.
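
Here is a toy, single-process model of that round trip (all names below are
invented for the sketch; this is not the btl sm code): the SEND fragment goes
into the receiver's FIFO, the receiver's progress loop runs the registered
callback and queues an ACK, and the sender's progress loop turns that ACK into
a completion callback.

#include <stdio.h>

enum frag_type { FRAG_NONE, FRAG_SEND, FRAG_ACK };

static enum frag_type to_receiver = FRAG_NONE;   /* receiver's incoming FIFO */
static enum frag_type to_sender   = FRAG_NONE;   /* sender's incoming FIFO */

static void recv_callback(void)          /* registered via btl_register() */
{
    printf("receiver: got SEND frag, handling data, queuing ACK\n");
    to_sender = FRAG_ACK;
}

static void completion_callback(void)    /* plays the role of des_cbfunc() */
{
    printf("sender: got ACK frag, fragment send is complete\n");
}

static void receiver_progress(void)      /* the receiver's event-loop poll */
{
    if (to_receiver == FRAG_SEND) {
        to_receiver = FRAG_NONE;
        recv_callback();
    }
}

static void sender_progress(void)        /* the sender's event-loop poll */
{
    if (to_sender == FRAG_ACK) {
        to_sender = FRAG_NONE;
        completion_callback();
    }
}

int main(void)
{
    to_receiver = FRAG_SEND;   /* sender posts the data fragment */
    receiver_progress();       /* receiver polls its FIFO */
    sender_progress();         /* sender polls for the ACK */
    return 0;
}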

Ollie