Re: [OMPI devel] OMPI alltoall memory footprint
On Mon, 2006-11-27 at 17:21 -0800, Matt Leininger wrote:
> On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
> > Has anyone tested OMPI's alltoall at > 2000 MPI tasks? I'm seeing each
> > MPI task eat up > 1GB of memory (just for OMPI - not the app).
>
> I gathered some more data using the alltoall benchmark in mpiBench.
> mpiBench is pretty smart about how large its buffers are. I set it to
> use <= 100MB.
>
> num nodes   num MPI tasks   system mem   mpibench buffer mem
>    128          1024         1 GB          65 MB
>    160          1280         1.2 GB        82 MB
>    192          1536         1.4 GB        98 MB
>    224          1792         1.6 GB        57 MB
>    256          2048         1.6-1.8 GB    < 100 MB
>
> The 256 node run was killed by the OOM for using too much memory. For
> all these tests the OMPI alltoall is using 1 GB or more of system
> memory. I know LANL is looking into optimized alltoall, but is anyone
> looking into the scalability of the memory footprint?

I am the one who is looking into those collective communications. Which
mca/coll component are you using for alltoall? Does the OOM killer kick
in when calling other collective routines? If it is a problem caused by
the SM files, all collectives should be affected.

Ollie
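The roughly linear growth in the table is what per-peer buffering would produce: if every rank keeps a fixed-size chunk of (shared-memory or eager) buffer space for every other rank, the footprint per rank scales with the number of tasks. A back-of-envelope sketch, assuming a hypothetical 1 MB per peer (an illustrative figure, not a value taken from OMPI or its MCA defaults):

#include <stdio.h>
#include <stdlib.h>

/* Assumed per-peer buffer size -- purely illustrative, not a value
 * taken from the OMPI source or its MCA parameters. */
#define BYTES_PER_PEER (1024.0 * 1024.0)

int main(int argc, char *argv[])
{
    long   ntasks   = (argc > 1) ? strtol(argv[1], NULL, 10) : 2048;
    double per_task = ntasks * BYTES_PER_PEER;   /* one buffer per peer */

    printf("%ld tasks -> roughly %.1f GB of buffer space per task\n",
           ntasks, per_task / (1024.0 * 1024.0 * 1024.0));
    return 0;
}

Under that assumption a 2048-task job already needs about 2 GB of buffer space per rank, which is the same order of magnitude as the numbers reported above.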
Re: [OMPI devel] OMPI alltoall memory footprint
On Tue, 2006-11-28 at 09:28 -0800, Matt Leininger wrote:
> On Tue, 2006-11-28 at 10:00 -0700, Li-Ta Lo wrote:
> > On Mon, 2006-11-27 at 17:21 -0800, Matt Leininger wrote:
> > > On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
> > > > Has anyone tested OMPI's alltoall at > 2000 MPI tasks? I'm seeing each
> > > > MPI task eat up > 1GB of memory (just for OMPI - not the app).
> > >
> > > I gathered some more data using the alltoall benchmark in mpiBench.
> > > mpiBench is pretty smart about how large its buffers are. I set it to
> > > use <= 100MB.
> > >
> > > num nodes   num MPI tasks   system mem   mpibench buffer mem
> > >    128          1024         1 GB          65 MB
> > >    160          1280         1.2 GB        82 MB
> > >    192          1536         1.4 GB        98 MB
> > >    224          1792         1.6 GB        57 MB
> > >    256          2048         1.6-1.8 GB    < 100 MB
> > >
> > > The 256 node run was killed by the OOM for using too much memory. For
> > > all these tests the OMPI alltoall is using 1 GB or more of system
> > > memory. I know LANL is looking into optimized alltoall, but is anyone
> > > looking into the scalability of the memory footprint?
> > >
> >
> > I am the one who is looking into those collective communications. Which
> > mca/coll component are you using for alltoall?
>
> The ompi_info output had some mca/coll information in it. I'm not
> sure which mca/coll parameter you are interested in.
>

Could you try "mpirun -mca coll basic mpibench"?

Ollie
Re: [OMPI devel] OMPI alltoall memory footprint
On Mon, 2006-11-27 at 17:21 -0800, Matt Leininger wrote:
> On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
> > Has anyone tested OMPI's alltoall at > 2000 MPI tasks? I'm seeing each
> > MPI task eat up > 1GB of memory (just for OMPI - not the app).
>
> I gathered some more data using the alltoall benchmark in mpiBench.
> mpiBench is pretty smart about how large its buffers are. I set it to
> use <= 100MB.
>
> num nodes   num MPI tasks   system mem   mpibench buffer mem
>    128          1024         1 GB          65 MB
>    160          1280         1.2 GB        82 MB
>    192          1536         1.4 GB        98 MB
>    224          1792         1.6 GB        57 MB
>    256          2048         1.6-1.8 GB    < 100 MB
>
> The 256 node run was killed by the OOM for using too much memory. For
> all these tests the OMPI alltoall is using 1 GB or more of system
> memory. I know LANL is looking into optimized alltoall, but is anyone
> looking into the scalability of the memory footprint?

Can you "cat /proc/<pid>/smaps" when running the MPI job?

Ollie
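For reference, a minimal sketch of automating that check from inside the process: it totals the Rss lines of /proc/self/smaps. It assumes only the standard Linux smaps format, nothing OMPI-specific.

#include <stdio.h>

/* Sum the "Rss:" lines of /proc/self/smaps to get a rough resident-memory
 * total for the calling process, the same information one would read off
 * "cat /proc/<pid>/smaps" mapping by mapping. */
int main(void)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[256];
    long kb, total_kb = 0;

    if (f == NULL) {
        perror("fopen /proc/self/smaps");
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "Rss: %ld kB", &kb) == 1)
            total_kb += kb;
    }
    fclose(f);

    printf("total Rss: %ld kB\n", total_kb);
    return 0;
}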
Re: [OMPI devel] 1.2b3 fails on bluesteel
On Fri, 2007-01-19 at 13:25 -0700, Greg Watson wrote:
> Bluesteel is a 64bit bproc machine. I configured with:
>
> ./configure --with-devel-headers --disable-shared --enable-static
>
> When I attempt to run an MPI program:
>
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Not available in
> file ras_bjs.c at line 247
>
> --
> The bproc PLS component was not able to launch all the processes on
> the remote nodes and therefore cannot continue.
>
> On node 0 the process pid was -2 and errno was set to 11.
>

Shared lib?

Ollie

> For reference, we tried to launch ./x
>
> --
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Error in file
> pls_bproc.c at line 943
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Error in file
> pls_bproc.c at line 1141
> [bluesteel.lanl.gov:28663] [0,0,0] ORTE_ERROR_LOG: Error in file
> rmgr_urm.c at line 460
> [bluesteel.lanl.gov:28663] mpirun: spawn failed with errno=-1
> [n0:28664] OOB: Connection to HNP lost
>
> Output from ompi_info:
>
>                 Open MPI: 1.2b3
>    Open MPI SVN revision: r13112
>                 Open RTE: 1.2b3
>    Open RTE SVN revision: r13112
>                     OPAL: 1.2b3
>        OPAL SVN revision: r13112
>                   Prefix: /users/gwatson/ompi_1.2b3
>  Configured architecture: x86_64-unknown-linux-gnu
>            Configured by: gwatson
>            Configured on: Fri Jan 19 12:52:21 MST 2007
>           Configure host: bluesteel.lanl.gov
>                 Built by: gwatson
>                 Built on: Fri Jan 19 13:07:21 MST 2007
>               Built host: bluesteel.lanl.gov
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: yes (all)
>       Fortran90 bindings: yes
>  Fortran90 bindings size: small
>               C compiler: gcc
>      C compiler absolute: /usr/bin/gcc
>             C++ compiler: g++
>    C++ compiler absolute: /usr/bin/g++
>       Fortran77 compiler: gfortran
>   Fortran77 compiler abs: /usr/bin/gfortran
>       Fortran90 compiler: gfortran
>   Fortran90 compiler abs: /usr/bin/gfortran
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: yes
>      Fortran90 profiling: yes
>           C++ exceptions: no
>           Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>    Heterogeneous support: yes
>  mpirun default --prefix: no
>            MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2)
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2)
>            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2)
>            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2)
>                MCA timer: linux (MCA v1.0, API v1.0, Component v1.2)
>            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.2)
>                 MCA coll: self (MCA v1.0, API v1.0, Component v1.2)
>                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.2)
>                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2)
>                   MCA io: romio (MCA v1.0, API v1.0, Component v1.2)
>                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2)
>                  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2)
>                  MCA pml: dr (MCA v1.0, API v1.0, Component v1.2)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2)
>                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2)
>               MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2)
>               MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2)
>                  MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2)
>                  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2)
>                  MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.2)
>                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2)
>                  MCA osc: rdma (MCA v1.0, API v1.0, Component v1.2)
>               MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2)
>               MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2)
>               MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2)
>               MCA errmgr: bproc (MCA v1.0, API v1.3, Component v1.2)
>                  MCA gpr: null (MCA v1.0, API v1.0, Component v1.2)
>                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2)
>                  MCA gpr: replica (MCA v1.0, API v1.0, Compon
Re: [OMPI devel] 1.2b3 fails on bluesteel
On Fri, 2007-01-19 at 14:42 -0700, Greg Watson wrote:
>
> The libraries required by the program are:
>
> $ ldd x
>         librt.so.1 => /lib64/tls/librt.so.1 (0x2abc1000)
>         libbproc.so.4 => /usr/lib64/libbproc.so.4 (0x2acdb000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x2ade2000)
>         libnsl.so.1 => /lib64/libnsl.so.1 (0x2aee5000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x2affc000)
>         libm.so.6 => /lib64/tls/libm.so.6 (0x2b10)
>         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x2b286000)
>         libc.so.6 => /lib64/tls/libc.so.6 (0x2b39b000)
>         /lib64/ld-linux-x86-64.so.2 (0x2aaab000)
>
> These all appear to be available on the nodes.

I tried a recent (today/yesterday?) svn trunk. It works, but it is very
slow (I am using tcp now).

Ollie
Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level
On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote:
>
> 2. I'm not sure what you mean by mapping MPI processes to "physical"
> processes, but I assume you mean how do we assign MPI ranks to processes on
> specific nodes. You will find that done in the orte/mca/rmaps framework. We
> currently only have one component in that framework - the round-robin
> implementation - that maps either by slot or by node, as indicated by the
> user. That code is fairly heavily commented, so you hopefully can understand
> what it is doing.
>

How does this work in a multi-core environment? The optimal way may be
to put processes on every other "slot" on a two-core system.

Ollie
Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level
On Tue, 2007-04-03 at 12:33 -0600, Ralph H Castain wrote:
>
> On 4/3/07 9:32 AM, "Li-Ta Lo" wrote:
>
> > On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote:
> >
> >> 2. I'm not sure what you mean by mapping MPI processes to "physical"
> >> processes, but I assume you mean how do we assign MPI ranks to processes on
> >> specific nodes. You will find that done in the orte/mca/rmaps framework. We
> >> currently only have one component in that framework - the round-robin
> >> implementation - that maps either by slot or by node, as indicated by the
> >> user. That code is fairly heavily commented, so you hopefully can
> >> understand what it is doing.
> >>
> >
> > How does this work in a multi-core environment? The optimal way may be
> > to put processes on every other "slot" on a two-core system.
>
> Well, that's a good question. At the moment, the only environments where we
> encounter multiple cores treat each core as a separate "slot" when they
> assign resources. We don't currently provide an option that says "map by
> two", so the only way to do what you describe would be to manually specify
> the mapping, slot by slot.
>

I also don't understand how paffinity works in this case. When orted
launches N processes on a node, does it have control over how those
processes are started and mapped to the cores/processors? Or does the
O.S. put each process on whatever core it picks, with the paffinity
module then trying to "pin" the process to that core?

> Not very pretty.
>
> If someone cares to suggest some alternative notation/option for requesting
> that kind of mapping flexibility, I'm certainly willing to implement it (it
> would be rather trivial to do "map by N", but might be more complicated if
> you want other things).
>

What is the current syntax of the config file/command line? Can we do
something like the array slicing used in scripting languages, e.g.
[0:N:2]?

Ollie
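As a concrete illustration of the placement policies under discussion (map by slot, map by node, and a hypothetical strided "map by N" in the spirit of [0:N:2]), the following sketch prints where each rank would land. The node and slot counts are made up; this shows only the arithmetic such policies imply, not the rmaps round-robin code itself.

#include <stdio.h>

#define NNODES         4
#define SLOTS_PER_NODE 4   /* e.g. a 4-core node; illustrative only */

/* Print which node/slot each MPI rank would occupy under three policies. */
int main(void)
{
    int nranks = 8;

    for (int rank = 0; rank < nranks; rank++) {
        /* map by slot: fill every slot of node 0, then node 1, ... */
        int byslot_node = rank / SLOTS_PER_NODE;
        int byslot_slot = rank % SLOTS_PER_NODE;

        /* map by node: round-robin ranks across nodes */
        int bynode_node = rank % NNODES;
        int bynode_slot = rank / NNODES;

        /* hypothetical stride-2 mapping, i.e. "every other slot" */
        int stride_node = (rank * 2) / SLOTS_PER_NODE;
        int stride_slot = (rank * 2) % SLOTS_PER_NODE;

        printf("rank %d: by-slot n%d.s%d  by-node n%d.s%d  stride-2 n%d.s%d\n",
               rank, byslot_node, byslot_slot, bynode_node, bynode_slot,
               stride_node, stride_slot);
    }
    return 0;
}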
Re: [OMPI devel] Collectives interface change
On Thu, 2007-08-09 at 14:49 -0600, Brian Barrett wrote:
> Hi all -
>
> There was significant discussion this week at the collectives meeting
> about improving the selection logic for collective components. While
> we'd like the automated collectives selection logic laid out in the
> Collv2 document, it was decided that as a first step, we would allow
> more than one component (plus basic) to be used for a given communicator.
>
> This mandated the change of a couple of things in the collectives
> interface, namely how collectives module data is found (passed into a
> function, rather than a static pointer on the component) and a bit of
> the initialization sequence.
>
> The revised interface and the rest of the code is available in an svn
> temp branch:
>
> https://svn.open-mpi.org/svn/ompi/tmp/bwb-coll-select
>
> Thus far, most of the components in common use have been updated.
> The notable exception is the tuned collectives routine, which Ollie
> is updating in the near future.
>
> If you have any comments on the changes, please let me know. If not,
> the changes will move to the trunk once Ollie has finished updating
> the tuned component.
>

Done.

Ollie
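For readers unfamiliar with the interface change being described, the sketch below contrasts the two styles schematically: per-component static data versus per-communicator module data passed into each call. The struct and function names are invented for illustration and are not the actual coll framework types.

#include <stdio.h>

/* Old style (schematic): one static blob per component, so only one set
 * of state can back a given collective at a time. */
static int component_static_data = 42;
static int old_barrier(void) { return component_static_data; }

/* New style (schematic): each communicator carries its own module object,
 * and that module is passed into every collective call, so modules of the
 * same component can coexist on different communicators. */
struct coll_module {
    int tuning;                                  /* per-communicator state */
    int (*barrier)(struct coll_module *module);  /* function and its data travel together */
};

static int new_barrier(struct coll_module *module) { return module->tuning; }

int main(void)
{
    struct coll_module comm_a = { 1, new_barrier };
    struct coll_module comm_b = { 2, new_barrier };

    printf("old: %d  comm_a: %d  comm_b: %d\n",
           old_barrier(), comm_a.barrier(&comm_a), comm_b.barrier(&comm_b));
    return 0;
}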
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch. We are trying to run a job
> with np=128 within a single node. We are seeing the following error:
>
> "SM failed to send message due to shortage of shared memory."
>
> We then increased the allowable maximum size of the shared segment to
> 2 Gigabytes-1, which is the maximum allowed for a 32-bit application. We
> used the mca parameter to increase it as shown here.
>
> -mca mpool_sm_max_size 2147483647
>
> This allowed the program to run to completion. Therefore, we would
> like to increase the default maximum from 512 Mbytes to 2G-1 Gigabytes.
> Does anyone have an objection to this change? Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
>

There is a serious problem with the 1.2 branch: it does not allocate
any SM area for each process at the beginning. SM areas are allocated
on demand, and if some of the processes are more aggressive than the
others, that will cause starvation. This problem is fixed in the trunk
by assigning at least one SM area to each process. I think this is what
you saw (starvation), so an increase of the max size may not be
necessary.

Ollie
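The starvation scenario can be illustrated with a toy allocator: with purely on-demand allocation an aggressive peer can drain the whole pool, while reserving one chunk per peer up front guarantees that every peer can make progress. The names and sizes below are made up; this is only the idea, not OMPI's mpool/btl code.

#include <stdio.h>

#define NPEERS      4
#define POOL_CHUNKS 8   /* total shared-memory chunks; illustrative */

/* With on-demand allocation, any peer may grab any free chunk, so an
 * aggressive peer can starve the others. Reserving one chunk per peer
 * up front guarantees each peer at least one fragment. */
static int pool_free = POOL_CHUNKS;
static int reserved[NPEERS];

static int alloc_chunk(int peer)
{
    if (reserved[peer] > 0) {
        reserved[peer]--;           /* use the guaranteed chunk first */
        return 1;
    }
    if (pool_free > 0) {
        pool_free--;                /* fall back to the shared pool */
        return 1;
    }
    return 0;                       /* shortage: would produce the error above */
}

int main(void)
{
    /* reserve one chunk per peer, as the trunk fix does conceptually */
    for (int p = 0; p < NPEERS; p++) { reserved[p] = 1; pool_free--; }

    /* peer 0 is aggressive and drains the shared part of the pool */
    while (alloc_chunk(0))
        ;

    /* the other peers can still send at least one fragment each */
    for (int p = 1; p < NPEERS; p++)
        printf("peer %d first send: %s\n", p, alloc_chunk(p) ? "ok" : "starved");
    return 0;
}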
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
> On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
>
> > On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> >> We are running into a problem when running on one of our larger SMPs
> >> using the latest Open MPI v1.2 branch. We are trying to run a job
> >> with np=128 within a single node. We are seeing the following error:
> >>
> >> "SM failed to send message due to shortage of shared memory."
> >>
> >> We then increased the allowable maximum size of the shared segment to
> >> 2 Gigabytes-1, which is the maximum allowed for a 32-bit application. We
> >> used the mca parameter to increase it as shown here.
> >>
> >> -mca mpool_sm_max_size 2147483647
> >>
> >> This allowed the program to run to completion. Therefore, we would
> >> like to increase the default maximum from 512 Mbytes to 2G-1 Gigabytes.
> >> Does anyone have an objection to this change? Soon we are going to
> >> have larger CPU counts and would like to increase the odds that things
> >> work "out of the box" on these large SMPs.
> >>
> >
> > There is a serious problem with the 1.2 branch: it does not allocate
> > any SM area for each process at the beginning. SM areas are allocated
> > on demand, and if some of the processes are more aggressive than the
> > others, that will cause starvation. This problem is fixed in the trunk
> > by assigning at least one SM area to each process. I think this is what
> > you saw (starvation), so an increase of the max size may not be
> > necessary.
>
> Although I'm pretty sure this is fixed in the v1.2 branch already.
>

It should never happen with the new code. The only way we can get that
message is when MCA_BTL_SM_FIFO_WRITE returns rc != OMPI_SUCCESS, but
the new MCA_BTL_SM_FIFO_WRITE always returns rc = OMPI_SUCCESS:

#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer, my_smp_rank, peer_smp_rank, hdr, rc) \
do {                                                                  \
    ompi_fifo_t* fifo;                                                \
    fifo = &(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]);  \
                                                                      \
    /* thread lock */                                                 \
    if(opal_using_threads())                                          \
        opal_atomic_lock(fifo->head_lock);                            \
    /* post fragment */                                               \
    while(ompi_fifo_write_to_head(hdr, fifo,                          \
              mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS)         \
        opal_progress();                                              \
    MCA_BTL_SM_SIGNAL_PEER(endpoint_peer);                            \
    rc = OMPI_SUCCESS;                                                \
    if(opal_using_threads())                                          \
        opal_atomic_unlock(fifo->head_lock);                          \
} while(0)

Rolf, are you using the very latest 1.2 branch?

Ollie
Re: [OMPI devel] SM BTL hang issue
On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core
> system. It'll print out the following and then hang:
> Target duration (seconds): 10.00
> # of messages sent in that time: 589207
> Microseconds per message: 16.972
>

I know almost nothing about FORTRAN, but the stack dump told me it got a
NULL pointer dereference when accessing the "me" variable in the
do .. while loop. How can this happen?

[ollie@exponential ~]$ mpirun -np 2 a.out 100
[exponential:22145] *** Process received signal ***
[exponential:22145] Signal: Segmentation fault (11)
[exponential:22145] Signal code: Address not mapped (1)
[exponential:22145] Failing at address: (nil)
[exponential:22145] [ 0] [0xb7f2a440]
[exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
[exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
[exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0) [0x4e75ef70]
[exponential:22145] [ 4] a.out [0x8048aa1]
[exponential:22145] *** End of error message ***

        call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
     $                MPI_COMM_WORLD,ier)

 804909e:       8b 45 d4                mov    0xffd4(%ebp),%eax
 80490a1:       83 c0 01                add    $0x1,%eax

It is compiled with g77/g90.

Ollie
Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?
On Thu, 2007-08-30 at 10:26 -0400, rolf.vandeva...@sun.com wrote:
> Li-Ta Lo wrote:
>
> >On Tue, 2007-08-28 at 10:12 -0600, Brian Barrett wrote:
> >
> >>On Aug 28, 2007, at 9:05 AM, Li-Ta Lo wrote:
> >>
> >>>On Mon, 2007-08-27 at 15:10 -0400, Rolf vandeVaart wrote:
> >>>
> >>>>We are running into a problem when running on one of our larger SMPs
> >>>>using the latest Open MPI v1.2 branch. We are trying to run a job
> >>>>with np=128 within a single node. We are seeing the following error:
> >>>>
> >>>>"SM failed to send message due to shortage of shared memory."
> >>>>
> >>>>We then increased the allowable maximum size of the shared segment to
> >>>>2 Gigabytes-1, which is the maximum allowed for a 32-bit application. We
> >>>>used the mca parameter to increase it as shown here.
> >>>>
> >>>>-mca mpool_sm_max_size 2147483647
> >>>>
> >>>>This allowed the program to run to completion. Therefore, we would
> >>>>like to increase the default maximum from 512 Mbytes to 2G-1 Gigabytes.
> >>>>Does anyone have an objection to this change? Soon we are going to
> >>>>have larger CPU counts and would like to increase the odds that things
> >>>>work "out of the box" on these large SMPs.
> >>>>
> >>>
> >>>There is a serious problem with the 1.2 branch: it does not allocate
> >>>any SM area for each process at the beginning. SM areas are allocated
> >>>on demand, and if some of the processes are more aggressive than the
> >>>others, that will cause starvation. This problem is fixed in the trunk
> >>>by assigning at least one SM area to each process. I think this is what
> >>>you saw (starvation), so an increase of the max size may not be
> >>>necessary.
> >>>
> >>
> >>Although I'm pretty sure this is fixed in the v1.2 branch already.
> >>
> >
> >It should never happen with the new code. The only way we can get that
> >message is when MCA_BTL_SM_FIFO_WRITE returns rc != OMPI_SUCCESS, but
> >the new MCA_BTL_SM_FIFO_WRITE always returns rc = OMPI_SUCCESS:
> >
> >#define MCA_BTL_SM_FIFO_WRITE(endpoint_peer, my_smp_rank, peer_smp_rank, hdr, rc) \
> >do {                                                                  \
> >    ompi_fifo_t* fifo;                                                \
> >    fifo = &(mca_btl_sm_component.fifo[peer_smp_rank][my_smp_rank]);  \
> >                                                                      \
> >    /* thread lock */                                                 \
> >    if(opal_using_threads())                                          \
> >        opal_atomic_lock(fifo->head_lock);                            \
> >    /* post fragment */                                               \
> >    while(ompi_fifo_write_to_head(hdr, fifo,                          \
> >              mca_btl_sm_component.sm_mpool) != OMPI_SUCCESS)         \
> >        opal_progress();                                              \
> >    MCA_BTL_SM_SIGNAL_PEER(endpoint_peer);                            \
> >    rc = OMPI_SUCCESS;                                                \
> >    if(opal_using_threads())                                          \
> >        opal_atomic_unlock(fifo->head_lock);                          \
> >} while(0)
> >
> >Rolf, are you using the very latest 1.2 branch?
> >
> >Ollie
> >
>
> Thanks for all the input. It turns out I was originally *not* using
> the latest 1.2 branch. So, we redid the tests with the latest 1.2.
> And, I am happy to report that we no longer get the "SM failed to
> send message due to shortage of shared memory" error. However,
> now the program hangs. So, it looks like we traded one problem for
> another.
>

Can I see your test code?

Ollie
Re: [OMPI devel] SM BTL hang issue
On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> hmmm, interesting since my version doesn't abort at all.
>

Some problem with the Fortran compiler/language binding? My C
translation doesn't have any problem.

[ollie@exponential ~]$ mpirun -np 4 a.out 10
Target duration (seconds): 10.00, #of msgs: 50331, usec per msg: 198.684707

Ollie

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    double duration = 10, endtime;
    long nmsgs = 1;
    int keep_going = 1, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size == 1) {
        fprintf(stderr, "Need at least 2 processes\n");
    } else if (rank == 0) {
        /* rank 0 keeps pushing messages down the ring until time is up */
        duration = strtod(argv[1], NULL);
        endtime = MPI_Wtime() + duration;
        do {
            MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);
            nmsgs += 1;
        } while (MPI_Wtime() < endtime);
        keep_going = 0;
        MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);
        fprintf(stderr,
                "Target duration (seconds): %f, #of msgs: %ld, usec per msg: %f\n",
                duration, nmsgs, 1.0e6 * duration / nmsgs);
    } else {
        /* every other rank forwards the flag to its right-hand neighbor */
        do {
            MPI_Recv(&keep_going, 1, MPI_INT, rank - 1, 0x11,
                     MPI_COMM_WORLD, &status);
            if (rank == (size - 1))
                continue;
            MPI_Send(&keep_going, 1, MPI_INT, rank + 1, 0x11, MPI_COMM_WORLD);
        } while (keep_going);
    }

    MPI_Finalize();
    return 0;
}
Re: [OMPI devel] SM BTL hang issue
On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
> Li-Ta Lo wrote:
>
> >On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> >
> >>hmmm, interesting since my version doesn't abort at all.
> >>
> >
> >Some problem with the Fortran compiler/language binding? My C
> >translation doesn't have any problem.
> >
> >[ollie@exponential ~]$ mpirun -np 4 a.out 10
> >Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
> >198.684707
> >
>
> Did you oversubscribe? I found np=10 on an 8 core system clogged things
> up sufficiently.
>

Yes, I used np=10 on a 2-processor, 2-hyper-thread system (4 hardware
threads total).

Ollie
Re: [OMPI devel] SM BTL hang issue
On Thu, 2007-08-30 at 12:45 -0400, terry.don...@sun.com wrote:
> Li-Ta Lo wrote:
>
> >On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
> >
> >>Li-Ta Lo wrote:
> >>
> >>>On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> >>>
> >>>>hmmm, interesting since my version doesn't abort at all.
> >>>>
> >>>
> >>>Some problem with the Fortran compiler/language binding? My C
> >>>translation doesn't have any problem.
> >>>
> >>>[ollie@exponential ~]$ mpirun -np 4 a.out 10
> >>>Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
> >>>198.684707
> >>>
> >>
> >>Did you oversubscribe? I found np=10 on an 8 core system clogged things
> >>up sufficiently.
> >>
> >
> >Yes, I used np=10 on a 2-processor, 2-hyper-thread system (4 hardware
> >threads total).
>
> Is this using Linux?
>

Yes.

Ollie
Re: [OMPI devel] Any info regarding sm availible?
On Wed, 2007-10-03 at 12:43 +0200, Torje Henriksen wrote:
> Hi everyone,
>
> I'm a student at the University of Tromso, and I'm trying to
> modify the shared memory component in the byte transfer layer
> (ompi/mca/btl/sm), and also the queues that this component uses.
>
> I was wondering if you could point me to any information regarding these
> components. I got the source code of course, but other than that.
>
> I would also like to ask some more or less specific questions about these
> and probably other parts of Open MPI. Is this the right place for such
> questions?
>
> Thanks for your time, open mpi seems like a very nice project, and it's
> fun to be able to mess around with it :)
>

I have a Dia graph showing some of the data structures used by the sm
btl. Do you want it?

Ollie
Re: [OMPI devel] Moving fragments in btl sm
On Thu, 2007-11-08 at 13:38 +0100, Torje Henriksen wrote:
> Hi,
>
> I have a question that I shouldn't need to ask, but I'm
> kind of lost in the code.
>
> The btl sm component is using the circular buffers to write and read
> fragments (sending and receiving).
>
> In the write_to_head and read_from_tail I can only see pointers being set,
> no data being moved. So where does the actual data movement/copying take
> place? I'm thinking maybe a callback function existing somewhere :)
>
> Thank you for your help now and earlier.
>

You are right. The "real thing" happens in mca_btl_sm_component_progress().
The PML/BML calls btl_register() to register a callback function to be
invoked when a frag is received. In the event loop, the progress()
function is called periodically to check whether any new frag has
arrived.

It is complicated a little bit by the fact that to transmit each "data"
frag, there is a round trip and two "frags" are exchanged. The send side
sends the "data" frag with header type SEND to the receiver. The
receiver calls the callback function to handle the frag and sends back
an ACK frag. Upon receiving the ACK frag, the send side calls
des_cbfunc() to tell the upper layer that the sending of this frag is
completed.

BTW, it looks like it is still list append/remove in the PML/BML layer.
I don't know when/where the real "copying" happens.

Ollie
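To make that SEND/ACK round trip concrete, here is a heavily simplified sketch: a fragment is "written" to a one-slot FIFO, the receiver's progress pass dispatches on the fragment type, invokes the receive callback and posts an ACK, and the sender's progress pass then invokes the completion callback. All of the types and names are invented for illustration; the real sm BTL uses ompi_fifo_t and is far more involved.

#include <stdio.h>

/* Illustrative fragment types, loosely mirroring the SEND/ACK round trip
 * described above; none of these names come from the OMPI source. */
enum frag_type { FRAG_SEND, FRAG_ACK };

struct frag {
    enum frag_type type;
    const char    *payload;       /* data for SEND, unused for ACK */
};

/* "Registered" callbacks, standing in for btl_register()/des_cbfunc(). */
static void recv_cb(struct frag *f)          { printf("receiver got: %s\n", f->payload); }
static void send_complete_cb(struct frag *f) { (void)f; printf("sender: frag completed\n"); }

/* One progress pass for one side: pop a frag (if any) and dispatch on type. */
static void progress(struct frag **in_slot, struct frag **reply_slot)
{
    static struct frag ack = { FRAG_ACK, NULL };
    struct frag *f = *in_slot;

    if (f == NULL)
        return;                     /* nothing has arrived yet */
    *in_slot = NULL;

    if (f->type == FRAG_SEND) {
        recv_cb(f);                 /* deliver the data to the upper layer */
        *reply_slot = &ack;         /* post the ACK back to the sender */
    } else {
        send_complete_cb(f);        /* tell the sender this frag is done */
    }
}

int main(void)
{
    struct frag data = { FRAG_SEND, "hello" };
    struct frag *to_receiver = &data;   /* sender "writes" the data frag */
    struct frag *to_sender   = NULL;

    progress(&to_receiver, &to_sender); /* receiver side: SEND -> callback + ACK */
    progress(&to_sender, &to_receiver); /* sender side: ACK -> completion callback */
    return 0;
}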