Re: [OMPI devel] SM BTL hang issue
Scott Atchley wrote:

> Terry,
>
> Are you testing on Linux? If so, which kernel?

No, I am running into issues on Solaris, but Ollie's run of the test code on Linux seems to work fine.

--td

> See the patch to iperf to handle kernel 2.6.21 and the issue that they
> had with usleep(0):
>
> http://dast.nlanr.net/Projects/Iperf2.0/patch-iperf-linux-2.6.21.txt
>
> Scott
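[A quick way to see what usleep(0) and sched_yield() each cost on a given kernel is a toy timing loop like the one below. This is a hypothetical micro-benchmark, not taken from the iperf patch; the relative cost of the two calls varies by kernel and scheduler.]

#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <sys/time.h>

/* Toy micro-benchmark: time usleep(0) vs sched_yield() in a tight loop.
 * What each call actually does varies by kernel and scheduler. */
static double usec_per_call(void (*fn)(void), int iters)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < iters; i++)
        fn();
    gettimeofday(&t1, NULL);
    return ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_usec - t0.tv_usec)) / iters;
}

static void call_usleep0(void) { usleep(0); }
static void call_yield(void)   { sched_yield(); }

int main(void)
{
    int iters = 100000;
    printf("usleep(0):     %.3f usec/call\n", usec_per_call(call_usleep0, iters));
    printf("sched_yield(): %.3f usec/call\n", usec_per_call(call_yield, iters));
    return 0;
}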
Re: [OMPI devel] SM BTL hang issue
Terry,

Are you testing on Linux? If so, which kernel?

See the patch to iperf to handle kernel 2.6.21 and the issue that they had with usleep(0):

http://dast.nlanr.net/Projects/Iperf2.0/patch-iperf-linux-2.6.21.txt

Scott

On Aug 31, 2007, at 1:36 PM, Terry D. Dontje wrote:

> Ok, I have an update to this issue. I believe there is an implementation
> difference of sched_yield between Linux and Solaris. If I change the
> sched_yield in opal_progress to a usleep(500), then my program completes
> quite quickly.
Re: [OMPI devel] SM BTL hang issue
Ok, I have an update to this issue. I believe there is an implementation difference of sched_yield between Linux and Solaris. If I change the sched_yield in opal_progress to a usleep(500), then my program completes quite quickly. I have sent a few questions to a Solaris engineer and hopefully will get some useful information.

That being said, CT-6's implementation also used yield calls (note this is actually what sched_yield reduces down to in Solaris), and we did not see the same degradation issue as with Open MPI. I believe the reason is that CT-6's SM implementation is not calling its version of progress recursively and forcing all the unexpected messages to be read in before continuing. CT-6 also has natural flow control in its implementation (i.e., it has a fixed-size FIFO for eager messages). I believe both of these characteristics keep CT-6 from being completely killed by the yield differences.

--td

Li-Ta Lo wrote:

> On Thu, 2007-08-30 at 12:45 -0400, terry.don...@sun.com wrote:
>> Is this using Linux?
>
> Yes.
>
> Ollie
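[To make the experiment above concrete, here is a minimal sketch of a polling loop with a swappable backoff strategy. This is not the actual opal_progress code; try_progress() is a hypothetical stand-in for polling the SM FIFOs. The point is only where the backoff call sits in the idle loop.]

#include <sched.h>
#include <unistd.h>

/* Hypothetical stand-in for polling the shared-memory FIFOs; pretends
 * an event finally arrives after many empty polls. */
static int try_progress(void)
{
    static int polls = 0;
    return ++polls > 1000;
}

int main(void)
{
    while (!try_progress()) {
#ifdef USE_USLEEP_BACKOFF
        /* The experiment: actually sleep, giving peer processes on an
         * oversubscribed node a real chance to run. */
        usleep(500);
#else
        /* The behavior under discussion: yield the CPU.  On Solaris,
         * sched_yield reduces to yield(), whose scheduling semantics
         * differ from Linux's sched_yield(). */
        sched_yield();
#endif
    }
    return 0;
}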
Re: [OMPI devel] SM BTL hang issue
Li-Ta Lo wrote:

> On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
>> Did you oversubscribe? I found np=10 on an 8-core system clogged things
>> up sufficiently.
>
> Yea, I used np 10 on a 2 proc, 2 hyper-thread system (total 4 threads).

Is this using Linux?

--td

> Ollie
Re: [OMPI devel] SM BTL hang issue
On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
> Li-Ta Lo wrote:
>
>> On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>>
>>> hmmm, interesting since my version doesn't abort at all.
>>
>> Some problem with fortran compiler/language binding? My C translation
>> doesn't have any problem.
>>
>> [ollie@exponential ~]$ mpirun -np 4 a.out 10
>> Target duration (seconds): 10.00, #of msgs: 50331, usec per msg: 198.684707
>
> Did you oversubscribe? I found np=10 on an 8-core system clogged things
> up sufficiently.

Yea, I used np 10 on a 2 proc, 2 hyper-thread system (total 4 threads).

Ollie
Re: [OMPI devel] SM BTL hang issue
Li-Ta Lo wrote:

> On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>> hmmm, interesting since my version doesn't abort at all.
>
> Some problem with fortran compiler/language binding? My C translation
> doesn't have any problem.
>
> [ollie@exponential ~]$ mpirun -np 4 a.out 10
> Target duration (seconds): 10.00, #of msgs: 50331, usec per msg: 198.684707

Did you oversubscribe? I found np=10 on an 8-core system clogged things up sufficiently.

--td

> Ollie
Re: [OMPI devel] SM BTL hang issue
On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> hmmm, interesting since my version doesn't abort at all.

Some problem with fortran compiler/language binding? My C translation doesn't have any problem.

[ollie@exponential ~]$ mpirun -np 4 a.out 10
Target duration (seconds): 10.00, #of msgs: 50331, usec per msg: 198.684707

Ollie

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    double duration = 10, endtime;
    long nmsgs = 1;
    int keep_going = 1, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size == 1) {
        fprintf(stderr, "Need at least 2 processes\n");
    } else if (rank == 0) {
        /* Rank 0 starts the bucket brigade: send to rank 1 until the
         * requested duration has elapsed. */
        duration = strtod(argv[1], NULL);
        endtime = MPI_Wtime() + duration;
        do {
            MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);
            nmsgs += 1;
        } while (MPI_Wtime() < endtime);
        /* Send the "all done" message down the chain. */
        keep_going = 0;
        MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);
        fprintf(stderr, "Target duration (seconds): %f, #of msgs: %ld, usec per msg: %f\n",
                duration, nmsgs, 1.0e6 * duration / nmsgs);
    } else {
        /* Middle ranks forward each message on; the last rank only receives. */
        do {
            MPI_Recv(&keep_going, 1, MPI_INT, rank - 1, 0x11, MPI_COMM_WORLD, &status);
            if (rank == (size - 1))
                continue;
            MPI_Send(&keep_going, 1, MPI_INT, rank + 1, 0x11, MPI_COMM_WORLD);
        } while (keep_going);
    }

    MPI_Finalize();
    return 0;
}
Re: [OMPI devel] SM BTL hang issue
hmmm, interesting since my version doesn't abort at all.

--td

Li-Ta Lo wrote:

> I know almost nothing about FORTRAN, but the stack dump told me it got a
> NULL pointer reference when accessing the "me" variable in the do .. while
> loop. How can this happen?
>
> It is compiled with g77/g90.
>
> Ollie
Re: [OMPI devel] SM BTL hang issue
On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core
> system. It'll print out the following and then hang:
> Target duration (seconds): 10.00
> # of messages sent in that time: 589207
> Microseconds per message: 16.972

I know almost nothing about FORTRAN, but the stack dump told me it got a NULL pointer reference when accessing the "me" variable in the do .. while loop. How can this happen?

[ollie@exponential ~]$ mpirun -np 2 a.out 100
[exponential:22145] *** Process received signal ***
[exponential:22145] Signal: Segmentation fault (11)
[exponential:22145] Signal code: Address not mapped (1)
[exponential:22145] Failing at address: (nil)
[exponential:22145] [ 0] [0xb7f2a440]
[exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
[exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
[exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0) [0x4e75ef70]
[exponential:22145] [ 4] a.out [0x8048aa1]
[exponential:22145] *** End of error message ***

      call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
     $     MPI_COMM_WORLD,ier)
 804909e:  8b 45 d4    mov    0xffd4(%ebp),%eax
 80490a1:  83 c0 01    add    $0x1,%eax

It is compiled with g77/g90.

Ollie
Re: [OMPI devel] SM BTL hang issue
If you are going to look at it, I will not bother with this.

Rich


On 8/29/07 10:47 AM, "Gleb Natapov" wrote:

> On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote:
>> Gleb,
>>   Are you looking at this ?
> Not today. And I need the code to reproduce the bug. Is this possible?
>
> --
> Gleb.
Re: [OMPI devel] SM BTL hang issue
Gleb,
  Are you looking at this ?

Rich


On 8/29/07 9:56 AM, "Gleb Natapov" wrote:

> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>> Is this trunk or 1.2?
> Oops. I should read more carefully :) This is trunk.
>
> --
> Gleb.
Re: [OMPI devel] SM BTL hang issue
Trunk.

--td

Gleb Natapov wrote:

> Is this trunk or 1.2?
Re: [OMPI devel] SM BTL hang issue
Is this trunk or 1.2?

On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
> I have a program that does a simple bucket brigade of sends and receives,
> where rank 0 is the start and repeatedly sends to rank 1 until a certain
> amount of time has passed, and then it sends an "all done" packet.
>
> Running this under np=2 always works. However, when I run with greater
> than 2 using only the SM btl, the program usually hangs and one of the
> processes has a long stack that contains a lot of the following 3 calls:
>
> [25] opal_progress(), line 187 in "opal_progress.c"
> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>
> When stepping through the ompi_fifo_write_to_head routine, it looks like
> the fifo has overflowed.
>
> I am wondering if what is happening is that rank 0 has sent a bunch of
> messages that have exhausted the resources, such that one of the middle
> ranks, which is in the process of sending, cannot send and therefore
> never gets to the point of trying to receive the messages from rank 0?
>
> Is the above a possible scenario, or are messages periodically bled off
> the SM BTL's fifos?
>
> Note, I have seen np=3 pass sometimes, and I can get it to pass reliably
> if I raise the shared memory space used by the BTL. This is using the
> trunk.
>
> --td

--
Gleb.
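[The overflow Terry describes can be illustrated with a toy bounded circular queue. The sketch below is hypothetical — it is not the actual ompi_fifo code — and only shows why a fixed-size FIFO with no flow control leaves the writer stuck retrying in progress once the reader stops draining it.]

#include <stdio.h>

/* Toy fixed-size circular FIFO, loosely modeled on the idea of a
 * per-pair shared-memory queue; not the actual ompi_fifo code. */
#define FIFO_SIZE 8

struct fifo {
    void *slot[FIFO_SIZE];
    int   head, tail;       /* write at head, read at tail */
};

/* Returns 0 on success, -1 if the queue is full — the "overflow"
 * condition seen in ompi_fifo_write_to_head. */
static int fifo_write(struct fifo *f, void *msg)
{
    int next = (f->head + 1) % FIFO_SIZE;
    if (next == f->tail)
        return -1;          /* full: the receiver hasn't drained it */
    f->slot[f->head] = msg;
    f->head = next;
    return 0;
}

static void *fifo_read(struct fifo *f)
{
    if (f->tail == f->head)
        return NULL;        /* empty */
    void *msg = f->slot[f->tail];
    f->tail = (f->tail + 1) % FIFO_SIZE;
    return msg;
}

int main(void)
{
    struct fifo f = { {0}, 0, 0 };
    int sent = 0;
    /* Rank 0's view: keep sending while the receiver — itself stuck
     * trying to write into someone else's full FIFO — never reads. */
    while (fifo_write(&f, "eager fragment") == 0)
        sent++;
    printf("queued %d fragments, then the FIFO filled; with no flow "
           "control the writer can only spin in progress\n", sent);
    return 0;
}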