Re: [OMPI devel] Malloc segfaulting?
But I am compiling Open MPI with --without-memory-manager, so it should work? Anyways, I ran the tests and valgrind is reporting 2 different (potentially related) problems:

1.
==12680== Invalid read of size 4
==12680==    at 0x709DE03: ompi_cb_fifo_write_to_head (ompi_circular_buffer_fifo.h:271)
==12680==    by 0x709DA77: ompi_fifo_write_to_head (ompi_fifo.h:324)
==12680==    by 0x709D964: mca_btl_sm_component_progress (btl_sm_component.c:398)
==12680==    by 0x705BF6B: mca_bml_r2_progress (bml_r2.c:110)
==12680==    by 0x44F905B: opal_progress (opal_progress.c:187)
==12680==    by 0x704F0E5: opal_condition_wait (condition.h:98)
==12680==    by 0x704EFD4: mca_pml_ob1_recv (pml_ob1_irecv.c:124)
==12680==    by 0x7202A62: ompi_coll_tuned_scatter_intra_binomial (coll_tuned_scatter.c:166)
==12680==    by 0x71F2C08: ompi_coll_tuned_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:746)
==12680==    by 0x4442494: PMPI_Scatter (pscatter.c:125)
==12680==    by 0x8048F6F: main (scatter_in_place.c:73)

2.
==28775== Jump to the invalid address stated on the next line
==28775==    at 0x2F305F35: ???
==28775==    by 0x704AF6B: mca_bml_r2_progress (bml_r2.c:110)
==28775==    by 0x44F905B: opal_progress (opal_progress.c:187)
==28775==    by 0x440BF6B: opal_condition_wait (condition.h:98)
==28775==    by 0x440BDF7: ompi_request_wait (req_wait.c:46)
==28775==    by 0x71EF396: ompi_coll_tuned_reduce_scatter_intra_basic_recursivehalving (coll_tuned_reduce_scatter.c:319)
==28775==    by 0x71E1540: ompi_coll_tuned_reduce_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:471)
==28775==    by 0x7202806: ompi_osc_pt2pt_module_fence (osc_pt2pt_sync.c:84)
==28775==    by 0x44501B5: PMPI_Win_fence (pwin_fence.c:57)
==28775==    by 0x80493D6: test_acc3_1 (test_acc3.c:156)
==28775==    by 0x8048FD0: test_acc3 (test_acc3.c:26)
==28775==    by 0x8049609: main (test_acc3.c:206)
==28775== Address 0x2F305F35 is not stack'd, malloc'd or (recently) free'd

I don't know what to make of these.
Here is the link to the full results: http://www.open-mpi.org/mtt/index.php?do_redir=386

Thanks,

Tim

On Friday 21 September 2007 10:40:21 am George Bosilca wrote:
> Tim,
>
> Valgrind will not help ... It can help with double frees or things
> like this, but not with over-running memory that belongs to your
> application. However, in Open MPI we have something that might help
> you. The option --enable-mem-debug adds unused space at the end of
> each memory allocation and makes sure we don't write anything there.
> I think this is the simplest way to pinpoint this problem.
>
> Thanks,
> george.
>
> On Sep 21, 2007, at 10:07 AM, Tim Prins wrote:
> > Aurelien and Brian,
> >
> > Thanks for the suggestions. I reran the runs with
> > --without-memory-manager and got (on 2 of 5000 runs):
> > *** glibc detected *** corrupted double-linked list: 0xf704dff8 ***
> > on one and
> > *** glibc detected *** malloc(): memory corruption: 0xeda00c70 ***
> > on the other.
> >
> > So it looks like somewhere we are over-running our allocated space.
> > So now I am attempting to redo the run with valgrind.
> >
> > Tim
> >
> > On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote:
> >> On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:
> >>> In our nightly runs with the trunk I have started seeing cases
> >>> where we appear to be segfaulting within/below malloc. Below is
> >>> a typical output.
> >>>
> >>> Note that this appears to only happen on the trunk, when we use
> >>> openib, and are in 32 bit mode. It seems to happen randomly at a
> >>> very low frequency (59 out of about 60,000 32 bit openib runs).
> >>>
> >>> This could be a problem with our machine, and has showed up since
> >>> I started testing 32bit ofed 10 days ago.
> >>>
> >>> Anyways, just curious if anyone had any ideas.
> >>
> >> As someone else said, this usually points to a duplicate free or
> >> the like in malloc. You might want to try compiling with
> >> --without-memory-manager, as the ptmalloc2 in glibc frequently is
> >> more verbose about where errors occurred than is the one in
> >> Open MPI.
> >>
> >> Brian

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
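For reference, the two debugging approaches suggested in this thread look roughly like the following (illustrative command lines only; the install prefixes, process count, and test binary name are assumptions, not taken from the thread):

```shell
# Brian's suggestion: build without Open MPI's memory manager, so
# glibc's ptmalloc2 reports corruption with more context.
./configure --without-memory-manager --prefix=$HOME/ompi-nomm
make -j4 install

# George's suggestion: enable allocation red zones instead.
./configure --enable-mem-debug --prefix=$HOME/ompi-memdbg
make -j4 install

# Then re-run the failing test under valgrind to catch the overrun:
mpirun -np 4 valgrind --error-exitcode=1 ./scatter_in_place
```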
Re: [OMPI devel] 2nd cut of MTT web page
@#$%@#$% Someday, I'm going to learn to send to the right list. Sigh. Well, if anyone here cares, there's a slowly-getting-there web site for MTT going up... :-)

On Sep 21, 2007, at 9:18 PM, Jeff Squyres wrote:
> http://www.open-mpi.org/projects/mtt/
>
> I fixed the left-hand navigation, put up some descriptive text, and
> then copped out and linked to the wiki for all the real content. :-)
>
> Comments?

-- 
Jeff Squyres
Cisco Systems
[OMPI devel] 2nd cut of MTT web page
http://www.open-mpi.org/projects/mtt/

I fixed the left-hand navigation, put up some descriptive text, and then copped out and linked to the wiki for all the real content. :-)

Comments?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] UD BTL alltoall hangs
Thanks George. I figured out the problem (two of them actually) based on a pointer from Gleb (thanks Gleb). I have two types of send queues in the UD BTL -- one is per-module, and the other is per-endpoint. I had missed looking for stuck frags on the per-endpoint queues. So something is wrong with the per-endpoint queues and their interaction with the per-module queue. Disabling the per-endpoint queue makes the problem go away, and I'm not sure I liked having them in the first place.

But this still left a similar problem at 2kb messages. I had static limits set on free list lengths based on the btl_ofud_sd_num MCA parameter. Switching the max to unlimited makes this problem go away too. Good enough to get some runs through for now :)

Andrew

George Bosilca wrote:

Andrew,

There is an option in the message queue stuff that allows you to see all internal pending requests. On the current trunk, edit the file ompi/debuggers/ompi_dll.c at line 736 and set p_info->show_internal_requests to 1. Now compile and install it, and then restart TotalView. You should be able to get access to all pending requests, even those created by the collective modules. Moreover, the missing sends should be somewhere. If they are not in the BTL, and if they are not completed, then hopefully they are in the PML in the send_pending list. As the collective works with all other BTLs, I suppose the communication pattern is correct, so there is something happening with the requests when using the UD BTL. If the requests are not in the PML send_pending queue, the next thing you can do is to modify the receive handlers in the OB1 PML and print all incoming match headers. You will have to somehow sort the output, but at least you can figure out what is happening with the missing messages.

george.
On Sep 11, 2007, at 12:37 PM, Andrew Friedley wrote:

First off, I've managed to reproduce this with nbcbench using only 16 procs (two per node) and setting btl_ofud_sd_num to 12 -- this eases debugging with fewer procs to look at. ompi_coll_tuned_alltoall_intra_basic_linear is the alltoall routine that is being called.

What I'm seeing from TotalView is that some random number of procs (1-5 usually, varies from run to run) are sitting with a send and a recv outstanding to every other proc. The other procs, however, have moved on to the next collective. This is hard to see with the default nbcbench code, since it calls only alltoall repeatedly -- adding a barrier after the MPI_Alltoall() call makes it easier to see, as the barrier has a different tag number and communication pattern. So what I see is a few procs stuck in alltoall, while the rest are waiting in the following barrier. I've also verified with TotalView that there are no outstanding send WQEs at the UD BTL, and all procs are polling progress. The procs in the alltoall are polling in the opal_condition_wait() called from ompi_request_wait_all(). Not sure what to ask or where to look further, other than: what should I look at to see what requests are outstanding in the PML?

Andrew

George Bosilca wrote:

The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear all-to-all. As the name states, this means that every node will post one receive from every other node and then will start sending every other node its respective fragment. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side. Do you have TotalView installed on your odin? If yes, there is a simple way to see how many sends are pending and where ... That might pinpoint [at least] the process where you should look to see what's wrong.

george.

On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:

I'm having a problem with the UD BTL and hoping someone might have some input to help solve it. What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this:

mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144

hangs consistently when testing 256-byte messages. There are two things I can do to make the hang go away until running at larger scale. First is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete. The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send().
Re: [OMPI devel] VT integration
Excellent -- thanks.

On Sep 21, 2007, at 11:30 AM, Andreas Knüpfer wrote:

On Friday 21 September 2007, Jeff Squyres wrote:
> Per an idea that came up recently, can we make it so that ompi_info
> reports the version of VT that is integrated into Open MPI?

good idea, we'll pick it up real soon

Andreas

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] VT integration
On Friday 21 September 2007, Jeff Squyres wrote:
> Per an idea that came up recently, can we make it so that ompi_info
> reports the version of VT that is integrated into Open MPI?

good idea, we'll pick it up real soon

Andreas

-- 
Dipl. Math. Andreas Knuepfer, Center for Information Services and High
Performance Computing (ZIH), TU Dresden, Willersbau A114, Zellescher
Weg 12, 01062 Dresden
phone +49-351-463-38323, fax +49-351-463-37773
Re: [OMPI devel] UD BTL alltoall hangs
Andrew,

There is an option in the message queue stuff that allows you to see all internal pending requests. On the current trunk, edit the file ompi/debuggers/ompi_dll.c at line 736 and set p_info->show_internal_requests to 1. Now compile and install it, and then restart TotalView. You should be able to get access to all pending requests, even those created by the collective modules. Moreover, the missing sends should be somewhere. If they are not in the BTL, and if they are not completed, then hopefully they are in the PML in the send_pending list. As the collective works with all other BTLs, I suppose the communication pattern is correct, so there is something happening with the requests when using the UD BTL. If the requests are not in the PML send_pending queue, the next thing you can do is to modify the receive handlers in the OB1 PML and print all incoming match headers. You will have to somehow sort the output, but at least you can figure out what is happening with the missing messages.

george.

On Sep 11, 2007, at 12:37 PM, Andrew Friedley wrote:

First off, I've managed to reproduce this with nbcbench using only 16 procs (two per node) and setting btl_ofud_sd_num to 12 -- this eases debugging with fewer procs to look at. ompi_coll_tuned_alltoall_intra_basic_linear is the alltoall routine that is being called.

What I'm seeing from TotalView is that some random number of procs (1-5 usually, varies from run to run) are sitting with a send and a recv outstanding to every other proc. The other procs, however, have moved on to the next collective. This is hard to see with the default nbcbench code, since it calls only alltoall repeatedly -- adding a barrier after the MPI_Alltoall() call makes it easier to see, as the barrier has a different tag number and communication pattern. So what I see is a few procs stuck in alltoall, while the rest are waiting in the following barrier. I've also verified with TotalView that there are no outstanding send WQEs at the UD BTL, and all procs are polling progress. The procs in the alltoall are polling in the opal_condition_wait() called from ompi_request_wait_all(). Not sure what to ask or where to look further, other than: what should I look at to see what requests are outstanding in the PML?

Andrew

George Bosilca wrote:

The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear all-to-all. As the name states, this means that every node will post one receive from every other node and then will start sending every other node its respective fragment. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side. Do you have TotalView installed on your odin? If yes, there is a simple way to see how many sends are pending and where ... That might pinpoint [at least] the process where you should look to see what's wrong.

george.

On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:

I'm having a problem with the UD BTL and hoping someone might have some input to help solve it. What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this:

mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144

hangs consistently when testing 256-byte messages. There are two things I can do to make the hang go away until running at larger scale. First is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete.

The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.

It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs, and don't have a good way of debugging that (suggestions appreciated). Furthermore, both the ob1 and dr PMLs result in the same behavior, except
Re: [OMPI devel] Malloc segfaulting?
Tim,

Valgrind will not help ... It can help with double frees or things like this, but not with over-running memory that belongs to your application. However, in Open MPI we have something that might help you. The option --enable-mem-debug adds unused space at the end of each memory allocation and makes sure we don't write anything there. I think this is the simplest way to pinpoint this problem.

Thanks,
george.

On Sep 21, 2007, at 10:07 AM, Tim Prins wrote:

Aurelien and Brian,

Thanks for the suggestions. I reran the runs with --without-memory-manager and got (on 2 of 5000 runs):

*** glibc detected *** corrupted double-linked list: 0xf704dff8 ***

on one and

*** glibc detected *** malloc(): memory corruption: 0xeda00c70 ***

on the other. So it looks like somewhere we are over-running our allocated space. So now I am attempting to redo the run with valgrind.

Tim

On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote:

On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:

In our nightly runs with the trunk I have started seeing cases where we appear to be segfaulting within/below malloc. Below is a typical output.

Note that this appears to only happen on the trunk, when we use openib, and are in 32 bit mode. It seems to happen randomly at a very low frequency (59 out of about 60,000 32 bit openib runs).

This could be a problem with our machine, and has showed up since I started testing 32bit ofed 10 days ago.

Anyways, just curious if anyone had any ideas.

As someone else said, this usually points to a duplicate free or the like in malloc. You might want to try compiling with --without-memory-manager, as the ptmalloc2 in glibc frequently is more verbose about where errors occurred than is the one in Open MPI.

Brian
Re: [OMPI devel] Malloc segfaulting?
Aurelien and Brian,

Thanks for the suggestions. I reran the runs with --without-memory-manager and got (on 2 of 5000 runs):

*** glibc detected *** corrupted double-linked list: 0xf704dff8 ***

on one and

*** glibc detected *** malloc(): memory corruption: 0xeda00c70 ***

on the other. So it looks like somewhere we are over-running our allocated space. So now I am attempting to redo the run with valgrind.

Tim

On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote:
> On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:
> > In our nightly runs with the trunk I have started seeing cases
> > where we appear to be segfaulting within/below malloc. Below is a
> > typical output.
> >
> > Note that this appears to only happen on the trunk, when we use
> > openib, and are in 32 bit mode. It seems to happen randomly at a
> > very low frequency (59 out of about 60,000 32 bit openib runs).
> >
> > This could be a problem with our machine, and has showed up since I
> > started testing 32bit ofed 10 days ago.
> >
> > Anyways, just curious if anyone had any ideas.
>
> As someone else said, this usually points to a duplicate free or the
> like in malloc. You might want to try compiling with
> --without-memory-manager, as the ptmalloc2 in glibc frequently is
> more verbose about where errors occurred than is the one in Open MPI.
>
> Brian
[OMPI devel] VT integration
Per an idea that came up recently, can we make it so that ompi_info reports the version of VT that is integrated into Open MPI? -- Jeff Squyres Cisco Systems