[OMPI devel] MTT GAE/GDS call this morning
Ok, I opened a Google account with the name "openmpi" this morning, and using Ethan's cell phone (heh; turns out that I had already used mine), we verified it for use with GAE. Woo hoo! "open-mpi-mtt" is the name of the Google Apps application. I'll send the relevant authentication information out-of-band. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] MTT GAE/GDS call this morning
Blah! Wrong "devel" list -- sorry folks!

On Apr 23, 2009, at 10:33 AM, Jeff Squyres (jsquyres) wrote:
> [...]

-- Jeff Squyres Cisco Systems
[OMPI devel] Patch to resolve valgrind warnings for 1.3.2
Although --enable-mem-debug resolves the issue, I get warnings about uninitialized bytes in writev from the opal_if_t structs in opal_ifinit:

==25777== Syscall param writev(vector[...]) points to uninitialised byte(s)
==25777==    at 0x34DE2C9F0C: writev (in /lib64/libc-2.6.so)
==25777==    by 0xAC233B: mca_oob_tcp_msg_send_handler (oob_tcp_msg.c:265)
==25777==    by 0xABAC92: mca_oob_tcp_peer_send (oob_tcp_peer.c:197)
==25777==    by 0xAC0A80: mca_oob_tcp_send_nb (oob_tcp_send.c:167)
==25777==    by 0xAD025E: orte_rml_oob_send (rml_oob_send.c:137)
==25777==    by 0xAD0CE3: orte_rml_oob_send_buffer (rml_oob_send.c:269)
==25777==    by 0xAA50A6: allgather (grpcomm_bad_module.c:370)
==25777==    by 0xAA592D: modex (grpcomm_bad_module.c:498)
==25777==    by 0x92EE48: ompi_mpi_init (ompi_mpi_init.c:626)
==25777==    by 0x95351C: PMPI_Init (pinit.c:80)

Since this isn't a performance-critical part of Open MPI, why not follow the reasoning already noted in a comment at opal/util/if.c:208 and zero out the struct even outside OMPI_ENABLE_MEM_DEBUG? The attached patch makes this one-line change and clears up all valgrind warnings (when --with-valgrind is enabled).

Regards,
Simon

diff -r -U 5 openmpi-1.3.2/opal/util/if.c openmpi-1.3.2.edited/opal/util/if.c
--- openmpi-1.3.2/opal/util/if.c 2009-04-16 20:02:42.0 +0100
+++ openmpi-1.3.2.edited/opal/util/if.c 2009-04-23 16:18:09.0 +0100
@@ -258,11 +258,12 @@
 struct ifreq* ifr = (struct ifreq*) ptr;
 opal_if_t intf;
 opal_if_t *intf_ptr;
 int length;
-OMPI_DEBUG_ZERO(intf);
+/* Again, make valgrind and purify happy - this isn't performance critical. */
+memset(&intf, 0, sizeof(intf));
 OBJ_CONSTRUCT(&intf, opal_list_item_t);
 /* compute offset for entries */
 #ifdef HAVE_STRUCT_SOCKADDR_SA_LEN
 length = sizeof(struct sockaddr);
Re: [OMPI devel] Patch to resolve valgrind warnings for 1.3.2
Done; thanks!

https://svn.open-mpi.org/trac/ompi/changeset/21060

(I added a few more memsets and callocs elsewhere in if.c, just to be complete.)

On Apr 23, 2009, at 11:36 AM, Number Cruncher wrote:
> [...]

-- Jeff Squyres Cisco Systems
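For readers following the thread, here is a minimal sketch of why zeroing the whole struct quiets this class of warning. It does not use Open MPI code: demo_if_t and both helper functions are hypothetical stand-ins for opal_if_t and its setup. Member-wise assignment never writes the padding bytes the compiler inserts between fields, so when the raw struct is later handed to something like writev, those bytes are still uninitialized. A memset over the full sizeof covers them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for opal_if_t. On typical ABIs the char/int mix
 * leaves padding after name[] and after flags that field assignments
 * never touch. */
typedef struct {
    char name[6];
    int  index;
    char flags;
} demo_if_t;

/* Zero the whole struct first (padding included), then fill the fields. */
static void demo_if_init(demo_if_t *intf, const char *name, int index)
{
    assert(intf != NULL && name != NULL);
    memset(intf, 0, sizeof(*intf));
    /* Copy at most 5 chars; the memset above guarantees termination. */
    strncpy(intf->name, name, sizeof(intf->name) - 1);
    intf->index = index;
    intf->flags = 1;
}

/* Return 1 if two structs are byte-for-byte identical, padding included. */
static int demo_if_same_bytes(const demo_if_t *a, const demo_if_t *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With the memset in place, two structs initialized with the same values compare equal byte for byte regardless of what was in memory beforehand, which is the same property that keeps a raw send of the struct free of uninitialized bytes.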
[OMPI devel] size of items in mca_osc_pt2pt_component.p2p_c_buffers
Hi,

I'm currently looking at this bug:
http://www.open-mpi.org/community/lists/users/2008/12/7611.php

I'm using the 1.3.2 tarball. Valgrind tells me that there is an invalid write (of size 1) in osc_pt2pt_data_move.c at line 229, which is the statement

    memcpy((unsigned char*) buffer->payload + written_data, packed_ddt, packed_ddt_len);

in the function ompi_osc_pt2pt_sendreq_send. I have

    (gdb) p packed_ddt_len
    $2 = 44852

and

    (gdb) p written_data
    $3 = 36

but I can't figure out what the actual size of buffer->payload is. I have

    (gdb) p *buffer
    $6 = {mpireq = {super = {super = {super = {
              obj_magic_id = 16046253926196952813,
              obj_class = 0x4f5240,
              obj_reference_count = 1,
              cls_init_file_name = 0x2efe0b "class/opal_free_list.c",
              cls_init_lineno = 114},
            opal_list_next = 0x0, opal_list_prev = 0x0, item_free = 1,
            opal_list_item_refcount = 0, opal_list_item_belong_to = 0x0}},
        request = 0x5a35,
        status = {MPI_SOURCE = 23094, MPI_TAG = 23095, MPI_ERROR = 23096,
          _count = 23097, _cancelled = 23098},
        cbfunc = 0x4e6cc5, cbdata = 0x8681080},
      payload = 0x86bc0d8, len = 23102}

Is len the size of payload? In osc_pt2pt_component.c I found the statement

    /* adjust size to be multiple of ompi_ptr_t to avoid alignment issues */
    aligned_size = sizeof(ompi_osc_pt2pt_buffer_t) +
        (sizeof(ompi_osc_pt2pt_buffer_t) % sizeof(ompi_ptr_t)) +
        mca_osc_pt2pt_component.p2p_c_eager_size;
    OBJ_CONSTRUCT(&mca_osc_pt2pt_component.p2p_c_buffers, opal_free_list_t);
    opal_free_list_init(&mca_osc_pt2pt_component.p2p_c_buffers,
                        aligned_size,
                        OBJ_CLASS(ompi_osc_pt2pt_buffer_t),
                        1, -1, 1);

but this doesn't help me to understand ...

Can you help with this? Where can I find the allocation routine for the buffer? Or do you know why there could be an invalid write?

Thanks + Best regards,
Dorian
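As an illustration only, the question above comes down to whether written_data + packed_ddt_len stays within the payload's capacity. The sketch below shows one way such a check could look. It is not Open MPI code: payload_write_fits and payload_append are hypothetical helpers, and the thread does not say what the real capacity is, so any capacity value plugged in is an assumption (a plausible guess would be p2p_c_eager_size, given the allocation quoted above, but that is not confirmed here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if len bytes starting at offset written fit inside a payload
 * of capacity bytes, 0 otherwise. Written as a subtraction so the check
 * cannot wrap around in size_t arithmetic. */
static int payload_write_fits(size_t capacity, size_t written, size_t len)
{
    return written <= capacity && len <= capacity - written;
}

/* Append len bytes to the payload. Returns 0 on success, or -1 (leaving
 * the payload and *written untouched) if the copy would overrun. */
static int payload_append(unsigned char *payload, size_t capacity,
                          size_t *written, const void *src, size_t len)
{
    assert(payload != NULL && written != NULL && src != NULL);
    if (!payload_write_fits(capacity, *written, len)) {
        return -1;
    }
    memcpy(payload + *written, src, len);
    *written += len;
    return 0;
}
```

With the values from the gdb session (written_data = 36, packed_ddt_len = 44852), the copy needs 44888 bytes of payload. If the capacity were, say, 16384 bytes (a made-up figure), payload_write_fits(16384, 36, 44852) would return 0, and an unguarded memcpy would produce exactly the kind of invalid write Valgrind reports.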
Re: [OMPI devel] RFC: Final cleanup of included headers
Hi,

what were we talking about again ;-)?

Aah, right: 1.3.2 is out the door, the dust has settled. Tuesday, next week, I'd like to apply the patch (produced by contrib/check_unnecessary_headers.sh, see the first email).

This will require just two patches:
 - one (independent patch) for a few minor additions of mostly include string.h and other includes of header files
 - then the "big" one, that gets rid of inclusion of header files in 832 files (again mostly one-liners).

For this, I expect to not require exclusive access to the trunk... Would that be reasonable?

CU,
Rainer

PS: Regarding MTT, it _has_ been tested on several clusters (getting coverage of more MCAs, compilers and so on), but neither OS's, nor architectures, so...

On Friday 20 March 2009 08:45:27 am Jeff Squyres wrote:
> This sounds reasonable -- small changes are always appreciated. If we
> can hold off on the big patch until 1.3 is about to morph into 1.4,
> that would be good.
>
> Can you remind us about this issue when 1.3.2 gets out the door?
> (sorry, not trying to be a jerk; I just know that my short term memory
> is so non-existent that I'll have forgotten about it by then :-(
> ...what were we talking about again?)
>
> On Mar 19, 2009, at 5:12 PM, Rainer Keller wrote:
> > Hi Ralph,
> >
> > On Wednesday 18 March 2009 09:00:36 am Ralph Castain wrote:
> > > Could we hold off on this until after 1.3.2 is out the door and has a
> > > couple of days to stabilize? All these header file changes are making
> > > it more difficult to cleanly apply patches to the 1.3 branch.
> >
> > Hmm, sure, we can hold off the big patch.
> > With the current plan, 1.3.2 should be out on 4/3.
> >
> > Some intermediate (small!) steps however I'd still like to be able
> > to apply?
> >
> > > When we get past the next couple of weeks, the 1.3 branch should clear
> > > out the backlog of CMRs, and we should have the usual immediate "oops"
> > > fixes in to 1.3.2. Then this won't be such a problem.
> >
> > However, it would be nice, if You could test the patch on Your
> > systems, prior to me moving it into trunk. I want to limit the
> > "down-time" of trunk (There may be a few places, where additional
> > headers are required -- as unnecessary headers were removed in
> > lower-level headers).
> >
> > Thanks,
> > Rainer
> > --
> > Rainer Keller, PhD            Tel: +1 (865) 241-6293
> > Oak Ridge National Lab        Fax: +1 (865) 241-4811
> > PO Box 2008 MS 6164           Email: kel...@ornl.gov
> > Oak Ridge, TN 37831-2008      AIM/Skype: rusraink
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Rainer Keller, PhD            Tel: +1 (865) 241-6293
Oak Ridge National Lab        Fax: +1 (865) 241-4811
PO Box 2008 MS 6164           Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008      AIM/Skype: rusraink