Re: [OMPI devel] MPI_Win_get_group
* Lisandro Dalcin [2007-07-30 18:19:21]:

> On 7/30/07, George Bosilca wrote:
> > In the data-type section there is an advice to implementors stating
> > that a copy can simply increase the reference count if applicable.
> > So, we might want to apply the same logic here ...
>
> BTW, you just mentioned another obscure case. Does this apply to NAMED
> datatypes? This issue is really cumbersome in File.Get_view().

The MPI_File_get_view description in the standard has some issues
related to copies and named datatypes; see

http://www-unix.mcs.anl.gov/~gropp/projects/parallel/MPI/mpi-errata/discuss/fileview/fileview-1-clean.txt
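In practice, the ambiguity bites portable code like this (just a sketch,
not from the errata text; the list of predefined handles compared
against is deliberately abbreviated):

    #include <mpi.h>

    /* Free a datatype returned by MPI_File_get_view() only if it is
     * not a predefined (named) handle: calling MPI_Type_free() on a
     * predefined datatype is erroneous, yet the standard does not
     * clearly say whether get_view returns the predefined handle or
     * a copy of it. */
    static void release_view_type(MPI_Datatype *type)
    {
        if (*type != MPI_BYTE && *type != MPI_INT &&
            *type != MPI_DOUBLE /* ... all other predefined types ... */)
            MPI_Type_free(type);
    }

    void inspect_view(MPI_File fh)
    {
        MPI_Offset disp;
        MPI_Datatype etype, filetype;
        char datarep[MPI_MAX_DATAREP_STRING];

        MPI_File_get_view(fh, &disp, &etype, &filetype, datarep);
        /* ... examine disp/etype/filetype/datarep ... */
        release_view_type(&etype);
        release_view_type(&filetype);
    }

  Dries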
Re: [OMPI devel] minor bug report for building openmpi-1.2.3 on cygwin
Andrew,

Thanks for the info. I have fixed this problem for the 1.3 release. If
you download a nightly build after revision 15711, it will be corrected.

Thanks,
  george.

On Jul 27, 2007, at 10:19 AM, Andrew Lofthouse wrote:

Hi,

I've just built and installed openmpi-1.2.3 on cygwin. It seems that
most files depend on opal/mca/timer/windows/timer_windows.h, but
opal/mca/timer/windows/timer_windows_component.c depends on
opal/timer/windows/timer_windows_component.h (which doesn't exist). I
simply copied timer_windows.h to timer_windows_component.h and it built
correctly. I haven't yet compiled any MPI applications to check correct
operation.

Regards,
AJL
Re: [OMPI devel] MPI_Win_get_group
On 7/31/07, Dries Kimpe wrote:
> The MPI_File_get_view description in the standard has some issues
> related to copies and named datatypes; see
>
> http://www-unix.mcs.anl.gov/~gropp/projects/parallel/MPI/mpi-errata/discuss/fileview/fileview-1-clean.txt

Indeed, your comment was exactly the source of my comment. (BTW, thank
you -- this helped me to fix my Python wrappers.)

In general, I think the MPI standard should be fixed/clarified in many
places regarding the handling of returned references. Testing for
predefined Comm and Group handles is rather easy, but for Datatypes it
is really cumbersome. Perhaps an MPI_Type_is_named(MPI_Datatype,
int *flag) would help a lot. What do you think?
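For reference, such a check can be roughly emulated today on top of
MPI_Type_get_envelope(), which reports MPI_COMBINER_NAMED for predefined
datatypes (just a sketch -- the helper name mirrors the proposal above
and is not an existing MPI call; note that a dup of a named datatype
would come back as NOT named, which is part of what makes this
cumbersome):

    #include <mpi.h>

    /* User-level approximation of the proposed MPI_Type_is_named(). */
    int type_is_named(MPI_Datatype datatype, int *flag)
    {
        int num_ints, num_addrs, num_types, combiner;
        int err = MPI_Type_get_envelope(datatype, &num_ints, &num_addrs,
                                        &num_types, &combiner);
        if (err == MPI_SUCCESS)
            *flag = (combiner == MPI_COMBINER_NAMED);
        return err;
    }

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594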
Re: [OMPI devel] MPI_Win_get_group
On Jul 31, 2007, at 4:52 PM, Lisandro Dalcin wrote:

> In general, I think the MPI standard should be fixed/clarified in many
> places regarding the handling of returned references. Testing for
> predefined Comm and Group handles is rather easy, but for Datatypes it
> is really cumbersome. Perhaps an MPI_Type_is_named(MPI_Datatype,
> int *flag) would help a lot. What do you think?

Just curious -- why do you need to know if a handle refers to a
predefined object?

-- 
Jeff Squyres
Cisco Systems
[OMPI devel] pml failures?
I'm getting a pile of test failures when running with the openib and tcp
BTLs on the trunk. Gleb is getting some failures, too, but his seem to
be different than mine. Here's what I'm seeing from manual MTT runs on
my SVN/development install -- did you know that MTT could do that? :-)

+----------+---------+------+------+----------+------+
| Phase    | Section | Pass | Fail | Time out | Skip |
+----------+---------+------+------+----------+------+
| Test Run | intel   |  442 |    0 |       26 |    0 |
| Test Run | ibm     |  173 |    3 |        1 |    3 |
+----------+---------+------+------+----------+------+

The tests that are failing are:

*** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: getcount, np=16, variant=1: FAILED
*** WARNING: Test: spawn, np=3, variant=1: FAILED
*** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED

I'm not too worried about the comm spawn/join tests because I think
they're heavily oversubscribing the nodes and therefore timing out.

These were all from a default trunk build running with "mpirun --mca btl
openib,self". For all of these tests, I'm running on 4 nodes, 4 cores
each, but they have varying numbers of network interfaces:

         nodes 1,2          nodes 3,4
openib   3 active ports     2 active ports
tcp      4 tcp interfaces   3 tcp interfaces

Is anyone else seeing these kinds of failures?

-- 
Jeff Squyres
Cisco Systems
[OMPI devel] openib modular wireup
Short version:
--------------

The modular wireup code on /tmp/jms-modular-wireup seems to be working.
Can people give it a whirl before I bring it back to the trunk? The more
esoteric your hardware setup, the better.

Longer version:
---------------

I think that I have completed round 1 of the modular wireup work in
/tmp/jms-modular-wireup, meaning that all the wireup code has been moved
out of btl_openib_endpoint.* and into connect/*. The endpoint.c file now
simply calls the connect interface through a function pointer (allowing
the choice of the current RML-based wireup or the RDMA CM).

The selected connect "module" will call back to the openib endpoint for
two things (a rough sketch of the shape of this interface is at the end
of this mail):

1. post receive buffers on a locally-created-but-not-yet-connected qp
2. when the qp is fully connected and ready to be used

This cleaned up the endpoint.* code a *lot*. I also simplified the RML
connection code a bit -- I removed some useless sub-functions, etc.

I *think* that this new connection code is all working, but per
http://www.open-mpi.org/community/lists/devel/2007/07/2058.php, I'm
seeing other weird failures, so I'm a little reluctant to put this back
on the trunk until I know that everything is working properly. Granted,
the failures in the other post sound like pml errors and this should be
a wholly separate issue (we would get different warnings/errors if the
btl failed to connect), but still -- it seems a little safer to be
prudent.

Still to do:

- make the static rate be exchanged and set properly during the RML
  wireup
- RDMA CM support (it returns ERR_NOT_IMPLEMENTED right now)
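The sketch promised above (names and signatures here are illustrative
only, not the actual symbols in the branch):

    struct mca_btl_openib_endpoint;   /* opaque for this sketch */

    /* Illustrative shape of a connect module: endpoint.c picks one
     * module (RML-based wireup or RDMA CM) and invokes it only
     * through these pointers. */
    struct connect_module {
        const char *name;             /* e.g. "rml" or "rdma_cm" */
        int (*start_connect)(struct mca_btl_openib_endpoint *ep);
        int (*finalize)(void);
    };

    /* Callbacks the module makes into the openib endpoint:
     * 1. post receive buffers on a created-but-not-yet-connected QP
     * 2. report that the QP is fully connected and ready for use */
    int endpoint_post_recvs(struct mca_btl_openib_endpoint *ep);
    int endpoint_connected(struct mca_btl_openib_endpoint *ep);

-- 
Jeff Squyres
Cisco Systems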
Re: [OMPI devel] MPI_Win_get_group
On 7/31/07, Jeff Squyres wrote:
> Just curious -- why do you need to know if a handle refers to a
> predefined object?

If I understand correctly, new handles should be freed in order not to
leak things, to follow good programming practices, and to be completely
sure that a valgrind run does not report any problems.

I am working on the development of MPI for Python, a port of MPI to
Python, a high-level language with automatic memory management. That
said, in such an environment, having to call XXX.Free() for every object
I get from a call like XXX.Get_something() is a really unnecessary pain.

Many things in MPI are LOCAL (datatypes, groups, predefined operations),
and in general destroying them from user space is guaranteed by MPI not
to conflict with system (MPI) space and communication (i.e. if you
create a derived datatype for use in the construction of another derived
datatype, you can safely free the first; see the small example at the
end of this mail). Well, for all those LOCAL objects, I could implement
automatic deallocation of handles for Python (for Comm, Win, and File
that is not so easy, as freeing them is a collective operation AFAIK,
and automatically freeing them can lead to deadlocks).

My Python wrappers (mpi4py) are intended to be used on any platform with
any MPI implementation. But things are not so easy, as there are many
corner cases in the MPI standard.

Python is a wonderful, powerful language, very friendly for writing
things. Proof of that is the many bug reports I have provided here. By
using Python, I can run all my unit test scripts in a single MPI run, so
they have the potential to find interaction problems between all parts
of MPI. If any of you OMPI developers have some knowledge of Python, I
invite you to try mpi4py, as you would be able to write many, many tests
very quickly -- not only for things that should work, but also for
things that should fail.

Sorry for the long mail. In short, many things in MPI are not clearly
designed for languages other than C and Fortran. Even in the C++
specification, there are things that are unacceptable, like the open
door to the problem of having dangling references, which could be
avoided with negligible cost. Anyway, all those issues are minor for me,
and the MPI specification is just great. I hope I can find the time to
contribute to the MPI-2.1 effort to better define MPI behavior in the
corner cases (fortunately, there are a really small number of them).
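PS: the "safe to free an intermediate datatype" point above, in plain C
(a minimal sketch; the counts and strides are arbitrary and nothing
mpi4py-specific is assumed):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Datatype inner, outer;

        MPI_Init(&argc, &argv);
        MPI_Type_contiguous(4, MPI_DOUBLE, &inner);
        MPI_Type_vector(8, 1, 4, inner, &outer);
        /* Legal: freeing is a local operation, and 'outer' keeps its
         * own reference to the layout derived from 'inner'. */
        MPI_Type_free(&inner);
        MPI_Type_commit(&outer);
        /* ... communicate using 'outer' ... */
        MPI_Type_free(&outer);
        MPI_Finalize();
        return 0;
    }

Regards,

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594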