Re: [OMPI devel] RFC - "system-wide-only" MCA parameters
On Sep 4, 2009, at 2:34 PM, Sylvain Jeaugey wrote:

> hg -R "$srcdir" tip | head -1 | grep "^changeset:" | cut -d: -f3

Good catch; I changed it slightly from this to:

    hg -v -R "$srcdir" tip | grep ^changeset: | head -n 1 | cut -d: -f3

I put this on the trunk since I couldn't push to your bb tree; you should get it in the next pull.

--
Jeff Squyres
jsquy...@cisco.com
[OMPI devel] version number issues
On Sep 4, 2009, at 2:56 PM, Nadia Derbey wrote:

> Actually, I didn't have the problem on my side, because hg is not known in my build environment. Never noticed these lines:
>
> *** Checking versions
> checking for SVN version... ../configure: line 4285: hg: command not found
> done
> checking Open MPI version... 1.4a1hg

I changed the subject since we're digressing a bit off the original RFC...

This is an interesting failure mode that we evidently didn't consider when we wrote that script. ;-) I guess we should check $? when returning and ensure that the command executed properly.

The only question is -- should we abort in this case, or just put in "unknown -- could not find hg" (or whatever) as the version? I would lean towards the latter; development machines may vary wildly in what software is installed...

Unless anyone objects, I'll do the latter.

--
Jeff Squyres
jsquy...@cisco.com
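For illustration only, here is a minimal sketch of the "check $? and fall back to a placeholder" approach discussed above. It is not the actual Open MPI configure code: the function name get_hg_version and the "unknown -- hg failed" and "unknown -- no changeset found" strings are hypothetical, while the hg pipeline and the "unknown -- could not find hg" text are taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not Open MPI's real script):
# print the changeset ID of the tip of the tree in $1, or a placeholder
# string if hg is missing or fails, instead of aborting.
get_hg_version() {
    srcdir=$1

    # Is hg available at all in this build environment?
    if ! command -v hg >/dev/null 2>&1; then
        echo "unknown -- could not find hg"
        return 0
    fi

    # Run hg by itself first: the exit status of a pipeline is that of
    # its last stage (cut), so an hg failure would otherwise be hidden.
    if ! tip=$(hg -v -R "$srcdir" tip 2>/dev/null); then
        echo "unknown -- hg failed"
        return 0
    fi

    # Same filtering as the pipeline quoted earlier in the thread.
    rev=$(printf '%s\n' "$tip" | grep '^changeset:' | head -n 1 | cut -d: -f3)
    echo "${rev:-unknown -- no changeset found}"
}
```

A caller could then use something like `version_suffix=$(get_hg_version "$srcdir")` (the variable name is again only an example) and always get a printable string, whatever software the development machine has installed.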
Re: [OMPI devel] RFC - "system-wide-only" MCA parameters
On Sep 4, 2009, at 5:47 PM, Sylvain Jeaugey wrote:

> Understood. So, let's say that we're only implementing a hurdle to discourage users from doing things wrong. I guess the efficiency of this will reside in the message displayed to the user ("You are about to break the entire machine and you will be fined if you try to circumvent this in any way").
>
> Maybe the warning message should be set by administrators ($OMPI/.../no-override.txt). It would certainly be more efficient :)

Ralph is certainly right: there is no way that we can prevent users -- even those with the best of intentions -- from circumventing the system when they perceive the system not working the way they want it to (such is the nature of open source). So this functionality is just adding another hurdle towards trying to help prevent that behavior.

It does help in [ISV] applications where OMPI is statically linked to the app -- in that case, the user *won't* be able to just replace the system OMPI with their own.

That being said, it does seem like the best-functioning hurdle would be to print a site-specific message when users try to override priv params ("Bob the sysadmin set a parameter for this system that you tried to override. See http://internal/why-ompi-is-set-this-way.hml for an explanation of OMPI site-wide settings"). This might give well-intentioned users a clue as to *why* the system is not functioning the way that they expect, potentially educating them and deterring circumventing the system.

I think that's the best that we can do.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] RFC - "system-wide-only" MCA parameters
On Sep 5, 2009, at 3:00 AM, Jeff Squyres wrote:

> That being said, it does seem like the best-functioning hurdle would be to print a site-specific message when users try to override priv params ("Bob the sysadmin set a parameter for this system that you tried to override. See http://internal/why-ompi-is-set-this-way.hml for an explanation of OMPI site-wide settings"). This might give well-intentioned users a clue as to *why* the system is not functioning the way that they expect, potentially educating them and deterring circumventing the system.

I really like this addition - if users just see something not work, they will tend to believe something is broken and try to develop workarounds. Explaining -why- it is restricted will help reduce that reaction.

> I think that's the best that we can do.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
[OMPI devel] Deadlock when creating too many communicators
Howdy,

here's a creative way to deadlock a program: create and destroy 65500 and some communicators and send a message on each of them:

    #include <mpi.h>
    #include <iostream>

    #define CHECK(a) \
      { \
        int err = (a); \
        if (err != 0) std::cout << "Error in line " << __LINE__ << std::endl; \
      }

    int main (int argc, char *argv[])
    {
      int a=0, b;
      MPI_Init (&argc, &argv);

      // The loop bound must exceed 65500 for the hang described below to appear.
      for (int i=0; i<100000; ++i)
        {
          if (i % 100 == 0)
            std::cout << "Duplication event " << i << std::endl;

          // Create a communicator, use it once, and free it again.
          MPI_Comm dup;
          CHECK(MPI_Comm_dup (MPI_COMM_WORLD, &dup));
          CHECK(MPI_Allreduce(&a, &b, 1, MPI_INT, MPI_MIN, dup));
          CHECK(MPI_Comm_free (&dup));
        }

      MPI_Finalize();
    }

If you run this, for example, on two processors with OpenMPI 1.2.6 or 1.3.2, you'll see that the program runs until after it produces 65500 as output, and then just hangs -- on my system somewhere in the operating system poll(), running full steam.

Since I take care of destroying the communicators again, I would have expected this to work. I create many communicators basically as a debugging tool: every object gets its own communicator to work on, to ensure that different objects don't communicate with each other by accident just because they all use MPI_COMM_WORLD. It would be nice if this mode of using MPI could be made to work.

Best & thanks in advance!
Wolfgang

--
Wolfgang Bangerth    email: bange...@math.tamu.edu
                     www:   http://www.math.tamu.edu/~bangerth/