Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-05 Thread Jeff Squyres

On Sep 4, 2009, at 2:34 PM, Sylvain Jeaugey wrote:


hg -R "$srcdir" tip | head -1 | grep "^changeset:" | cut -d: -f3




Good catch; I changed it slightly from this to:

hg -v -R "$srcdir" tip | grep ^changeset: | head -n 1 | cut -d: -f3

I put this on the trunk since I couldn't push to your bb tree; you  
should get it in the next pull.


--
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] version number issues

2009-09-05 Thread Jeff Squyres

On Sep 4, 2009, at 2:56 PM, Nadia Derbey wrote:

Actually, I didn't have the problem on my side, because hg is not known
in my build environment. Never noticed these lines:

*** Checking versions
checking for SVN version... ../configure: line 4285: hg: command not found
done
checking Open MPI version... 1.4a1hg




I changed the subject since we're digressing a bit off the original  
RFC...


This is an interesting failure mode that we evidently didn't consider  
when we wrote that script.  ;-)  I guess we should check $? when  
returning and ensure that the command executed properly.


The only question is -- should we abort in this case, or just put in  
"unknown -- could not find hg" (or whatever) as the version?  I would  
lean towards the latter; development machines may vary wildly in what  
software is installed...  Unless anyone objects, I'll do the latter.
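
A minimal sketch of that second option, for illustration only. The helper
name get_hg_version and the "unknown -- hg failed" string are assumptions
made here; this is not the actual configure code. It runs hg by itself so
that the exit status being tested belongs to hg and not to the last command
of the pipeline:

```shell
#!/bin/sh
# Sketch only: get_hg_version is a hypothetical helper, not the real script.

# get_hg_version SRCDIR [HG_COMMAND]
# Prints the tip changeset hash, or a placeholder when hg is missing or fails.
get_hg_version() {
    srcdir=$1
    hg_cmd=${2:-hg}

    # Is the command available at all?
    if ! command -v "$hg_cmd" >/dev/null 2>&1; then
        echo "unknown -- could not find hg"
        return 0
    fi

    # Run hg on its own first: inside a pipeline, $? would only report
    # the status of the final command (cut), never that of hg.
    if ! tip=$("$hg_cmd" -v -R "$srcdir" tip 2>/dev/null); then
        echo "unknown -- hg failed"    # assumed wording, not from the thread
        return 0
    fi

    printf '%s\n' "$tip" | grep '^changeset:' | head -n 1 | cut -d: -f3
}
```

Calling get_hg_version "$srcdir" on a machine without hg would then yield
the placeholder string instead of a "command not found" error in the
middle of the configure output.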


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-05 Thread Jeff Squyres

On Sep 4, 2009, at 5:47 PM, Sylvain Jeaugey wrote:


Understood. So, let's say that we're only implementing a hurdle to
discourage users from doing things wrong. I guess the efficiency of this
will reside in the message displayed to the user ("You are about to break
the entire machine and you will be fined if you try to circumvent this in
any way").

Maybe the warning message should be set by administrators
($OMPI/.../no-override.txt). It would certainly be more efficient :)



Ralph is certainly right: there is no way that we can prevent users --
even those with the best of intentions -- from circumventing the
system when they perceive the system not working the way they want it
to (such is the nature of open source).  So this functionality just
adds another hurdle to discourage that behavior.
It does help in [ISV] applications where OMPI is statically linked to
the app -- in that case, the user *won't* be able to just replace the
system OMPI with their own.


That being said, it does seem like the best-functioning hurdle would
be to print a site-specific message when users try to override priv
params ("Bob the sysadmin set a parameter for this system that you
tried to override.  See http://internal/why-ompi-is-set-this-way.hml
for an explanation of OMPI site-wide settings").  This might give
well-intentioned users a clue as to *why* the system is not functioning
the way that they expect, potentially educating them and deterring them
from circumventing the system.


I think that's the best that we can do.
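
To make the idea concrete, here is a rough sketch of the behavior, written
as a shell function purely for illustration. Everything in it is an
assumption: the function name warn_if_locked, the space-separated list of
locked parameters, and the location of the administrator's message file
are all hypothetical, and none of this exists in OMPI today.

```shell
#!/bin/sh
# Hypothetical sketch: names and file locations are assumptions,
# not existing Open MPI behavior.

# warn_if_locked PARAM LOCKED_LIST MESSAGE_FILE
# Returns 0 when PARAM may be overridden. Otherwise prints the site's
# explanation (or a generic fallback) on stderr and returns 1.
warn_if_locked() {
    param=$1
    locked_list=$2      # space-separated names of system-wide-only params
    message_file=$3     # text file maintained by the administrators

    case " $locked_list " in
        *" $param "*) ;;        # locked: continue to the warning below
        *) return 0 ;;          # not locked: the override is allowed
    esac

    if [ -r "$message_file" ]; then
        cat "$message_file" >&2
    else
        echo "Parameter '$param' is set system-wide and cannot be overridden." >&2
    fi
    return 1
}
```

A launcher could call this for each parameter a user sets and ignore the
override whenever it returns nonzero, so the user sees the sysadmin's own
explanation rather than a silent failure.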

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] RFC - "system-wide-only" MCA parameters

2009-09-05 Thread Ralph Castain


On Sep 5, 2009, at 3:00 AM, Jeff Squyres wrote:



That being said, it does seem like the best-functioning hurdle would  
be to print a site-specific message when users try to override priv  
params ("Bob the sysadmin set a parameter for this system that you  
tried to override.  See http://internal/why-ompi-is-set-this-way.hml  
for an explanation of OMPI site-wide settings").  This might give  
well-intentioned users a clue as to *why* the system is not  
functioning the way that they expect, potentially educating them and  
deterring circumventing the system.


I really like this addition - if users just see something not work,  
they will tend to believe something is broken and try to develop  
workarounds. Explaining -why- it is restricted will help reduce that  
reaction.






___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] Deadlock when creating too many communicators

2009-09-05 Thread Wolfgang Bangerth

Howdy,
here's a creative way to deadlock a program: create and destroy a little 
over 65500 communicators and send a message on each of them:

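#include <mpi.h>
#include <iostream>

// Report (but do not abort on) any MPI call that returns an error code.
#define CHECK(a) \
  { \
    int err = (a); \
    if (err != MPI_SUCCESS) \
      std::cout << "Error in line " << __LINE__ << std::endl; \
  }

int main (int argc, char *argv[])
{
  int a=0, b;

  MPI_Init (&argc, &argv);

  // Loop well past 65500 iterations, the point where the hang shows up.
  for (int i=0; i<100000; ++i)
    {
      if (i % 100 == 0) std::cout << "Duplication event " << i << std::endl;

      // Create a communicator, use it once, and free it again.
      MPI_Comm dup;
      CHECK(MPI_Comm_dup (MPI_COMM_WORLD, &dup));
      CHECK(MPI_Allreduce(&a, &b, 1, MPI_INT, MPI_MIN, dup));
      CHECK(MPI_Comm_free (&dup));
    }

  MPI_Finalize();
}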
---
If you run this, for example, on two processors with OpenMPI 1.2.6 or 
1.3.2, you'll see that the program runs until just after it prints 65500 
and then hangs -- on my system somewhere in the operating system's 
poll(), running full steam.

Since I take care of destroying the communicators again, I would have 
expected this to work. I create many communicators basically as a 
debugging tool: every object gets its own communicator to work on, which 
ensures that different objects don't accidentally communicate with each 
other just because they all use MPI_COMM_WORLD. It would be nice if this 
mode of using MPI could be made to work.

Best & thanks in advance!
 Wolfgang

-- 
Wolfgang Bangerth          email: bange...@math.tamu.edu
                           www:   http://www.math.tamu.edu/~bangerth/