Re: [OMPI devel] [Open MPI] #2043: sm BTL hang with GCC 4.4.x

2010-01-05 Thread Eugene Loh
Both.  It originally meant we need to get the fix out ASAP.  Follow-up 
e-mail (Louis sent privately to me, I responded just now to the users 
list to publish/archive the info) suggests that our fixes don't fix his 
problem.  So either the fixes are insufficient, or he's encountering a 
different problem.


Jeff Squyres wrote:

> ...just catching up after the holidays...
>
> Just to ensure I understand: does this mean that the sm issue is *not* yet 
> resolved?  Or does it mean that it *is* resolved on the 1.4 branch and we need 
> to get 1.4.1 out ASAP?
>
> On Jan 4, 2010, at 10:17 AM, Eugene Loh wrote:
>
> > Open MPI wrote:
> >
> > > Comment(by eugene):
> > >
> > > Another thread has been attributed to this problem:
> > >
> > > http://www.open-mpi.org/community/lists/users/2010/01/11674.php [[BR]]
> > > ''Dual quad core Opteron hangs on Bcast.''
> > >
> > > In this case, repeated broadcasts are hanging with Fedora FC11.
> >
> > Looks like we're getting the trac 2043 hang with FC11.  The problem
> > appears to be widespread.  Is there anything we should do to fix this
> > particular distribution?  Talk to the Fedora people?



[OMPI devel] Howto pause BTL's sending at runtime

2010-01-05 Thread Christoph Konersmann

Hi all,

I'm trying to implement a method to pause all BTLs from sending packets 
to their destinations.


Currently I have added a state variable to orte_process_info, which is 
changed by an external program through process_commands() in 
orte/orted/orted_comm.c (I hope it is processed globally, not just locally). 
While this state is set to a value defined as PAUSE, I want the send 
methods in the PML layer to be halted so that no network traffic goes out. 
So far it is not working, because the PML layer does not see the state change.
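
To make the idea concrete, this is roughly the gate I have in mind (a 
standalone sketch with hypothetical names, not actual Open MPI code): the 
command handler flips a shared flag, and every send path blocks while the 
flag is set.

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical pause gate (illustrative only). */
static pthread_mutex_t pause_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pause_cond = PTHREAD_COND_INITIALIZER;
static bool paused = false;

/* Called from the command handler when a PAUSE/RESUME command arrives. */
void set_paused(bool value)
{
    pthread_mutex_lock(&pause_lock);
    paused = value;
    pthread_cond_broadcast(&pause_cond);
    pthread_mutex_unlock(&pause_lock);
}

/* Called at the top of every send path; blocks while paused is set. */
void wait_if_paused(void)
{
    pthread_mutex_lock(&pause_lock);
    while (paused) {
        pthread_cond_wait(&pause_cond, &pause_lock);
    }
    pthread_mutex_unlock(&pause_lock);
}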


Another way would be to use a libevent thread at the BML/PML level. I've 
read that this library is already supported/implemented, or am I wrong? 
How would I use libevent in this context? Does somebody have an example 
or a hint? Or should I use the fault-tolerance framework for this purpose?
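
For reference, a standalone experiment of the kind I am imagining looks 
like this (written against the public libevent 2 API; I know Open MPI 
carries its own embedded copy, so the internal calls may differ). A 
persistent timer event periodically checks the pause flag:

#include <event2/event.h>
#include <stdio.h>

static int paused = 0;

/* Fires every 100 ms and inspects the shared pause flag. */
static void poll_pause_cb(evutil_socket_t fd, short what, void *arg)
{
    int *flag = (int *) arg;
    if (*flag) {
        printf("paused: would hold back sends here\n");
    }
}

int main(void)
{
    struct event_base *base = event_base_new();
    struct timeval interval = { 0, 100000 };   /* 100 ms */
    struct event *ev = event_new(base, -1, EV_PERSIST, poll_pause_cb, &paused);

    event_add(ev, &interval);
    event_base_dispatch(base);   /* runs until the base is stopped */

    event_free(ev);
    event_base_free(base);
    return 0;
}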


Any help would be appreciated. Thanks.


--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de

Christoph Konersmann 


Re: [OMPI devel] [Open MPI] #2043: sm BTL hang with GCC 4.4.x

2010-01-05 Thread Jeff Squyres
...just catching up after the holidays...

Just to ensure I understand: does this mean that the sm issue is *not* yet 
resolved?  Or does it mean that it *is* resolved on the 1.4 branch and we need 
to get 1.4.1 out ASAP?



On Jan 4, 2010, at 10:17 AM, Eugene Loh wrote:

> Open MPI wrote:
> 
> >#2043: sm BTL hang with GCC 4.4.x
> >---+
> >Reporter:  eugene  |   Owner: 
> >Type:  defect  |  Status:  new
> >Priority:  major   |   Milestone:  Open MPI 1.4
> > Version:  1.3 branch  |Keywords: 
> >---+
> >
> >Comment(by eugene):
> >
> > Another thread has been attributed to this problem:
> >
> > http://www.open-mpi.org/community/lists/users/2010/01/11674.php [[BR]]
> > ''Dual quad core Opteron hangs on Bcast.''
> >
> > In this case, repeated broadcasts are hanging with Fedora FC11.
> > 
> >
> Looks like we're getting the trac 2043 hang with FC11.  The problem
> appears to be widespread.  Is there anything we should do to fix this
> particular distribution?  Talk to the Fedora people?
> 


-- 
Jeff Squyres
jsquy...@cisco.com




[OMPI devel] Thread safety levels

2010-01-05 Thread Sylvain Jeaugey

Hi list,

I'm currently playing with thread levels in Open MPI and I'm quite 
surprised by the current code.


First, the C interface:
At ompi/mpi/c/init_thread.c:56 we have:
#if OPAL_ENABLE_MPI_THREADS
*provided = MPI_THREAD_MULTIPLE;
#else
*provided = MPI_THREAD_SINGLE;
#endif
prior to the call to ompi_mpi_init(), which will in turn override the 
"provided" value. Should we remove these five lines?


Then, at ompi/runtime/ompi_mpi_init.c:372, we have (I guess) the real 
code, which is:


ompi_mpi_thread_requested = requested;
if (OPAL_HAVE_THREAD_SUPPORT == 0) {
    ompi_mpi_thread_provided = *provided = MPI_THREAD_SINGLE;
    ompi_mpi_main_thread = NULL;
} else if (OPAL_ENABLE_MPI_THREADS == 1) {
    ompi_mpi_thread_provided = *provided = requested;
    ompi_mpi_main_thread = opal_thread_get_self();
} else {
    if (MPI_THREAD_MULTIPLE == requested) {
        ompi_mpi_thread_provided = *provided = MPI_THREAD_SERIALIZED;
    } else {
        ompi_mpi_thread_provided = *provided = requested;
    }
    ompi_mpi_main_thread = opal_thread_get_self();
}

This code seems OK to me, provided that:
 * (OPAL_ENABLE_MPI_THREADS == 1) means "Open MPI was configured to provide 
THREAD_MULTIPLE", and
 * (OPAL_HAVE_THREAD_SUPPORT == 0) means "we do not have threads at all", 
although even if we do not have threads at compile time, that in no way 
prevents us from providing THREAD_FUNNELED or THREAD_SERIALIZED.


The reality seems different at opal/include/opal_config_bottom.h:70:

/* Do we have posix or solaris thread lib */
#define OPAL_HAVE_THREADS (OPAL_HAVE_POSIX_THREADS || OPAL_HAVE_SOLARIS_THREADS)
/* Do we have thread support? */
#define OPAL_HAVE_THREAD_SUPPORT (OPAL_ENABLE_MPI_THREADS || OPAL_ENABLE_PROGRESS_THREADS)

"we do not have threads at all" seems to me to be OPAL_HAVE_THREADS and 
not OPAL_HAVE_THREAD_SUPPORT. What do you think? Maybe 
OPAL_HAVE_THREAD_SUPPORT should be renamed, too (it seems misleading to me).


The result is that the current default configuration of Open MPI has 
OPAL_HAVE_THREAD_SUPPORT defined to 0, so Open MPI always returns 
THREAD_SINGLE, even though it is perfectly capable of THREAD_FUNNELED and 
THREAD_SERIALIZED.
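
Concretely, what I would have expected is the quoted block with 
OPAL_HAVE_THREADS in place of OPAL_HAVE_THREAD_SUPPORT, i.e. something 
like this (an untested sketch, not a patch):

ompi_mpi_thread_requested = requested;
if (!OPAL_HAVE_THREADS) {
    /* No thread library at all: THREAD_SINGLE is all we can promise. */
    ompi_mpi_thread_provided = *provided = MPI_THREAD_SINGLE;
    ompi_mpi_main_thread = NULL;
} else if (OPAL_ENABLE_MPI_THREADS == 1) {
    /* Configured for full MPI_THREAD_MULTIPLE support. */
    ompi_mpi_thread_provided = *provided = requested;
    ompi_mpi_main_thread = opal_thread_get_self();
} else {
    /* Threads exist, but MULTIPLE was not compiled in: cap at SERIALIZED. */
    if (MPI_THREAD_MULTIPLE == requested) {
        ompi_mpi_thread_provided = *provided = MPI_THREAD_SERIALIZED;
    } else {
        ompi_mpi_thread_provided = *provided = requested;
    }
    ompi_mpi_main_thread = opal_thread_get_self();
}

That way FUNNELED and SERIALIZED would be reported whenever a thread 
library is present, regardless of how Open MPI was configured.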


Sylvain






Re: [OMPI devel] RFC: Suspend/resume enhancements

2010-01-05 Thread Terry Dontje
This only happens when the orte_forward_job_control MCA flag is set to 1, 
and the default is 0, which I believe meets Ralph's criteria below.


--td

Ralph Castain wrote:

I don't have any issue with this so long as (a) it is -only- active when 
someone sets a specific MCA param requesting it, and (b) that flag is -not- set 
by default.


On Jan 4, 2010, at 11:50 AM, Iain Bason wrote:

  

WHAT: Enhance the orte_forward_job_control MCA flag by:

 1. Forwarding signals to descendants of launched processes; and
 2. Forwarding signals received before process launch time.

(The orte_forward_job_control flag arranges for SIGTSTP and SIGCONT to
be forwarded.  This allows a resource manager like Sun Grid Engine to
suspend a job by sending a SIGTSTP signal to mpirun.)

WHY: Some programs do "mpirun prog.sh", and prog.sh starts multiple
processes.  Among these programs is weather prediction code from
the UK Met Office.  This code is used at multiple sites around
the world.  Since other MPI implementations* forward job control
signals this way, we risk having OMPI excluded unless we
implement this feature.

[*I have personally verified that Intel MPI does it.  I have
heard that Scali does it.  I don't know about the others.]

HOW: To allow signals to be sent to descendants of launched processes,
use the setpgrp() system call to create a new process group for
each launched process.  Then send the signal to the process group
rather than to the process.

To allow signals received before process launch time to be
delivered when the processes are launched, add a job state flag
to indicate whether the job is suspended.  Check this flag at
launch time, and send a signal immediately after launching.
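
A stripped-down illustration of the launch-and-forward mechanism (error 
handling and the ORTE plumbing omitted; the function names here are just 
for the example, not the ones in the branch):

#include <signal.h>
#include <unistd.h>

/* Launch one child in its own process group. */
static pid_t launch_in_new_pgrp(char *const argv[])
{
    pid_t pid = fork();
    if (0 == pid) {
        setpgrp();          /* child becomes its own process-group leader */
        execvp(argv[0], argv);
        _exit(127);         /* only reached if exec fails */
    }
    return pid;
}

/* Forward a job-control signal to the child and everything it spawned. */
static void forward_signal(pid_t child, int sig)
{
    kill(-child, sig);      /* negative pid: signal the whole process group */
}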

WHERE: http://bitbucket.org/igb/ompi-job-control/

WHEN: We would like to integrate this into the 1.5 branch.

TIMEOUT: COB Tuesday, January 19, 2010.

Q:

 1. Will this work for Windows?

I don't know what would be required to make this work for
Windows.  The current implementation is for Unix only.

 2. Will this work for interactive ssh/rsh PLM?

It will not work any better or worse than the current
implementation.  One can suspend a job by typing Ctrl-Z at a
terminal, but the mpirun process itself never gets suspended.
That means that in order to wake the job up one has to open a
different terminal to send a SIGCONT to the mpirun process.  It
would be desirable to fix this problem, but as this feature is
intended for use with resource managers like SGE it isn't
essential to make it work smoothly in an interactive shell.

 3. Will the creation of new process groups prohibit SGE from killing
a job properly?

No.  SGE has a mechanism to ensure that all a job's processes are
killed, regardless of whether they create new process groups.

 4. What about other resource managers?

Using this flag with another resource manager might cause
problems.  However, the flag may not be necessary with other
resource managers.  (If the RM can send SIGSTOP to all the
processes on all the nodes running a job, then mpirun doesn't
need to forward job control signals.)

According to the SLURM documentation, plugins are available
(e.g., linuxproc) that would allow reliable termination of all a
job's processes, regardless of whether they create new process
groups.
[https://computing.llnl.gov/linux/slurm/proctrack_plugins.html]

 5. Will the creation of new process groups prevent mpirun from
shutting down the job successfully (e.g., when it receives a
SIGTERM)?

No.  I have tested jobs both with and without calls to
MPI_Comm_Spawn, and all are properly terminated.

 6. Can we avoid creating new process groups by just signaling the
launched process plus any process that calls MPI_Init?

No.  The shell script might launch other background processes
that the user wants to suspend.  (The Met Office code does this.)

 7. Can we avoid creating new process groups by having mpirun and
orted send SIGTSTP to their own process groups, and ignore the
signal that they send to themselves?

No.  First, mpirun might be in the same process group as other
mpirun processes.  Those mpiruns could get into an infinite loop
forwarding SIGTSTPs to one another.  Second, although the default
action on receipt of SIGTSTP is to suspend the process, that only
happens if the process is not in an orphaned process group.  SGE
starts processes in orphaned process groups.




