Thanks Ralph.
Tetsuya

> I tracked it down - not Torque specific, but impacts all managed
> environments. Will fix
>
> On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote:
>
> > Hi Ralph,
> >
> > I saw another hangup with openmpi-1.8 when I used more than 4 nodes
> > (having 8 cores each) in a managed environment under Torque. Although
> > I'm not sure you can reproduce it with SLURM, at least with Torque it
> > can be reproduced in this way:
> >
> > [mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
> > qsub: waiting for job 8726.manage.cluster to start
> > qsub: job 8726.manage.cluster ready
> >
> > [mishima@node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
> > --------------------------------------------------------------------------
> > There are not enough slots available in the system to satisfy the 65
> > slots that were requested by the application:
> >   /home/mishima/mis/openmpi/demos/myprog
> >
> > Either request fewer slots for your application, or make more slots
> > available for use.
> > --------------------------------------------------------------------------
> > <<< HANG HERE!! >>>
> > Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
> > terminate
> >
> > I found this behavior when I happened to input the wrong number of
> > procs. With fewer than 4 nodes, or with rsh (i.e. in an unmanaged
> > state), it works. I'm afraid I have no idea how to resolve it. I hope
> > you can fix the problem.
> >
> > Tetsuya
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14438.php
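For reference, the test program ~/mis/openmpi/demos/myprog is never shown in the thread. Any trivial MPI program should exercise the same code path, since mpirun's slot check rejects the oversubscribed request before the application binary is ever launched. A minimal stand-in, assuming a plain C MPI "hello world" built with mpicc (the name myprog.c is only illustrative), might look like this:

    /* Hypothetical stand-in for ~/mis/openmpi/demos/myprog; the real
     * program is not shown in the thread. Any MPI program works here,
     * because the "not enough slots" error (and the reported hang) occurs
     * in mpirun before this code ever runs. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank reports itself so a successful launch is visible. */
        printf("rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Built with, for example, "mpicc -o myprog myprog.c", running "mpirun -np 65 ./myprog" inside the 4-node x 8-core allocation (32 slots total) asks for more slots than are available, which should trigger the "not enough slots" error and, with the bug described above, the subsequent hang during abort.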