Re: [OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...

2010-09-24 Thread Olivier Riff
That is already an answer that makes sense. I understand that this is
really not a trivial issue. I have seen other recent threads about "running
on crashed nodes", and I gather that the Open MPI team is working hard on
it. We will wait, and will be glad to test the first versions when they are
released (I understand that this will take some time).

Thanks for this quick reply,

Olivier



Re: [OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...

2010-09-24 Thread Jeff Squyres
Open MPI's fault tolerance is still somewhat rudimentary; it's a complex topic 
within the entire scope of MPI.  There has been much research into MPI and 
fault tolerance over the years; the MPI Forum itself is grappling with terms 
and definitions that make sense.  It's by no means a "solved" problem.

It's unfortunately unsurprising that Open MPI may hang in the case of a node 
crash.  I wish that I had a better answer for you, but I don't.  :-\
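
For the narrower case where a remote process throws a C++ exception (as
opposed to the whole machine going down), a common pattern is to catch the
exception at the top level of every rank, print it to stderr, and then abort
the whole job so the other ranks do not wait forever. The sketch below only
illustrates that pattern under stated assumptions: run_guarded and report are
hypothetical helper names, not Open MPI functions, and the abort step is
passed in as a callback so the example compiles without MPI. In a real
program the callback would call MPI_Abort(MPI_COMM_WORLD, code). mpirun
normally forwards the stderr of remote ranks to the terminal where it was
started, so the message should appear there.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: build the whole line first and write it in one go,
// so output from several ranks is less likely to interleave mid-line.
static void report(std::ostream& err, int rank, const std::string& what)
{
    std::ostringstream line;
    line << "[rank " << rank << "] fatal: " << what << '\n';
    err << line.str() << std::flush;
}

// Hypothetical helper (not an Open MPI API): runs one rank's work and turns
// any escaping exception into a printed message plus a job-wide abort.
// `abort_job` stands in for MPI_Abort; with real MPI it would not return.
// Returns 0 if `work` completed, 1 if an exception was caught.
int run_guarded(int rank,
                const std::function<void()>& work,
                const std::function<void(int)>& abort_job,
                std::ostream& err = std::cerr)
{
    assert(work && abort_job);  // both callbacks must be set
    try {
        work();
        return 0;
    } catch (const std::exception& e) {
        report(err, rank, e.what());
    } catch (...) {
        report(err, rank, "unknown exception");
    }
    abort_job(1);  // e.g. [](int code) { MPI_Abort(MPI_COMM_WORLD, code); }
    return 1;
}
```

In an MPI program this would be called from main() after MPI_Init, with the
rank obtained from MPI_Comm_rank. It does not help when the node itself
crashes, because no code on that node runs to report anything.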




-- 
Jeff Squyres
jsquy...@cisco.com




[OMPI users] Display in terminal of error message using throw std::runtime_error on distant node...

2010-09-24 Thread Olivier Riff
Hello,

My question concerns the display of the error message generated by a throw
std::runtime_error("Explicit error message").
I am launching an Open MPI program from a terminal on several machines using:
mpirun -v -machinefile MyMachineFile.txt MyProgram
I am wondering why I cannot see an error message displayed on the terminal
when one of my distant nodes (meaning not the node where the terminal is
used) crashes. I was expecting that the following try/catch would also
produce a display in the terminal:
try { ...My code where a crash happens... }
catch (...)
{
  throw std::runtime_error( "Explicit error message" );
}

Generally, my problem is that one of the nodes crashes and the global
application waits forever for data from this node. Nothing is displayed on
the terminal to indicate that the node has crashed, or to give any useful
information about the nature of the crash.

( I don't think this information is relevant here, but just in case: I am
using Open MPI 1.4.2 on a Mandriva 2008 system )

Thanks in advance for any help/info/advice.

Olivier