Re: [OMPI devel] Pessimist Event Logger

2012-02-01 Thread Hugo Daniel Meyer
Adding some more context.

When I try to use the event logger (by using MPI_ANY_SOURCE), I get this
error:

[clus9:28158] defining message event: ../../orte/runtime/orte_data_server.c 414
[clus9:28158] [[56904,0],0] data server: lookup on service ompi_ft_event_logger[0]
[clus9:28158] [[56904,0],0] data server: service ompi_ft_event_logger[0] not found
[clus5:7310] *** An error occurred in ../../../../../ompi/mca/vprotocol/pessimist/vprotocol_pessimist_eventlog.h: failed to connect to an Event Logger
[clus5:7310] *** on communicator MPI_COMM_NULL
[clus5:7310] *** MPI_ERR_INTERN: internal error
[clus5:7310] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort


The event logger is not found, so of course the connection is never made.
Apparently the service ompi_ft_event_logger is not defined.
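The log suggests that the client formats a service name, ompi_ft_event_logger[0], and asks the data server for whatever was published under it; since nothing was published under that name, the lookup fails before any connection is attempted. The toy model below illustrates that publish/lookup dependency in plain C. The registry type, ns_publish, ns_lookup, and the "port-0" string are hypothetical stand-ins, not Open MPI code; only the service-name format is taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory name service standing in for the data server
 * seen in the log; not Open MPI code. */
#define MAX_SERVICES 8
#define NAME_LEN 64

struct name_service {
    int count;
    char names[MAX_SERVICES][NAME_LEN];
    char ports[MAX_SERVICES][NAME_LEN];
};

/* Event-logger side: publish a port under a service name. */
static int ns_publish(struct name_service *ns, const char *name, const char *port)
{
    if (ns->count >= MAX_SERVICES)
        return -1;
    snprintf(ns->names[ns->count], NAME_LEN, "%s", name);
    snprintf(ns->ports[ns->count], NAME_LEN, "%s", port);
    ns->count++;
    return 0;
}

/* Client side: return the published port, or NULL if the name is unknown. */
static const char *ns_lookup(const struct name_service *ns, const char *name)
{
    for (int i = 0; i < ns->count; i++)
        if (strcmp(ns->names[i], name) == 0)
            return ns->ports[i];
    return NULL;
}

/* Build the service name that appears in the log for logger `el_rank`. */
static void el_service_name(char *buf, size_t len, int el_rank)
{
    assert(buf != NULL && el_rank >= 0);
    snprintf(buf, len, "ompi_ft_event_logger[%d]", el_rank);
}
```

In this model a lookup made before any ns_publish call returns NULL, the analogue of the "service ... not found" line; presumably an event logger process has to be running and to have published its name before the application processes try to connect.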

Thanks for the help.

Hugo


Re: [OMPI devel] Pessimist Event Logger

2012-01-31 Thread Hugo Daniel Meyer
Hello again.

I've found where the connection with the event logger takes place. I have
some doubts about the following section of code:

    rc = ompi_dpm.connect_accept(MPI_COMM_SELF, 0, port, true, el_comm);
    if(OMPI_SUCCESS != rc) {
        ORTE_ERROR_LOG(rc);
    }

    /* Send Rank, receive max buffer size and max_clock back */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    rc = mca_pml_v.host_pml.pml_send(&rank, 1, MPI_INTEGER, 0,
                                     VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD,
                                     MCA_PML_BASE_SEND_STANDARD,
                                     mca_vprotocol_receiver.el_comm);
    if(OPAL_UNLIKELY(MPI_SUCCESS != rc))
        OMPI_ERRHANDLER_INVOKE(mca_vprotocol_receiver.el_comm, rc,
                               __FILE__ ": failed sending event logger handshake");

    rc = mca_pml_v.host_pml.pml_recv(&connect_info, 2, MPI_UNSIGNED_LONG_LONG,
                                     0, VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD,
                                     mca_vprotocol_receiver.el_comm,
                                     MPI_STATUS_IGNORE);
    if(OPAL_UNLIKELY(MPI_SUCCESS != rc))
        OMPI_ERRHANDLER_INVOKE(mca_vprotocol_receiver.el_comm, rc,
                               __FILE__ ": failed receiving event logger handshake");


I understand that you use the dpm framework to connect process 0 (the
logger) with yourself (MPI_COMM_SELF), but then you send and receive the
handshake messages through the PML. My question is: where does the event
logger post the matching receive? I couldn't find where in the code the
event logger receives the rank and answers the handshake.
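The client side shown above sends one integer (its rank) and then blocks until it receives two unsigned long long values (max buffer size and max clock), both on tag VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD. Whatever process acts as the event logger therefore has to post a matching receive on that tag and answer. The sketch below models both ends of that exchange in a single process with in-memory FIFOs; the channel type, the NEW_CLIENT_CMD value, and el_accept_client are hypothetical, and uint64_t stands in for MPI_UNSIGNED_LONG_LONG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag; the real VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD
 * value is defined by the pessimist component. */
#define NEW_CLIENT_CMD 1
#define QUEUE_CAP 16
#define MSG_MAX 16

struct msg { int tag; size_t len; unsigned char data[MSG_MAX]; };

/* One direction of a toy point-to-point link (FIFO, no wraparound). */
struct channel { int head, tail; struct msg q[QUEUE_CAP]; };

static void chan_send(struct channel *c, int tag, const void *buf, size_t len)
{
    assert(len <= MSG_MAX && c->tail < QUEUE_CAP);
    struct msg *m = &c->q[c->tail++];
    m->tag = tag;
    m->len = len;
    memcpy(m->data, buf, len);
}

/* Non-blocking receive: 0 on a matching message, -1 if none is queued. */
static int chan_recv(struct channel *c, int tag, void *buf, size_t len)
{
    if (c->head == c->tail || c->q[c->head].tag != tag || c->q[c->head].len != len)
        return -1;
    memcpy(buf, c->q[c->head].data, len);
    c->head++;
    return 0;
}

/* Event-logger side of the handshake: receive the client's rank on the
 * NEW_CLIENT_CMD tag, then answer with {max buffer size, max clock}. */
static int el_accept_client(struct channel *from_client, struct channel *to_client,
                            uint64_t max_buffer, uint64_t max_clock, int *client_rank)
{
    uint64_t connect_info[2] = { max_buffer, max_clock };
    if (chan_recv(from_client, NEW_CLIENT_CMD, client_rank, sizeof *client_rank) != 0)
        return -1;
    chan_send(to_client, NEW_CLIENT_CMD, connect_info, sizeof connect_info);
    return 0;
}
```

The client enqueues its rank, el_accept_client consumes it and replies, and only then does the client's receive find a message. If the logger side never runs, the client's receive finds nothing, which for a blocking pml_recv would mean waiting forever.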

Thanks for your help.

Hugo Meyer


Re: [OMPI devel] Pessimist Event Logger

2012-01-30 Thread Hugo Daniel Meyer
Hello Aurelien.

2012/1/27 Aurélien Bouteiller 

> Hugo,
>
> It seems you want to implement some sort of remote pessimistic logging, à
> la MPICH-V1?
> MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes -- George
> Bosilca, Aurélien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fédak,
> Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky,
> Frédéric Magniette, Vincent Néri, Anton Selikhov -- In Proceedings of the
> IEEE/ACM SC2002 Conference, Baltimore, USA, November 2002


We could say it is similar, because I use a distributed logging mechanism,
but it is a little different: my Memory Channels and Checkpoint Servers are
the processing nodes themselves; I don't have special nodes to take care of
the message log and checkpoints.

>
> In the PML-V, unlike older designs, the payload of messages and the
> non-deterministic events follow different paths. Message payloads are
> logged in the sender's volatile memory, while the non-deterministic
> events are sent to a stable event logger before the process is allowed to
> impact the state of others (the code you found in your previous email).
> The best description of this distinction can be found in this paper:
> @inproceedings{DBLP:conf/europar/BouteillerHBD11,
>   author    = {Aurelien Bouteiller and Thomas H{\'e}rault and
>                George Bosilca and Jack J. Dongarra},
>   title     = {Correlated Set Coordination in Fault Tolerant Message
>                Logging Protocols},
>   booktitle = {Euro-Par 2011 Parallel Processing - 17th International
>                Conference, Proceedings, Part II},
>   month     = {September},
>   year      = {2011},
>   pages     = {51--64},
>   publisher = {Springer},
>   series    = {Lecture Notes in Computer Science},
>   volume    = {6853},
>   isbn      = {978-3-642-23396-8},
>   doi       = {http://dx.doi.org/10.1007/978-3-642-23397-5_6},
> }


I will take a look at this paper to clarify this distinction.

>

> If you intend to store both payload and message log on a remote node, I
> suggest you look at the "sender-based" hooks, as this is where the message
> payload is managed, and adapt from there. An event logger can already
> manage just a subset of the processes (if you launch as many ELs as
> processes, you get a 1-1 mapping), but the event loggers never handle
> message payload; you'll have to add all of this yourself if it so pleases you.
>

Yes, I need to store every message, not only the non-deterministic ones.
In my case every node is an event logger. Let's say that I have 16
processes on four nodes (four per node): the messages received by all
processes residing on node0 need to be stored on node1, the messages
received by all processes residing on node1 need to be stored on node2,
and so on.
If I understand correctly, I have to modify the behavior in
ompi/mca/vprotocol/pessimist to manage the message payload. Another
question: is there a way to launch ELs on every node, or will I have to
modify this too?
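The placement described above (a rank on node k has its received messages stored on node k+1, wrapping around at the last node) can be written as two small functions. This sketch assumes ranks are laid out contiguously by node, procs_per_node at a time; node_of_rank and log_node_for_receiver are hypothetical names.

```c
#include <assert.h>

/* Hypothetical: ranks are placed procs_per_node at a time, node by node. */
static int node_of_rank(int rank, int procs_per_node)
{
    assert(rank >= 0 && procs_per_node > 0);
    return rank / procs_per_node;
}

/* Node that stores the messages received by `receiver_rank`: the next
 * node in a ring, so node nnodes-1 wraps around to node 0. */
static int log_node_for_receiver(int receiver_rank, int procs_per_node, int nnodes)
{
    int node = node_of_rank(receiver_rank, procs_per_node);
    assert(nnodes > 0 && node < nnodes);
    return (node + 1) % nnodes;
}
```

With 16 ranks and 4 per node, ranks 0-3 map to node 1, ranks 4-7 to node 2, ranks 8-11 to node 3, and ranks 12-15 wrap around to node 0. With a different rank placement, the node of a rank would have to be queried rather than computed.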

Thanks a lot for your help, Aurélien.

Hugo


Re: [OMPI devel] Pessimist Event Logger

2012-01-27 Thread Aurélien Bouteiller
Hugo, 

It seems you want to implement some sort of remote pessimistic logging, à la
MPICH-V1?
MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes -- George
Bosilca, Aurélien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fédak,
Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky, Frédéric
Magniette, Vincent Néri, Anton Selikhov -- In Proceedings of the IEEE/ACM
SC2002 Conference, Baltimore, USA, November 2002

In the PML-V, unlike older designs, the payload of messages and the
non-deterministic events follow different paths. Message payloads are logged
in the sender's volatile memory, while the non-deterministic events are sent
to a stable event logger before the process is allowed to impact the state of
others (the code you found in your previous email). The best description of
this distinction can be found in this paper:
@inproceedings{DBLP:conf/europar/BouteillerHBD11,
  author    = {Aurelien Bouteiller and Thomas H{\'e}rault and
               George Bosilca and Jack J. Dongarra},
  title     = {Correlated Set Coordination in Fault Tolerant Message
               Logging Protocols},
  booktitle = {Euro-Par 2011 Parallel Processing - 17th International
               Conference, Proceedings, Part II},
  month     = {September},
  year      = {2011},
  pages     = {51--64},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {6853},
  isbn      = {978-3-642-23396-8},
  doi       = {http://dx.doi.org/10.1007/978-3-642-23397-5_6},
}
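To make the payload half of that split concrete, here is a minimal sketch of a sender-based payload log, assuming each sender numbers its messages per destination and keeps a copy in its own memory so that it can replay them if the destination restarts. The structures and function names are hypothetical, and the replay only counts the messages instead of resending them; this is not the PML-V sender-based code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PEERS 8
#define LOG_CAP 32
#define PAYLOAD_MAX 32

/* Hypothetical sender-based log entry: the payload stays in the sender's
 * own (volatile) memory, tagged with destination and per-destination seq. */
struct logged_msg { int dest; unsigned seq; size_t len; char payload[PAYLOAD_MAX]; };

struct sender_log {
    int count;
    unsigned next_seq[MAX_PEERS];
    struct logged_msg entries[LOG_CAP];
};

/* Copy a message into the local log before it is sent; returns its seq. */
static unsigned sb_log_send(struct sender_log *sl, int dest, const void *buf, size_t len)
{
    assert(dest >= 0 && dest < MAX_PEERS && len <= PAYLOAD_MAX && sl->count < LOG_CAP);
    struct logged_msg *m = &sl->entries[sl->count++];
    m->dest = dest;
    m->seq = sl->next_seq[dest]++;
    m->len = len;
    memcpy(m->payload, buf, len);
    return m->seq;
}

/* If `dest` restarts having already received every message with
 * seq < from_seq, these are the logged messages it still needs.
 * Returns their count; a real implementation would resend them in order. */
static int sb_replay_count(const struct sender_log *sl, int dest, unsigned from_seq)
{
    int n = 0;
    for (int i = 0; i < sl->count; i++)
        if (sl->entries[i].dest == dest && sl->entries[i].seq >= from_seq)
            n++;
    return n;
}
```

Because the payload never leaves the sender in this scheme, only the small non-deterministic events need to reach the stable event logger.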




If you intend to store both payload and message log on a remote node, I suggest
you look at the "sender-based" hooks, as this is where the message payload is
managed, and adapt from there. An event logger can already manage just a
subset of the processes (if you launch as many ELs as processes, you get a 1-1
mapping), but the event loggers never handle message payload; you'll have to
add all of this yourself if it so pleases you.
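One simple way to spread processes over a smaller set of event loggers is a block mapping, which degenerates to the 1-1 case when there are as many loggers as processes. The function below is a hypothetical illustration of that idea, not the assignment rule the pessimist component actually uses.

```c
#include <assert.h>

/* Hypothetical block mapping of nprocs application ranks onto nels event
 * loggers; with nels == nprocs every rank gets its own logger (1-1). */
static int el_for_rank(int rank, int nprocs, int nels)
{
    assert(nprocs > 0 && nels > 0 && nels <= nprocs);
    assert(rank >= 0 && rank < nprocs);
    /* Contiguous blocks of ranks share a logger; block sizes differ by at most one. */
    return (int)((long long)rank * nels / nprocs);
}
```

With 16 ranks and 4 loggers, ranks 0-3 go to logger 0 and ranks 12-15 to logger 3; with 16 loggers, rank r goes to logger r.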

Hope it clarifies. 
Aurelien





--
* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321









Re: [OMPI devel] Pessimist Event Logger

2012-01-27 Thread Hugo Daniel Meyer
Hello Aurélien.

Thanks for the clarification. Considering what you've mentioned, I will have
to make some adaptations, because in my case every single message has to be
logged. So a sender will send each message not only to the receiver, but
also to an event logger. Are there any considerations I have to take into
account when modifying the code? My initial idea is to use el_comm with a
group of event loggers (because every node uses a different event logger in
my approach), and then send the messages to them as you do when using
MPI_ANY_SOURCE.
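A minimal sketch of that idea, assuming one store per event logger and that the logger copy includes the full payload rather than just the envelope: each send writes the same record to the chosen logger's store and then to the receiver's inbox. The store type, store_put, and logged_send are hypothetical; in the real component the copy would travel over a communicator such as el_comm, and choosing which logger receives it is left to the caller. Logging before delivering is a design choice in this sketch, so the logger never holds less than the receiver has seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_CAP 16
#define PAYLOAD_MAX 64

/* A logged message: envelope plus a full copy of the payload. */
struct record { int src, dst, tag; size_t len; char payload[PAYLOAD_MAX]; };

/* Hypothetical store, used both as a receiver's inbox and as a logger. */
struct store { int count; struct record recs[STORE_CAP]; };

static void store_put(struct store *s, int src, int dst, int tag,
                      const void *buf, size_t len)
{
    assert(s->count < STORE_CAP && len <= PAYLOAD_MAX);
    struct record *r = &s->recs[s->count++];
    r->src = src;
    r->dst = dst;
    r->tag = tag;
    r->len = len;
    memcpy(r->payload, buf, len);
}

/* Hypothetical logged send: the same record goes to the chosen logger
 * first and then to the receiver. */
static void logged_send(struct store *logger, struct store *inbox,
                        int src, int dst, int tag, const void *buf, size_t len)
{
    store_put(logger, src, dst, tag, buf, len);
    store_put(inbox, src, dst, tag, buf, len);
}
```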

Thanks for your help.

Hugo Meyer



Re: [OMPI devel] Pessimist Event Logger

2012-01-27 Thread Aurélien Bouteiller
Hugo, 

Your program does not have non-deterministic events. Therefore, there are no 
events to log. If you add MPI_ANY_SOURCE, you should see this code being 
called. Please contact me again if you need more help.
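A toy model of that rule: a receive posted for a specific source can only match that sender, so replaying it needs no extra information, whereas a wildcard receive can match different senders in different runs, so the matched source (and when it was matched) has to be recorded. ANY_SOURCE below is a hypothetical stand-in for MPI_ANY_SOURCE, the structures are illustrative, and only the wildcard-source case mentioned here is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_ANY_SOURCE (the real constant is in mpi.h). */
#define ANY_SOURCE (-1)
#define MAX_EVENTS 32

/* One reception event: the local receive clock and the sender that matched. */
struct recv_event { uint64_t clock; int matched_source; };

struct receiver {
    uint64_t clock;                       /* number of completed receives */
    int nevents;
    struct recv_event events[MAX_EVENTS]; /* events still owed to the logger */
};

/* Complete a receive posted for `requested` that matched a message from
 * `actual`. Only a wildcard receive could have matched a different sender
 * in another run, so only it produces an event to log. */
static void complete_recv(struct receiver *r, int requested, int actual)
{
    assert(actual >= 0 && (requested == ANY_SOURCE || requested == actual));
    r->clock++;
    if (requested == ANY_SOURCE) {
        assert(r->nevents < MAX_EVENTS);
        r->events[r->nevents++] = (struct recv_event){ r->clock, actual };
    }
}
```

A program whose receives all name their source never fills the event list, which matches the observation that the logging code is not reached until MPI_ANY_SOURCE is used.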

Aurelien




[OMPI devel] Pessimist Event Logger

2012-01-27 Thread Hugo Daniel Meyer
Hello @ll.

George, I'm using some pieces of the pessimist vprotocol. I've observed
that when you do a send, you call vprotocol_receiver_event_flush, and there
the macro __VPROTOCOL_RECEIVER_SEND_BUFFER is called. I've noticed that
this is where you try to send a copy of the message to process 0 using
el_comm. This section of code is never executed, at least in my examples,
so the message is never sent to the Event Logger; am I correct about this?
I think this happens because mca_vprotocol_pessimist.event_buffer_length
is always 0.

Is there something I have to turn on, or will I have to modify this
behavior manually to connect and send messages to the EL?
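A sketch of why the flush path can stay silent, assuming events are batched in a buffer that is flushed before each send: if no non-deterministic receive ever happens, the buffer length stays 0 and no message to the event logger is produced. Only the event_buffer_length name comes from the question; the struct, record_event, and flush_before_send are hypothetical, and a real pessimistic protocol would also wait for the logger's acknowledgment before letting the send proceed.

```c
#include <assert.h>
#include <stddef.h>

#define EVENT_BUF_CAP 16

/* Hypothetical batching state; only the field name event_buffer_length
 * comes from the question. */
struct event_state {
    size_t event_buffer_length; /* events recorded but not yet logged */
    int events[EVENT_BUF_CAP];  /* here just the matched source of each */
    int messages_to_el;         /* batches shipped to the event logger */
};

static void record_event(struct event_state *s, int matched_source)
{
    assert(s->event_buffer_length < EVENT_BUF_CAP);
    s->events[s->event_buffer_length++] = matched_source;
}

/* Called before every application send. With an empty buffer there is
 * nothing to ship, so no message to the event logger is produced. */
static void flush_before_send(struct event_state *s)
{
    if (s->event_buffer_length == 0)
        return;
    s->messages_to_el++;        /* one batched message to the logger */
    s->event_buffer_length = 0; /* assumes the logger acknowledged it */
}
```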

Thanks in advance.

Hugo Meyer