Adding some more context.

When trying to use the event logger (by using MPI_ANY_SOURCE), I get this
error:

[clus9:28158] defining message event: ../../orte/runtime/orte_data_server.c 414
[clus9:28158] [[56904,0],0] data server: lookup on service ompi_ft_event_logger[0]
[clus9:28158] [[56904,0],0] data server: service ompi_ft_event_logger[0] not found
[clus5:7310] *** An error occurred in ../../../../../ompi/mca/vprotocol/pessimist/vprotocol_pessimist_eventlog.h: failed to connect to an Event Logger
[clus5:7310] *** on communicator MPI_COMM_NULL
[clus5:7310] *** MPI_ERR_INTERN: internal error
[clus5:7310] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort


The event logger is not found, so of course the connection is not made.
Apparently the service ompi_ft_event_logger is never defined.
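The log shows the data server being asked for a service named ompi_ft_event_logger[0] and answering "not found", which presumably means no event logger had published its port under that name before the client tried to connect. The sketch below models only that ordering with a toy in-memory registry; publish_name, lookup_name and event_logger_name are hypothetical helpers, not the ORTE data server API (at the user level, MPI offers the comparable MPI_Publish_name and MPI_Lookup_name).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SERVICES 16
#define NAME_LEN 64
#define PORT_LEN 128

/* Hypothetical stand-in for the data server: (service name -> port) pairs. */
struct service_entry {
    char name[NAME_LEN];
    char port[PORT_LEN];
};

static struct service_entry registry[MAX_SERVICES];
static int n_services = 0;

/* Build the service name seen in the log, e.g. "ompi_ft_event_logger[0]". */
static void event_logger_name(int el_rank, char *out, size_t len)
{
    snprintf(out, len, "ompi_ft_event_logger[%d]", el_rank);
}

/* Logger side: register a port under a service name. */
static int publish_name(const char *name, const char *port)
{
    assert(name != NULL && port != NULL);
    if (n_services == MAX_SERVICES)
        return -1;
    snprintf(registry[n_services].name, NAME_LEN, "%s", name);
    snprintf(registry[n_services].port, PORT_LEN, "%s", port);
    n_services++;
    return 0;
}

/* Client side: returns the port, or NULL ("service ... not found")
 * when no event logger has published that name yet. */
static const char *lookup_name(const char *name)
{
    for (int i = 0; i < n_services; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].port;
    return NULL;
}
```

A client that calls lookup_name before the logger's publish_name gets NULL, which corresponds to the failure in the log; the same lookup succeeds once the name has been published.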

Thanks for the help.

Hugo

2012/1/31 Hugo Daniel Meyer <meyer.h...@gmail.com>

> Hello again.
>
> I've found where the connection with the event logger takes place. I have
> some doubts about the following section of code:
>
>     rc = ompi_dpm.connect_accept(MPI_COMM_SELF, 0, port, true, el_comm);
>     if(OMPI_SUCCESS != rc) {
>         ORTE_ERROR_LOG(rc);
>     }
>
>     /* Send Rank, receive max buffer size and max_clock back */
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     rc = mca_pml_v.host_pml.pml_send(&rank, 1, MPI_INTEGER, 0,
>                                      VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD,
>                                      MCA_PML_BASE_SEND_STANDARD,
>                                      mca_vprotocol_receiver.el_comm);
>     if(OPAL_UNLIKELY(MPI_SUCCESS != rc))
>         OMPI_ERRHANDLER_INVOKE(mca_vprotocol_receiver.el_comm, rc,
>                                __FILE__ ": failed sending event logger handshake");
>
>     rc = mca_pml_v.host_pml.pml_recv(&connect_info, 2, MPI_UNSIGNED_LONG_LONG,
>                                      0, VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD,
>                                      mca_vprotocol_receiver.el_comm, MPI_STATUS_IGNORE);
>     if(OPAL_UNLIKELY(MPI_SUCCESS != rc))
>         OMPI_ERRHANDLER_INVOKE(mca_vprotocol_receiver.el_comm, rc,
>                                __FILE__ ": failed receiving event logger handshake");
>
>
> I understand that you establish a connection, using the DPM framework,
> between process 0 (the logger) and yourself (MPI_COMM_SELF). But then you
> send and receive messages through the PML. My question is: where does the
> event logger post its receive? I didn't find where in the code the event
> logger receives the rank and answers the handshake.
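The quoted client code sends its rank with tag VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD and then blocks receiving two unsigned long long values, so the logger side must post a matching receive on the same intercommunicator and reply with two values (per the code comment, the max buffer size and max_clock). The thread does not say where Open MPI does this. The sketch below only illustrates the shape of the exchange: a POSIX socketpair stands in for the intercommunicator, and serve_new_client, client_send_rank and client_recv_info are hypothetical names; the order of the two reply values is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read or write exactly len bytes (stream sockets may return short counts). */
static int xfer_all(int fd, void *buf, size_t len, int writing)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = writing ? write(fd, p, len) : read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Client half, mirroring the quoted code: send the rank ... */
static int client_send_rank(int fd, int rank)
{
    return xfer_all(fd, &rank, sizeof rank, 1);
}

/* ... then block until the two handshake values arrive. */
static int client_recv_info(int fd, unsigned long long info[2])
{
    return xfer_all(fd, info, 2 * sizeof info[0], 0);
}

/* Hypothetical logger half: the matching receive for the rank, followed by
 * the reply the client is waiting for (assumed order: buffer size, clock). */
static int serve_new_client(int fd, unsigned long long max_buffer,
                            unsigned long long max_clock, int *client_rank)
{
    unsigned long long reply[2] = { max_buffer, max_clock };
    assert(client_rank != NULL);
    if (xfer_all(fd, client_rank, sizeof *client_rank, 0) != 0)
        return -1;
    return xfer_all(fd, reply, sizeof reply, 1);
}
```

If nothing ever calls the logger half, the client stays blocked in its receive, which is why locating that matching receive matters.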
>
> Thanks for your help.
>
> Hugo Meyer
>
> 2012/1/30 Hugo Daniel Meyer <meyer.h...@gmail.com>
>
> Hello Aurelien.
>
> 2012/1/27 Aurélien Bouteiller <boute...@eecs.utk.edu>
>
> Hugo,
>
> It seems you want to implement some sort of remote pessimistic logging,
> à la MPICH-V1?
> MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes -- George
> Bosilca, Aurélien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fédak,
> Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky,
> Frédéric Magniette, Vincent Néri, Anton Selikhov -- In proceedings of The
> IEEE/ACM SC2002 Conference, Baltimore USA, November 2002
>
> We could say it is similar, because I use a distributed logging
> mechanism, but it is a little different, because my Memory Channels and
> Checkpoint Servers are the processing nodes themselves; I don't have special
> nodes to take care of the message log and checkpoints.
>
>
> In the PML-V, unlike older designs, message payloads and non-deterministic
> events follow different paths. The payload of messages is logged in the
> sender's volatile memory, while the non-deterministic events are sent to a
> stable event logger before the process is allowed to impact the state of
> others (the code you found in the previous email).
> The best depiction of this distinction is in this paper:
> @inproceedings{DBLP:conf/europar/BouteillerHBD11,
>   author    = {Aurelien Bouteiller and
>                Thomas H{\'e}rault and
>                George Bosilca and
>                Jack J. Dongarra},
>   title     = {Correlated Set Coordination in Fault Tolerant Message Logging
>                Protocols},
>   booktitle = {Euro-Par 2011 Parallel Processing - 17th International
>                Conference, Proceedings, Part II},
>   month     = {September},
>   year      = {2011},
>   pages     = {51--64},
>   publisher = {Springer},
>   series    = {Lecture Notes in Computer Science},
>   volume    = {6853},
>   isbn      = {978-3-642-23396-8},
>   doi       = {10.1007/978-3-642-23397-5_6},
> }
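To make the payload/event split concrete: in the sketch below, which assumes fixed-size toy buffers and hypothetical names throughout, the payload copy stays in a log inside the sending process, only event records (which source a receive matched, and at which local clock) would travel to the event logger, and a send is refused while any event is still unacknowledged by the logger. It models the pessimistic rule described above; it is not the PML-V data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAP 8
#define EV_CAP 8
#define MSG_MAX 32

/* Payload copy kept in the sender's own (volatile) memory. */
struct payload_entry {
    int dest;
    unsigned long seq;
    size_t len;
    char data[MSG_MAX];
};

/* Non-deterministic event: the only thing shipped to the event logger. */
struct event_record {
    unsigned long clock;   /* local reception clock */
    int matched_src;       /* source the receive actually matched */
};

struct process_state {
    struct payload_entry sender_log[LOG_CAP];
    int n_logged;
    struct event_record pending[EV_CAP];  /* not yet stored by the logger */
    int n_pending;
    unsigned long next_seq;
};

/* Pessimistic rule: refuse to send (return -1) while events are pending
 * or the local payload log is full; otherwise log the payload and send. */
static int logged_send(struct process_state *ps, int dest,
                       const char *msg, size_t len)
{
    assert(len <= MSG_MAX);
    if (ps->n_pending > 0 || ps->n_logged == LOG_CAP)
        return -1;
    struct payload_entry *e = &ps->sender_log[ps->n_logged++];
    e->dest = dest;
    e->seq = ps->next_seq++;
    e->len = len;
    memcpy(e->data, msg, len);
    /* ...the actual transmission to dest would happen here... */
    return 0;
}

/* Called when the event logger acknowledges all pending events. */
static void events_acked(struct process_state *ps)
{
    ps->n_pending = 0;
}
```

Storing the payload remotely as well, as discussed below, would mean replacing the local sender_log with sends to another node.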
>
> I will take a look at this paper to clarify this distinction.
>
>
>
>
> If you intend to store both the payload and the message log on a remote
> node, I suggest you look at the "sender-based" hooks, as this is where the
> message payload is managed, and adapt from there. The event loggers can
> already manage only a subset of the processes (if you launch as many ELs as
> processes, you get a 1-1 mapping), but they never handle message payload;
> you'll have to add all of this yourself if you wish to.
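As a sketch of how several event loggers could split the processes, assuming a simple round-robin assignment (the policy Open MPI actually uses is not stated in this thread), the hypothetical event_logger_for maps each rank to a logger; with as many loggers as processes it reduces to the 1-1 mapping mentioned above.

```c
#include <assert.h>

/* Hypothetical round-robin assignment of ranks to event loggers. */
static int event_logger_for(int rank, int n_el)
{
    assert(n_el > 0 && rank >= 0);
    return rank % n_el;
}

/* Number of ranks, out of nprocs, served by event logger `el`. */
static int ranks_served(int el, int nprocs, int n_el)
{
    int count = 0;
    for (int r = 0; r < nprocs; r++)
        if (event_logger_for(r, n_el) == el)
            count++;
    return count;
}
```

With 16 processes and 4 loggers each logger serves 4 ranks; with 16 loggers each serves exactly one.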
>
> Yes, I need to store every message, not only the non-deterministic
> ones. In my case every node is an event logger. Let's say that I have 16
> processes on four nodes (four per node): the messages received by all
> processes residing on node0 need to be stored on node1, the messages
> received by all processes residing on node1 need to be stored on node2,
> and so on.
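The placement described here can be written as two small functions. This is a sketch assuming block placement (ranks 0-3 on node0, 4-7 on node1, and so on) and that the last node wraps around to node0; both helper names are hypothetical.

```c
#include <assert.h>

/* Assumes block placement: ranks 0..ppn-1 on node 0, the next ppn on node 1, ... */
static int node_of(int rank, int ppn)
{
    assert(ppn > 0 && rank >= 0);
    return rank / ppn;
}

/* Node that stores the messages received by processes on `node`:
 * the next node in a ring, the last node wrapping back to node 0. */
static int storage_node(int node, int n_nodes)
{
    assert(n_nodes > 1 && node >= 0 && node < n_nodes);
    return (node + 1) % n_nodes;
}
```

For the 16-process, 4-node example, rank 2 (node0) has its received messages stored on node1, and rank 15 (node3) on node0.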
>
> If I understand correctly, I have to modify the behavior in
> ompi/mca/vprotocol/pessimist to manage the message payload. Another
> question: is there a way to launch ELs on every node, or will I have to
> modify this too?
>
> Thanks a lot for your help Aurélien.
>
> Hugo
>
>
>
> On 27 Jan 2012, at 11:19, Hugo Daniel Meyer wrote:
>
> > Hello Aurélien.
> >
> > Thanks for the clarification. Considering what you've mentioned, I will
> > have to make some adaptations, because in my case every single message has
> > to be logged. So a sender will send each message not only to the receiver
> > but also to an event logger. Are there any considerations I have to take
> > into account when modifying the code? My initial idea is to use el_comm
> > with a group of event loggers (because every node uses a different event
> > logger in my approach), and then send the messages to them as you do when
> > using MPI_ANY_SOURCE.
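A minimal sketch of the dual send described here, assuming the node layout Hugo describes above (block placement, copy stored on the next node in a ring). The transport is abstracted behind a hypothetical send_fn hook standing in for PML sends on the application communicator and on el_comm, and record_send is a toy transport for trying it out. Sending the copy to the logger first, so it is stored before the receiver can depend on the message, is a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport hook (stands in for a PML send). */
typedef int (*send_fn)(int target, const void *buf, size_t len, void *ctx);

/* Send a copy to the logger node for dest's node, then the message to dest.
 * Returns 0 only if both sends succeed. */
static int send_with_remote_log(int dest, const void *buf, size_t len,
                                int ppn, int n_nodes,
                                send_fn to_proc, send_fn to_logger, void *ctx)
{
    assert(dest >= 0 && ppn > 0 && n_nodes > 0);
    int logger_node = (dest / ppn + 1) % n_nodes;
    if (to_logger(logger_node, buf, len, ctx) != 0)
        return -1;
    return to_proc(dest, buf, len, ctx);
}

/* Toy transport that only records the sequence of targets. */
struct trace {
    int targets[8];
    int n;
};

static int record_send(int target, const void *buf, size_t len, void *ctx)
{
    struct trace *t = ctx;
    (void)buf;
    (void)len;
    if (t->n == 8)
        return -1;
    t->targets[t->n++] = target;
    return 0;
}
```

Sending to rank 5 with 4 ranks per node and 4 nodes records the logger node 2 first, then rank 5.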
> >
> > Thanks for your help.
> >
> > Hugo Meyer
> >
> > 2012/1/27 Aurélien Bouteiller <boute...@eecs.utk.edu>
> > Hugo,
> >
> > Your program does not have non-deterministic events. Therefore, there
> > are no events to log. If you add MPI_ANY_SOURCE, you should see this code
> > being called. Please contact me again if you need more help.
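Why no events appear without MPI_ANY_SOURCE: a receive that names its source can only match the earliest pending message from that sender that fits it (MPI's non-overtaking rule), so a replay reproduces the same match, whereas the match of a wildcard-source receive depends on timing and must be recorded. The sketch covers only that wildcard-source case; WILDCARD_SOURCE is a hypothetical stand-in for MPI_ANY_SOURCE, and other sources of nondeterminism (for example, the outcome of MPI_Test) are ignored.

```c
#include <assert.h>

#define WILDCARD_SOURCE (-1)  /* hypothetical stand-in for MPI_ANY_SOURCE */

struct recv_event {
    unsigned long clock;   /* local reception order */
    int matched_src;       /* source actually chosen by the match */
};

/* Returns 1 and fills *ev if this completed receive must be logged. */
static int make_event_if_needed(int posted_src, int matched_src,
                                unsigned long clock, struct recv_event *ev)
{
    assert(matched_src >= 0);
    if (posted_src != WILDCARD_SOURCE)
        return 0;  /* named source: replay reproduces the same match */
    ev->clock = clock;
    ev->matched_src = matched_src;
    return 1;
}
```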
> >
> > Aurelien
> >
> >
> > On 27 Jan 2012, at 10:21, Hugo Daniel Meyer wrote:
> >
> > > Hello @ll.
> > >
> > > George, I'm using some pieces of the pessimist vprotocol. I've
> > > observed that when you do a send, you call vprotocol_receiver_event_flush,
> > > and there the macro __VPROTOCOL_RECEIVER_SEND_BUFFER is called. I've
> > > noticed that there you try to send a copy of the message to process 0
> > > using el_comm. This section of code is never executed, at least in my
> > > examples, so the message is never sent to the Event Logger. Am I correct
> > > about this? I think this is happening because
> > > mca_vprotocol_pessimist.event_buffer_length is always 0.
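One plausible reading of this behavior: if buffered events are flushed only when the buffer is non-empty, a program without wildcard receives never adds anything to the buffer, and the event logger path is never taken. The sketch is hypothetical (flush_before_send and buffer_event are not the actual macro or functions); its length field merely plays the role of event_buffer_length.

```c
#include <assert.h>
#include <stddef.h>

#define EVBUF_CAP 16

struct event_buffer {
    unsigned long clocks[EVBUF_CAP];
    size_t length;  /* plays the role of event_buffer_length */
};

/* Called before each send: ships buffered events to the logger only when
 * there is something to ship. Returns how many events were shipped. */
static size_t flush_before_send(struct event_buffer *b, size_t *sent_to_el)
{
    assert(b != NULL && sent_to_el != NULL);
    size_t n = b->length;
    if (n == 0)
        return 0;  /* nothing non-deterministic happened: logger not contacted */
    *sent_to_el += n;  /* stand-in for the actual send on el_comm */
    b->length = 0;
    return n;
}

/* Called when a wildcard receive completes. */
static int buffer_event(struct event_buffer *b, unsigned long clock)
{
    if (b->length == EVBUF_CAP)
        return -1;
    b->clocks[b->length++] = clock;
    return 0;
}
```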
> > >
> > > Is there something I have to turn on, or will I have to modify
> > > this behavior manually to connect and send messages to the EL?
> > >
> > > Thanks in advance.
> > >
> > > Hugo Meyer
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> > --
> > * Dr. Aurélien Bouteiller
> > * Researcher at Innovative Computing Laboratory
> > * University of Tennessee
> > * 1122 Volunteer Boulevard, suite 350
> > * Knoxville, TN 37996
> > * 865 974 6321