Hi,

At least for the specific test program I used, the negative values for
the peer attribute disappeared after George's modifications in r20844.
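
For builds older than that revision, a callback can still guard its
per-peer counters against an out-of-range peer. The following is only a
minimal sketch: count_peer() is a hypothetical helper (not part of PERUSE
or Open MPI), and the counts array stands in for something like the
rrCounts array in the quoted callback further down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PERUSE or Open MPI.
 * Counts one event for `peer` unless the value is out of range
 * (such as the -1 reported before r20844) or is the calling rank itself.
 * Returns 1 if the event was counted, 0 if it was skipped. */
int count_peer(int peer, int self_rank, int comm_size,
               unsigned long *counts)
{
    assert(counts != NULL && comm_size > 0);

    if (peer < 0 || peer >= comm_size) {
        return 0;   /* invalid peer: do not index the array */
    }
    if (peer == self_rank) {
        return 0;   /* same self-communication filter as the quoted code */
    }
    counts[peer]++;
    return 1;
}
```

Inside a PERUSE callback this would presumably be called as
count_peer(spec->peer, rank, size, rrCounts), where size is assumed to
come from MPI_Comm_size on the communicator being profiled.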

One remark: after installation, I had to remove the '#include
"ompi_config.h"' line in the "include/peruse.h" header to get PERUSE
applications to compile. Otherwise I got a missing header error message
for ompi_config.h.

Regards,
Kiril


On Mon, 2009-03-23 at 16:34 -0400, George Bosilca wrote:
> You are absolutely right, the peer should never be set to -1 on any of  
> the PERUSE callbacks. I checked the code this morning and figured out
> what the problem was. We report the peer and the tag attached to a
> request before setting the right values (some code moved around). I  
> submitted a patch and created a "move request" to have this correction  
> as soon as possible on one of our stable releases. The move request  
> can be followed using our TRAC system and the following link
> (https://svn.open-mpi.org/trac/ompi/ticket/1845).
> If you want to play with this change please update your Open MPI
> installation to a nightly build or a fresh checkout from the SVN with  
> at least revision 20844 (a nightly including this change will be  
> posted on our website tomorrow morning).
> 
>    Thanks,
>      george.
> 
> On Mar 23, 2009, at 13:23 , Samuel K. Gutierrez wrote:
> 
> > Hi Kiril,
> >
> > Appreciate the quick response.
> >
> >> Hi Samuel,
> >>
> >> On Sat, 21 Mar 2009 18:18:54 -0600 (MDT)
> >>  "Samuel K. Gutierrez" <sam...@lanl.gov> wrote:
> >>> Hi All,
> >>>
> >>> I'm writing a simple profiling library which utilizes
> >>> PERUSE.  My callback
> >>
> >> So am I :)
> >>
> >>> function counts communication events (see example code
> >>> below).  I noticed
> >>> that in OMPI v1.3 spec->peer is sometimes a negative
> >>> value (OMPI v1.2.6
> >>> did not exhibit this behavior).  I added some boundary
> >>> checks, but it
> >>> seems as if this is a bug?  I hope I'm not missing
> >>> something...
> >>
> >> It took me quite some time to reproduce the error - I also
> >
> > Sorry about that - I should have provided more information.
> >
> >> got peer value "-1" for the Peruse peruse_comm_spec_t
> >> struct. I only managed to reproduce this with
> >> communication of a process with itself, which is an
> >> unusual scenario. Anyway, for all the tests I did, the
> >> error happened only when:
> >>
> >> - a process communicates with itself
> >> - the MPI receive call is made
> >> - the Peruse event "PERUSE_COMM_MSG_REMOVE_FROM_UNEX_Q" is
> >>   triggered
> >
> > That's interesting... Nice work!
> >
> >>
> >>
> >> The file ompi/mca/pml/ob1/pml_ob1_recvreq.c seems to be
> >> the place where the above event is called with a wrong
> >> value of the peer attribute.
> >>
> >> I will let you know if I find something.
> >
> > I will also take a look.
> >
> >>
> >>
> >> Best regards,
> >> Kiril
> >>
> >>>
> >>> The peruse test provided in the OMPI v1.3 source
> >>> exhibits similar behavior:
> >>> mpirun -np 2 ./mpi_peruse | grep peer:-1
> >>>
> >>> int callback(peruse_event_h event_h, MPI_Aint unique_id,
> >>> peruse_comm_spec_t *spec, void *param) {
> >>>   if (spec->peer == rank) {
> >>>       return MPI_SUCCESS;
> >>>   }
> >>>   rrCounts[spec->peer]++;
> >>>   return MPI_SUCCESS;
> >>> }
> >>>
> >>>
> >>> Any insight is greatly appreciated.
> >>>
> >>> Thanks,
> >>>
> >>> Samuel K. Gutierrez
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>
> >>
> >
> > Appreciate the help,
> >
> > Samuel K. Gutierrez

