[OMPI users] Bug in 1.3 nightly

2008-12-16 Thread Gabriele Fatigati
Dear OpenMPI developers,
trying to compile the 1.3 nightly version, I get the following error:

../../../orte/.libs/libopen-rte.so: undefined reference to `ORTE_NAME_PRINT'
../../../orte/.libs/libopen-rte.so: undefined reference to `ORTE_JOBID_PRINT'


The versions affected are:

openmpi-1.3rc3r20130
openmpi-1.3rc3r20107
openmpi-1.3rc3r20092
openmpi-1.3rc2r20084

Thanks in advance.


-- 
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Technologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it
Tel: +39 051 6171722

g.fatig...@cineca.it


Re: [OMPI users] Bug in 1.3 nightly

2008-12-16 Thread Lenny Verkhovsky
I didn't see any errors on 1.3rc3r20130; I am running MTT nightly
and it seems to be fine on x86-64 CentOS 5.

On Tue, Dec 16, 2008 at 10:27 AM, Gabriele Fatigati wrote:
> Dear OpenMPI developers,
> trying to compile the 1.3 nightly version, I get the following error:
>
> ../../../orte/.libs/libopen-rte.so: undefined reference to `ORTE_NAME_PRINT'
> ../../../orte/.libs/libopen-rte.so: undefined reference to `ORTE_JOBID_PRINT'
>
>
> The versions affected are:
>
> openmpi-1.3rc3r20130
> openmpi-1.3rc3r20107
> openmpi-1.3rc3r20092
> openmpi-1.3rc2r20084
>
> Thanks in advance.


Re: [OMPI users] Onesided + derived datatypes

2008-12-16 Thread George Bosilca
This issue should be fixed starting from revision r20134. The patch
is waiting to be tested on SPARC64 before being pushed into the next
1.2 and 1.3 releases.


  Thanks for the test application,
george.

On Dec 13, 2008, at 11:58, George Bosilca wrote:

No. It fixes an issue with correctly rebuilding the datatype (i.e.,
with the real displacements) on the remote side, but it didn't fix
the wrong-values problem.


 george.

On Dec 13, 2008, at 07:59, Jeff Squyres wrote:

George -- you had a commit after this (r20123) -- did that fix the  
problem?



On Dec 12, 2008, at 8:14 PM, George Bosilca wrote:


Dorian,

I looked into this again. So far I can confirm that the datatype
is correctly created and always contains the correct values
(internally). If you use send/recv instead of one-sided, the output
is exactly what you expect. With one-sided there are several strange
things. What I can say so far is that everything works fine, except
when the block-indexed datatype is used as the remote datatype in
the MPI_Put operation. In that case the remote memory is not modified.


george.
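
For reference, a minimal sketch of the pattern described above: an
indexed-block datatype used as the remote (target) datatype of an
MPI_Put inside a fence epoch. The element type (a contiguous triple of
doubles, mirroring the double3 in Dorian's dumps), the displacements
and the buffer sizes are illustrative assumptions, not Dorian's actual
test program; run with at least two processes.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, i, j;
    int displs[4] = { 3, 2, 1, 0 };
    double origin[4][3];    /* four "double3" elements on the origin side */
    double window[10][3];   /* ten "double3" slots exposed in the window  */
    MPI_Datatype double3, blocktype;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < 4; i++)
        for (j = 0; j < 3; j++)
            origin[i][j] = 3 * i + j;

    /* double3 = three contiguous doubles; blocktype = four double3 blocks
     * at displacements 3,2,1,0 (counted in units of double3). */
    MPI_Type_contiguous(3, MPI_DOUBLE, &double3);
    MPI_Type_commit(&double3);
    MPI_Type_create_indexed_block(4, 1, displs, double3, &blocktype);
    MPI_Type_commit(&blocktype);

    MPI_Win_create(window, (MPI_Aint)sizeof(window), (int)(3 * sizeof(double)),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && size > 1) {
        /* Indexed-block type as the *target* datatype: the case reported
         * above as leaving the remote memory unmodified. */
        MPI_Put(origin, 4, double3, 1, 0, 1, blocktype, win);
    }
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Type_free(&blocktype);
    MPI_Type_free(&double3);
    MPI_Finalize();
    return 0;
}

With send/recv in place of the put, the same datatype reportedly
delivers the expected values, which is what isolates the one-sided path.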


On Dec 12, 2008, at 08:20, Dorian Krause wrote:


Hi again.

I adapted my testing program by completely overwriting the window
buffer with 1 beforehand. This allows me to see at which places
Open MPI writes.

The result is:

*** -DO1=1 -DV1=1 *** (displ 3,2,1,0, MPI_Type_create_indexed_block)

mem[0] = {  0.,  0.,  0.}
mem[1] = {  0.,  0.,  0.}
mem[2] = {  0.,  0.,  0.}
mem[3] = { nan, nan, nan}
mem[4] = { nan, nan, nan}
mem[5] = { nan, nan, nan}
mem[6] = { nan, nan, nan}
mem[7] = { nan, nan, nan}
mem[8] = { nan, nan, nan}
mem[9] = { nan, nan, nan}
*** -DO1=1 -DV2=1 *** MPI_Type_contiguous(4, mpi_double3, &mpit)
mem[0] = {  0.,  1.,  2.}
mem[1] = {  3.,  4.,  5.}
mem[2] = {  6.,  7.,  8.}
mem[3] = {  9., 10., 11.}
mem[4] = { nan, nan, nan}
mem[5] = { nan, nan, nan}
mem[6] = { nan, nan, nan}
mem[7] = { nan, nan, nan}
mem[8] = { nan, nan, nan}
mem[9] = { nan, nan, nan}
*** -DO2=1 -DV1=1 *** (displ 0,1,2,3, MPI_Type_create_indexed_block)

mem[0] = {  0.,  0.,  0.}
mem[1] = {  0.,  0.,  0.}
mem[2] = {  0.,  0.,  0.}
mem[3] = {  0.,  0.,  0.}
mem[4] = { nan, nan, nan}
mem[5] = { nan, nan, nan}
mem[6] = { nan, nan, nan}
mem[7] = { nan, nan, nan}
mem[8] = { nan, nan, nan}
mem[9] = { nan, nan, nan}
*** -DO2=1 -DV2=1 *** MPI_Type_contiguous(4, mpi_double3, &mpit)
mem[0] = {  0.,  1.,  2.}
mem[1] = {  3.,  4.,  5.}
mem[2] = {  6.,  7.,  8.}
mem[3] = {  9., 10., 11.}
mem[4] = { nan, nan, nan}
mem[5] = { nan, nan, nan}
mem[6] = { nan, nan, nan}
mem[7] = { nan, nan, nan}
mem[8] = { nan, nan, nan}
mem[9] = { nan, nan, nan}

Note that for the reversed ordering (3,2,1,0) only 3 lines are  
written. If I use displacements 3,2,1,8

I get

*** -DO1=1 -DV1=1 ***
mem[0] = {  0.,  0.,  0.}
mem[1] = {  0.,  0.,  0.}
mem[2] = {  0.,  0.,  0.}
mem[3] = { nan, nan, nan}
mem[4] = { nan, nan, nan}
mem[5] = { nan, nan, nan}
mem[6] = { nan, nan, nan}
mem[7] = { nan, nan, nan}
mem[8] = {  0.,  0.,  0.}
mem[9] = { nan, nan, nan}

but 3,2,8,1 yields

*** -DO1=1 -DV1=1 ***
mem[0] = {  0.,  0.,  0.}
mem[1] = {  0.,  0.,  0.}
mem[2] = {  0.,  0.,  0.}
mem[3] = { nan, nan, nan}
mem[4] = { nan, nan, nan}
mem[5] = { nan, nan, nan}
mem[6] = { nan, nan, nan}
mem[7] = { nan, nan, nan}
mem[8] = { nan, nan, nan}
mem[9] = { nan, nan, nan}

Dorian
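
For readers of the dumps above, a hedged guess at what the V1/V2 and
O1/O2 switches select, reconstructed from the annotations in this
thread (the element type mpi_double3 and the count of four come from
the MPI_Type_contiguous(4, mpi_double3, &mpit) annotation; the rest is
assumption): V1 builds the transfer type with
MPI_Type_create_indexed_block over the chosen displacements, V2 with
MPI_Type_contiguous, and O1/O2 pick the reversed vs. natural
displacement ordering.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Datatype mpi_double3, mpit;
#if defined(O2)
    int displs[4] = { 0, 1, 2, 3 };   /* natural ordering, as with -DO2=1  */
#else
    int displs[4] = { 3, 2, 1, 0 };   /* reversed ordering, as with -DO1=1 */
#endif

    MPI_Init(&argc, &argv);

    /* mpi_double3: one "double3" element, i.e. three contiguous doubles. */
    MPI_Type_contiguous(3, MPI_DOUBLE, &mpi_double3);
    MPI_Type_commit(&mpi_double3);

#if defined(V2)
    /* V2: four double3 elements packed contiguously (displacements unused). */
    (void)displs;
    MPI_Type_contiguous(4, mpi_double3, &mpit);
#else
    /* V1: four double3 blocks placed at the chosen displacements. */
    MPI_Type_create_indexed_block(4, 1, displs, mpi_double3, &mpit);
#endif
    MPI_Type_commit(&mpit);

    /* ... mpit is then used as the target datatype of the MPI_Put ... */

    MPI_Type_free(&mpit);
    MPI_Type_free(&mpi_double3);
    MPI_Finalize();
    return 0;
}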



-Original Message-
From: "Dorian Krause"
Sent: 12.12.08 13:49:25
To: Open MPI Users
Subject: Re: [OMPI users] Onesided + derived datatypes




Thanks George (and Brian :)).

The MPI_Put error is gone. Did you take a look at the problem
that with the block-indexed type the put doesn't work? I'm
still getting the following output (V1 corresponds to the datatype
created with MPI_Type_create_indexed_block, while the V2 type
is created with MPI_Type_contiguous; the ordering doesn't matter
anymore after your fix), which confuses me because I remember
that (on one machine) MPI_Put with MPI_Type_create_indexed_block
worked until the invalid datatype error showed up (after a
couple of timesteps).


*** -DO1=1 -DV1=1 ***
mem[0] = {  0.,  0.,  0.}
mem[1] = {  0.,  0.,  0.}
mem[2] = {  0.,  0.,  0.}
mem[3] = {  0.,  0.,  0.}
mem[4] = {  0.,  0

Re: [OMPI users] ompi-checkpoint is hanging

2008-12-16 Thread Josh Hursey

Matthias,

I think that the patch attached to the ticket below should address  
your issue:

 https://svn.open-mpi.org/trac/ompi/ticket/1619

I was able to reproduce this problem fairly reliably with a particular
benchmark, on a particular configuration, and with very frequent
checkpoints. With this patch I was no longer able to reproduce it, so
I think it fixes the problem.


In the process of tracking this bug, I believe that there is a problem  
with the way the checkpoint/restart coordination component handles  
MPI_ANY_SOURCE and MPI_ANY_TAG. I'll pursue a fix for these cases, but  
it will be much more involved than the one currently attached to the  
ticket.
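
For illustration, the kind of wildcard receive that the
checkpoint/restart coordination component has to handle; a minimal
sketch, not taken from the benchmark mentioned above:

#include <mpi.h>
#include <stdio.h>

/* A receiver that posts wildcard receives: neither the source rank nor
 * the tag is known until the message arrives. A checkpoint taken while
 * such a receive is pending is the situation discussed above. */
int main(int argc, char **argv)
{
    int rank, size, msg;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int i;
        for (i = 1; i < size; i++) {
            MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("got %d from rank %d (tag %d)\n",
                   msg, status.MPI_SOURCE, status.MPI_TAG);
        }
    } else {
        msg = rank;
        MPI_Send(&msg, 1, MPI_INT, 0, /* tag = */ rank, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}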


Let me know if this patch fixes the problem that you are seeing.

Thank you for your patience and the bug report,
Josh

On Oct 31, 2008, at 9:49 AM, Matthias Hovestadt wrote:


Hi!

I'll work on a patch, and let you know when it is ready.  
Unfortunately it probably won't be for a couple weeks. :(


Ok, thanks a lot for letting me know. In three weeks we'll
have a booth at ICT
(http://ec.europa.eu/information_society/events/ict/2008),
where we plan to showcase fault tolerance mechanisms, with
OMPI as the major checkpointing component. I think I will use
the time until ICT to find a workaround for this issue... :-)


Best,
Matthias