Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Dirk Eddelbuettel

On 13 December 2007 at 13:17, Lisandro Dalcin wrote:
| Perhaps I was not clear enough. There are many public ways of
| importing modules in Python. Modules can come mainly from two sources:
| pure Python code, or compiled C code. In the latter case (called
| extension modules), they are normally a shared object
| (.so, .dylib, .dll) exporting only one symbol: 'void
| init<modulename>(void)'; this function bootstraps the initialization
| of the extension module. What is somewhat hidden is the actual way
| Python loads this shared object and calls the init function. I believe
| the reason for this is related to the gory details of dlopen()ing in
| different OS/arch combinations.
| 
| Anyway, Python enables you to temporarily change the flags to be used
| in dlopen() calls; what is not (currently) so easy is to get the value
| of RTLD_GLOBAL in a portable way.
| 
| Jeff, in short: I believe I solved (with the help of Brian) the issue
| in the context of Python and the platforms we have access to. So, from
| our side, things are working.
| 
| However, I believe this issue is going to cause trouble for anyone else
| trying to load shared objects that use MPI. This makes me think that
| Open MPI should be in charge of solving this, but I understand the
| situation is complex and all of us are usually short on time. I'll try to
| re-read all the stuff to better understand the beast.

Just to recap: when we tried to address the same issue for the 'Rmpi' package
for GNU R, it was actually the hint in the FAQ for Open MPI itself (!!) that led
Hao (i.e., Rmpi's author) to the use of the RTLD_GLOBAL flag.  So what Lisandro
is asking for is already (at least somewhat) addressed and documented on the
Open MPI site.

Anyway, great to hear that things work for Python too. It's always good to
have more tools.

Dirk

-- 
Three out of two people have difficulties with fractions.


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Jeff Squyres

On Dec 13, 2007, at 6:01 PM, Brian W. Barrett wrote:


But it is easy for Open MPI to figure out whether it's statically or
dynamically linked, as libtool compiles the code twice if building both
static and shared, and you could poke at #defines to figure out what's
going on -- easy enough to deal with.  I just don't think we should be
doing so :).



I agree.

But FWIW: how do you do this in the "--enable-static --enable-shared"  
case?  I couldn't think of how to do a different #define for two  
different compiles of the same source file without significant hackery.


--
Jeff Squyres
Cisco Systems


[OMPI users] ORTE_ERROR_LOG: Data unpack had inadequate space in file gpr_replica_cmd_processor.c at line 361

2007-12-13 Thread Qiang Xu
I installed OpenMPI-1.2.4 on our cluster.
Here is the compute node info:

[qiang@compute-0-1 ~]$ uname -a
Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 00:17:26 CDT 2006 
i686 i686 i386 GNU/Linux
[qiang@compute-0-1 bin]$ gcc -v
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-java-awt=gtk --host=i386-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)

Then I compiled the NAS benchmarks; I got some warnings, but the build went through.
[qiang@compute-0-1 NPB2.3-MPI]$ make suite
make[1]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI'
   =========================================
   =      NAS Parallel Benchmarks 2.3      =
   =      MPI/F77/C                        =
   =========================================

cd MG; make NPROCS=16 CLASS=B
make[2]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI/MG'
make[3]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI/sys'
cc -g  -o setparams setparams.c
make[3]: Leaving directory `/home/qiang/NPB2.3/NPB2.3-MPI/sys'
../sys/setparams mg 16 B
make.def modified. Rebuilding npbparams.h just in case
rm -f npbparams.h
../sys/setparams mg 16 B
mpif77 -c -I~/MyMPI/include  mg.f
mg.f: In subroutine `zran3':
mg.f:1001: warning:
 call mpi_allreduce(rnmu,ss,1,dp_type,
  1
mg.f:2115: (continued):
call mpi_allreduce(jg(0,i,1), jg_temp,4,MPI_INTEGER,
 2
Argument #1 of `mpi_allreduce' is one type at (2) but is some other type at (1) 
[info -f g77 M GLOBALS]
mg.f:1001: warning:
 call mpi_allreduce(rnmu,ss,1,dp_type,
  1
mg.f:2115: (continued):
call mpi_allreduce(jg(0,i,1), jg_temp,4,MPI_INTEGER,
 2
Argument #2 of `mpi_allreduce' is one type at (2) but is some other type at (1) 
[info -f g77 M GLOBALS]
mg.f:1001: warning:
 call mpi_allreduce(rnmu,ss,1,dp_type,
  1
mg.f:2139: (continued):
call mpi_allreduce(jg(0,i,0), jg_temp,4,MPI_INTEGER,
 2
Argument #1 of `mpi_allreduce' is one type at (2) but is some other type at (1) 
[info -f g77 M GLOBALS]
mg.f:1001: warning:
 call mpi_allreduce(rnmu,ss,1,dp_type,
  1
mg.f:2139: (continued):
call mpi_allreduce(jg(0,i,0), jg_temp,4,MPI_INTEGER,
 2
Argument #2 of `mpi_allreduce' is one type at (2) but is some other type at (1) 
[info -f g77 M GLOBALS]
cd ../common; mpif77 -c -I~/MyMPI/include  print_results.f
cd ../common; mpif77 -c -I~/MyMPI/include  randdp.f
cd ../common; mpif77 -c -I~/MyMPI/include  timers.f
mpif77  -o ../bin/mg.B.16 mg.o ../common/print_results.o ../common/randdp.o 
../common/timers.o -L~/MyMPI/lib -lmpi_f77
make[2]: Leaving directory `/home/qiang/NPB2.3/NPB2.3-MPI/MG'
make[1]: Leaving directory `/home/qiang/NPB2.3/NPB2.3-MPI'
make[1]: Entering directory `/home/qiang/NPB2.3/NPB2.3-MPI'

But when I tried to run it, I got the following error messages:
[qiang@compute-0-1 bin]$ mpirun -machinefile m8 -n 16 mg.C.16
[compute-0-1.local:11144] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate 
space in file dss/dss_unpack.c at line 90
[compute-0-1.local:11144] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate 
space in file gpr_replica_cmd_processor.c at line 361

I found some info on the mailing list, but it doesn't help in my case.
Could anyone give me some advice? Or do I have to upgrade the GNU compiler?

Thanks.

Qiang


[OMPI users] Bad behavior in Allgatherv when a count is 0

2007-12-13 Thread Moreland, Kenneth
I have found that on rare occasions Allgatherv fails to pass the data to
all processes.  Given some magical combination of receive counts and
displacements, one or more processes are missing some or all of some
arrays in their receive buffer.  A necessary, but not sufficient,
condition seems to be that one of the receive counts is 0.  Beyond that
I have not figured out any real pattern, but the example program listed
below demonstrates the failure.  I have tried it on OpenMPI versions
1.2.3 and 1.2.4; it fails on both.  However, it works fine with version
1.1.2, so the problem must have been introduced since then.

-Ken

     Kenneth Moreland
***  Sandia National Laboratories
***  
*** *** ***  email: kmo...@sandia.gov
**  ***  **  phone: (505) 844-8919
***  fax:   (505) 845-0833



#include <mpi.h>

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank;
  int size;
  MPI_Comm smallComm;
  int senddata[5], recvdata[100];
  int lengths[3], offsets[3];
  int i, j;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (size != 3)
    {
    printf("Need 3 processes.");
    MPI_Abort(MPI_COMM_WORLD, 1);
    }

  for (i = 0; i < 100; i++) recvdata[i] = -1;
  for (i = 0; i < 5; i++) senddata[i] = rank*10 + i;
  lengths[0] = 5;  lengths[1] = 0;  lengths[2] = 5;
  offsets[0] = 3;  offsets[1] = 9;  offsets[2] = 10;
  MPI_Allgatherv(senddata, lengths[rank], MPI_INT,
 recvdata, lengths, offsets, MPI_INT, MPI_COMM_WORLD);

  for (i = 0; i < size; i++)
    {
    for (j = 0; j < lengths[i]; j++)
      {
      if (recvdata[offsets[i]+j] != 10*i+j)
        {
        printf("%d: Got bad data from rank %d, index %d: %d\n", rank, i, j,
               recvdata[offsets[i]+j]);
        break;
        }
      }
    }

  MPI_Finalize();

  return 0;
}
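
For reference, the failure can be reproduced with an ordinary build and a
three-process run, along the lines of (the source file name is illustrative):

  mpicc allgatherv_test.c -o allgatherv_test
  mpirun -np 3 ./allgatherv_test

A correct MPI prints nothing; any rank that prints "Got bad data" has hit
the problem.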





Re: [OMPI users] undefined reference to `pthread_atfork' (Lahey Fujitsu compiler AMD64)

2007-12-13 Thread Brian W. Barrett

On Wed, 12 Dec 2007, Alex Pletzer wrote:


I'm on an AMD64 box (Linux quartic.txcorp.com 2.6.19-1.2288.fc5 #1 SMP
Sat Feb 10 14:59:35 EST 2007 x86_64 x86_64 x86_64 GNU/Linux) and
compiled openmpi-1.2.4 using the Lahey-Fujitsu compiler (lfc). The
compilation of openmpi went fine.

 $ ../configure --enable-mpi-f90 --enable-mpi-f77 --enable-mpi-cxx
--prefix=/home/research/pletzer/local/x86_64/openmpi-1.2.4/ FC=lfc
F77=lfc FCFLAGS=-O2 FFLAGS=-O2 --disable-shared --enable-static

However, when compiling a test code with mpif90, I get the following error:





[pletzer@quartic test]$ mpif90 t.f90
Encountered 0 errors, 0 warnings in file t.f90.
/home/research/pletzer/local/x86_64/openmpi-1.2.4//lib/libopen-pal.a(lt1-malloc.o):
In function `ptmalloc_init':
malloc.c:(.text+0x4b71): undefined reference to `pthread_atfork'


Open MPI only supports statically linking an application when the
--without-memory-manager option is given to Open MPI's configure.  You can
build Open MPI statically (i.e., make a libmpi.a) without that option, but
you cannot statically link an application (i.e., use libc.a) without it.
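
In practice that means a configure line along the lines of the one you
already used, plus that option (paths illustrative):

  ../configure --prefix=/home/research/pletzer/local/x86_64/openmpi-1.2.4/ \
      FC=lfc F77=lfc FCFLAGS=-O2 FFLAGS=-O2 \
      --disable-shared --enable-static --without-memory-manager

After that, a statically linked application should no longer pull in the
ptmalloc hooks that reference pthread_atfork.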


Brian


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Brian W. Barrett

On Thu, 13 Dec 2007, Jeff Squyres wrote:


Specifically: it would probably require some significant hackery in
the OMPI build process to put in a #define that indicates whether OMPI
is being built statically or not.  But the AM/LT process shields this
information from the build process by design (part of the issue is
that AM/LT allows both static and shared libraries to be built
simultaneously).  We'd then have to introduce some global symbol that
could be queried that is outside of the MPI interface.  Neither of
these things are attractive.  :-(


Well, if libmpi.a is static, then this whole conversation is pointless 
because you're not going to dlopen it in the first place.


But it is easy for Open MPI to figure out whether it's statically or
dynamically linked, as libtool compiles the code twice if building both
static and shared, and you could poke at #defines to figure out what's
going on -- easy enough to deal with.  I just don't think we should be
doing so :).
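
Concretely, the kind of #define poking I mean would look something like
this (a sketch only -- it relies on libtool's common convention of
passing -DPIC when it compiles the shared/PIC copy of each object, which
is usual but not guaranteed everywhere; the variable name is made up):

/* Compiled twice by libtool: once for libmpi.a, once for libmpi.so. */
#ifdef PIC
static const int ompi_lib_is_shared = 1;  /* this copy went into the .so */
#else
static const int ompi_lib_is_shared = 0;  /* this copy went into the .a  */
#endif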



Brian


Re: [OMPI users] parpack with openmpi

2007-12-13 Thread Brock Palen
I solved the problem, and the quote 'we have met the enemy and he is
us' fits perfectly.


The reason was that I had a stale object file lying around from when I
used a different compiler.  Removing the mpif.h files as they are listed
in the PARPACK ARmake.inc and recompiling worked.


Sorry for the red herring.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Dec 12, 2007, at 7:17 PM, Brock Palen wrote:

Yes, the software came with its own.  And I removed it; mpif77
takes care of not having mpif.h in the directory, just as it should.


I should mention (sorry) that the single, complex, and double
complex examples work.  Only the double (real) examples fail.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Dec 12, 2007, at 6:51 PM, Jeff Squyres wrote:


This *usually* happens when you include the mpif.h from a different
MPI implementation.  Can you check that?

On Dec 12, 2007, at 5:15 PM, Brock Palen wrote:


Has anyone ever built parpack (http://www.caam.rice.edu/software/
ARPACK/)  with openmpi?  It compiles but some of the examples give:

[nyx-login1.engin.umich.edu:12173] *** on communicator  
MPI_COMM_WORLD
[nyx-login1.engin.umich.edu:12173] *** MPI_ERR_TYPE: invalid  
datatype
[nyx-login1.engin.umich.edu:12173] *** MPI_ERRORS_ARE_FATAL  
(goodbye)

[nyx-login1.engin.umich.edu:12174] *** An error occurred in MPI_Recv
[nyx-login1.engin.umich.edu:12174] *** on communicator  
MPI_COMM_WORLD


I checked; all the data types are MPI_DOUBLE_PRECISION.  I'm not sure
where to look next.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985





--
Jeff Squyres
Cisco Systems








Re: [OMPI users] error with Vprotocol pessimist

2007-12-13 Thread Aurelien Bouteiller
If you want to use the pessimist message logging you have to use the
"-mca vprotocol pessimist" flag on your command line. This should work
despite the bug because, if I understand correctly, the issue you
experience should occur only when fault tolerance is disabled.
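
Concretely, with your command line that would be something like:

  mpirun -mca vprotocol pessimist -np 4 my_application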

I am having trouble reproducing the particular bug you are experiencing.
What compiler and what architecture are you using?


Aurelien
Le 13 déc. 07 à 07:58, Thomas Ropars a écrit :


I still have the same error after update (r16951).

I have the lib/openmpi/mca_pml_v.so file in my build and the command
line I use is: mpirun -np 4 my_application

Thomas


Aurelien Bouteiller wrote:

I could reproduce and fix the bug. It will be corrected in trunk as
soon as the svn is online again. Thanks for reporting the problem.

Aurelien

Le 11 déc. 07 à 15:02, Aurelien Bouteiller a écrit :



I cannot reproduce the error. Please make sure you have the lib/
openmpi/mca_pml_v.so file in your build. If you don't, maybe you
forgot to run autogen.sh at the root of the trunk when you
removed .ompi_ignore.

If this does not fix the problem, please let me know your command  
line

options to mpirun.

Aurelien

Le 11 déc. 07 à 14:36, Aurelien Bouteiller a écrit :



Mmm, I'll investigate this today.

Aurelien
Le 11 déc. 07 à 08:46, Thomas Ropars a écrit :



Hi,

I've tried to test the message logging component vprotocol
pessimist.
(svn checkout revision 16926)
When I run an mpi application, I get the following error :

mca: base: component_find: unable to open vprotocol pessimist:
/local/openmpi/lib/openmpi/mca_vprotocol_pessimist.so: undefined
symbol:
pml_v_output (ignored)


Regards

Thomas



--
Dr. Aurelien Bouteiller, Sr. Research Associate
Innovative Computing Laboratory - MPI group
+1 865 974 6321
1122 Volunteer Boulevard
Claxton Education Building Suite 350
Knoxville, TN 37996















Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Lisandro Dalcin
On 12/13/07, Jeff Squyres  wrote:
> On Dec 12, 2007, at 7:47 PM, Lisandro Dalcin wrote:
> Specifically: it would probably require some significant hackery in
> the OMPI build process to put in a #define that indicates whether OMPI
> is being built statically or not.  But the AM/LT process shields this
> information from the build process by design (part of the issue is
> that AM/LT allows both static and shared libraries to be built
> simultaneously).

I see.

> We'd then have to introduce some global symbol that
> could be queried that is outside of the MPI interface.  Neither of
> these things are attractive.  :-(

You are right, a global symbol is not attractive at all just for this.

> > AFAIK, Python does not. It uses specific, private code for this,
> > handling the loading of extension modules according to the OS's and
> > their idiosyncrasies. However, Python enables users to change the flags
> > used for dlopen'ing your extension modules; the tricky part is to get
> > the correct value of RTLD_GLOBAL in a portable way.
>
> That's somewhat surprising -- there's no public interfaces for modules
> to portably load sub-modules?  Bummer.

Perhaps I was not clear enough. There are many public ways of
importing modules in Python. Modules can come mainly from two sources:
pure Python code, or compiled C code. In the latter case (called
extension modules), they are normally a shared object
(.so, .dylib, .dll) exporting only one symbol: 'void
init<modulename>(void)'; this function bootstraps the initialization
of the extension module. What is somewhat hidden is the actual way
Python loads this shared object and calls the init function. I believe
the reason for this is related to the gory details of dlopen()ing in
different OS/arch combinations.

Anyway, Python enables you to temporarily change the flags to be used
in dlopen() calls; what is not (currently) so easy is to get the value
of RTLD_GLOBAL in a portable way.
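
In C terms, the trick under discussion boils down to something like the
following sketch (the "libmpi.so" name is illustrative only; the actual
file name and dlopen flags are platform-dependent):

#include <dlfcn.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  /* Make libmpi's symbols globally visible before any Open MPI
     component gets dlopen()ed during MPI_Init(). */
  void *handle = dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL);
  if (handle == NULL)
    fprintf(stderr, "dlopen(libmpi.so) failed: %s\n", dlerror());

  MPI_Init(&argc, &argv);
  /* ... application code ... */
  MPI_Finalize();

  if (handle != NULL)
    dlclose(handle);
  return 0;
}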

Jeff, in short: I believe I solved (with the help of Brian) the issue
in the context of Python and the platforms we have access to. So, from
our side, things are working.

However, I believe this issue is going to cause trouble for anyone else
trying to load shared objects that use MPI. This makes me think that
Open MPI should be in charge of solving this, but I understand the
situation is complex and all of us are usually short on time. I'll try to
re-read all the stuff to better understand the beast.

Regards,


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI users] Problems with GATHERV on one process

2007-12-13 Thread Moreland, Kenneth
Excellent.  Thanks.

-Ken

> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Jeff Squyres
> Sent: Thursday, December 13, 2007 6:02 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Problems with GATHERV on one process
> 
> Correct.  Here's the original commit that fixed the problem:
> 
>  https://svn.open-mpi.org/trac/ompi/changeset/16360
> 
> And the commit to the v1.2 branch:
> 
>  https://svn.open-mpi.org/trac/ompi/changeset/16519
> 
> 
> On Dec 12, 2007, at 2:43 PM, Moreland, Kenneth wrote:
> 
> > Thanks Tim.  I've since noticed similar problems with MPI_Allgatherv
> > and
> > MPI_Scatterv.  I'm guessing they are all related.  Do you happen to
> > know
> > if those are being fixed as well?
> >
> > -Ken
> >
> >> -Original Message-
> >> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> >> On Behalf Of Tim Mattox
> >> Sent: Tuesday, December 11, 2007 3:34 PM
> >> To: Open MPI Users
> >> Subject: Re: [OMPI users] Problems with GATHERV on one process
> >>
> >> Hello Ken,
> >> This is a known bug, which is fixed in the upcoming 1.2.5 release.
> >> We expect 1.2.5 to come out very soon.  We should have a new release
> >> candidate for 1.2.5 posted by tomorrow.
> >>
> >> See these tickets about the bug if you care to look:
> >> https://svn.open-mpi.org/trac/ompi/ticket/1166
> >> https://svn.open-mpi.org/trac/ompi/ticket/1157
> >>
> >>> On Dec 11, 2007 2:48 PM, Moreland, Kenneth  wrote:
> >>> I recently ran into a problem with GATHERV while running some
> >>> randomized tests on my MPI code.  The problem seems to occur when
> >>> running MPI_Gatherv with a displacement on a communicator with a
> >>> single process.  The code listed below exercises this errant
> >>> behavior.  I have tried it on OpenMPI 1.1.2 and 1.2.4.
> >>>
> >>> Granted, this is not a situation that one would normally run into in
> >>> a real application, but I just wanted to check to make sure I was
> >>> not doing anything wrong.
> >>>
> >>> -Ken
> >>>
> >>>
> >>>
> >>> #include <mpi.h>
> >>>
> >>> #include <stdlib.h>
> >>> #include <stdio.h>
> >>>
> >>> int main(int argc, char **argv)
> >>> {
> >>>  int rank;
> >>>  MPI_Comm smallComm;
> >>>  int senddata[4], recvdata[4], length, offset;
> >>>
> >>>  MPI_Init(&argc, &argv);
> >>>
> >>>  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>>
> >>>  // Split up into communicators of size 1.
> >>>  MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);
> >>>
> >>>  // Now try to do a gatherv.
> >>>  senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
> >>>  recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;
> >>>  length = 3;
> >>>  offset = 1;
> >>>  MPI_Gatherv(senddata, length, MPI_INT,
> >>>  recvdata, &length, &offset, MPI_INT, 0, smallComm);
> >>>  if (senddata[0] != recvdata[offset])
> >>>{
> >>>printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
> >>>}
> >>>  else
> >>>{
> >>>printf("%d: Everything OK.\n", rank);
> >>>}
> >>>
> >>>  return 0;
> >>> }
> >>>
> >>>     Kenneth Moreland
> >>>***  Sandia National Laboratories
> >>> ***
> >>> *** *** ***  email: kmo...@sandia.gov
> >>> **  ***  **  phone: (505) 844-8919
> >>>***  fax:   (505) 845-0833
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
> >> tmat...@gmail.com || timat...@open-mpi.org
> >>I'm a bright... http://www.the-brights.net/
> >
> >
> >
> 
> 
> --
> Jeff Squyres
> Cisco Systems





Re: [OMPI users] Compiling 1.2.4 using Intel Compiler 10.1.007 on Leopard

2007-12-13 Thread Jeff Squyres

On Dec 12, 2007, at 1:18 PM, Warner Yuen wrote:

It seems that the problems are partially the compilers' fault; maybe
the updated compilers didn't catch all the problems filed against
the last release? Why else would I need to add the "-no-multibyte-chars"
flag for pretty much everything that I build with ICC?  Also,
it's odd that I have to use /lib/cpp when using Intel ICC/ICPC
whereas with GCC things just find their way correctly. Again, IFORT
and GCC together seem fine.


I'm afraid that I'm not enough of an OS X expert to know the answers  
to these questions...  :-(


Lastly... not that I use these... but MPICH-2.1 and MPICH-1.2.7 for  
Myrinet built just fine.


Here are the output files:


I actually didn't see the error in there -- did you grab stdout and  
stderr when building?


--
Jeff Squyres
Cisco Systems


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Jeff Squyres

On Dec 12, 2007, at 7:47 PM, Lisandro Dalcin wrote:


You should, yes.


OK, but now I realize that I cannot simply call libtool dlopen()
unconditionally, as libmpi.so may not exist in a static lib build.


Right.  Or it could be libmpi.dylib (OS X).  I don't know if other  
extensions exist out there.


However, in this case, it would be easy enough to just try a few named
extensions (libmpi.<ext>) -- they'll either all fail or one of them
will succeed.  But you would still need to tell if you're linked  
against libmpi.a or not -- dlopen'ing a shared library version when  
you already have a static version resident can cause problems (per the  
chart on the wiki).


Actually, regardless of who does the dlopen -- you or me -- we need to  
know this info (whether the linked-against libmpi was shared or  
static).  Hmm.  I can't think of a good way to do this off the top of  
my head.


After a little more thought, I think only the application can know  
this -- the application's build system can know whether it is linking  
against libmpi.a or libmpi.so and set some #define (or whatever) to  
know whether it needs to dlopen libmpi or not.  :-\
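
As a sketch of what I mean, the application side could look something
like this (the macro name, helper name, and "libmpi.so" file name are
all placeholders that the application's build system would supply):

#include <mpi.h>
#ifndef MY_APP_LIBMPI_IS_STATIC
#include <dlfcn.h>
#endif

static int my_app_mpi_init(int *argc, char ***argv)
{
#ifndef MY_APP_LIBMPI_IS_STATIC
  /* Linked against libmpi.so: make its symbols globally visible so
     that Open MPI's dlopen()ed components can resolve them. */
  dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL);
#endif
  return MPI_Init(argc, argv);
}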


Specifically: it would probably require some significant hackery in  
the OMPI build process to put in a #define that indicates whether OMPI  
is being built statically or not.  But the AM/LT process shields this  
information from the build process by design (part of the issue is  
that AM/LT allows both static and shared libraries to be built  
simultaneously).  We'd then have to introduce some global symbol that  
could be queried that is outside of the MPI interface.  Neither of  
these things are attractive.  :-(



Also, see my later post: doesn't perl/python have some kind of
portable dlopen anyway?  They're opening your module...?


Sorry -- it looks like the post I was referring to got stuck in my  
outbox and didn't get sent until earlier this morning.



AFAIK, Python does not. It uses specific, private code for this,
handling the loading of extension modules according to the OS's and
their idiosyncrasies. However, Python enables users to change the flags
used for dlopen'ing your extension modules; the tricky part is to get
the correct value of RTLD_GLOBAL in a portable way.


That's somewhat surprising -- there's no public interfaces for modules  
to portably load sub-modules?  Bummer.



Is there any other way of setting an MCA parameter?

See http://www.open-mpi.org/faq/?category=tuning#setting-mca-params.


OK, it seems there isn't a programmatic way. Anyway, putenv() should
not be a source of portability problems.


No, we have no API for setting MCA params other than altering the  
environment.  Also, most MCA params are read during MPI_INIT and not  
re-checked later during the run (it would be a bad idea, for example,  
to check MCA param values during the critical performance path).
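
For completeness, the environment route looks roughly like this in C
(OMPI_MCA_ is the prefix described in the FAQ above; the particular
parameter and value shown are only an example):

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  /* Must be set before MPI_Init; MCA params are read at init time. */
  putenv("OMPI_MCA_osc=pt2pt");
  MPI_Init(&argc, &argv);
  /* ... */
  MPI_Finalize();
  return 0;
}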


--
Jeff Squyres
Cisco Systems


Re: [OMPI users] Problems with GATHERV on one process

2007-12-13 Thread Jeff Squyres

Correct.  Here's the original commit that fixed the problem:

https://svn.open-mpi.org/trac/ompi/changeset/16360

And the commit to the v1.2 branch:

https://svn.open-mpi.org/trac/ompi/changeset/16519


On Dec 12, 2007, at 2:43 PM, Moreland, Kenneth wrote:

Thanks Tim.  I've since noticed similar problems with MPI_Allgatherv and
MPI_Scatterv.  I'm guessing they are all related.  Do you happen to know
if those are being fixed as well?

-Ken


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
On Behalf Of Tim Mattox
Sent: Tuesday, December 11, 2007 3:34 PM
To: Open MPI Users
Subject: Re: [OMPI users] Problems with GATHERV on one process

Hello Ken,
This is a known bug, which is fixed in the upcoming 1.2.5 release.  We
expect 1.2.5 to come out very soon.  We should have a new release
candidate for 1.2.5 posted by tomorrow.

See these tickets about the bug if you care to look:
https://svn.open-mpi.org/trac/ompi/ticket/1166
https://svn.open-mpi.org/trac/ompi/ticket/1157

On Dec 11, 2007 2:48 PM, Moreland, Kenneth  wrote:

I recently ran into a problem with GATHERV while running some randomized
tests on my MPI code.  The problem seems to occur when running
MPI_Gatherv with a displacement on a communicator with a single process.
The code listed below exercises this errant behavior.  I have tried it
on OpenMPI 1.1.2 and 1.2.4.

Granted, this is not a situation that one would normally run into in a
real application, but I just wanted to check to make sure I was not
doing anything wrong.

-Ken



#include <mpi.h>

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
 int rank;
 MPI_Comm smallComm;
 int senddata[4], recvdata[4], length, offset;

 MPI_Init(&argc, &argv);

 MPI_Comm_rank(MPI_COMM_WORLD, &rank);

 // Split up into communicators of size 1.
 MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &smallComm);

 // Now try to do a gatherv.
 senddata[0] = 5; senddata[1] = 6; senddata[2] = 7; senddata[3] = 8;
 recvdata[0] = 0; recvdata[1] = 0; recvdata[2] = 0; recvdata[3] = 0;

 length = 3;
 offset = 1;
 MPI_Gatherv(senddata, length, MPI_INT,
 recvdata, &length, &offset, MPI_INT, 0, smallComm);
 if (senddata[0] != recvdata[offset])
   {
   printf("%d: %d != %d?\n", rank, senddata[0], recvdata[offset]);
   }
 else
   {
   printf("%d: Everything OK.\n", rank);
   }

 return 0;
}

    Kenneth Moreland
   ***  Sandia National Laboratories
***
*** *** ***  email: kmo...@sandia.gov
**  ***  **  phone: (505) 844-8919
   ***  fax:   (505) 845-0833






--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/







--
Jeff Squyres
Cisco Systems


Re: [OMPI users] error with Vprotocol pessimist

2007-12-13 Thread Thomas Ropars

I still have the same error after update (r16951).

I have the lib/openmpi/mca_pml_v.so file in my build and the command
line I use is: mpirun -np 4 my_application


Thomas


Aurelien Bouteiller wrote:
I could reproduce and fix the bug. It will be corrected in trunk as  
soon as the svn is online again. Thanks for reporting the problem.


Aurelien

Le 11 déc. 07 à 15:02, Aurelien Bouteiller a écrit :

  

I cannot reproduce the error. Please make sure you have the lib/
openmpi/mca_pml_v.so file in your build. If you don't, maybe you
forgot to run autogen.sh at the root of the trunk when you
removed .ompi_ignore.

If this does not fix the problem, please let me know your command line
options to mpirun.

Aurelien

Le 11 déc. 07 à 14:36, Aurelien Bouteiller a écrit :



Mmm, I'll investigate this today.

Aurelien
Le 11 déc. 07 à 08:46, Thomas Ropars a écrit :

  

Hi,

I've tried to test the message logging component vprotocol  
pessimist.

(svn checkout revision 16926)
When I run an mpi application, I get the following error :

mca: base: component_find: unable to open vprotocol pessimist:
/local/openmpi/lib/openmpi/mca_vprotocol_pessimist.so: undefined
symbol:
pml_v_output (ignored)


Regards

Thomas



--
Dr. Aurelien Bouteiller, Sr. Research Associate
Innovative Computing Laboratory - MPI group
+1 865 974 6321
1122 Volunteer Boulevard
Claxton Education Building Suite 350
Knoxville, TN 37996


  





  


Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt

2007-12-13 Thread Jeff Squyres
Brian raised a good point -- your project must already have a portable  
solution for dlopen() since it's loading your plugin.


Can you not use that?



On Dec 12, 2007, at 8:40 AM, Jeff Squyres wrote:


On Dec 11, 2007, at 9:08 AM, Lisandro Dalcin wrote:


(for a nicely-formatted refresher of the issues, check out 
https://svn.open-mpi.org/trac/ompi/wiki/Linkers)


Sorry for the late response...

I've finally 'solved' this issue by using RTLD_GLOBAL for loading the
Python extension module that actually calls MPI_Init(). However, I'm
not completely sure if my hackery is completely portable.

Looking briefly at the end of the wiki page you linked, you say that
if the explicit linking to libmpi in the components is removed, then
dlopen() has to be called explicitly.


Correct.


Well, this would be a major headache for me, because of portability
issues. Please note that I've developed mpi4py on a rather old 32-bit
Linux box, but it works on many different platforms and OS's. I really
do not have the time to test and figure out how to appropriately call
dlopen() on platforms/OS's that I do not even have access to!!


Yes, this is problematic; dlopen is fun on all the various OS's...

FWIW: we use the Libtool DL library for this kind of portability; OMPI
itself doesn't have all the logic for the different OS loaders.


Anyway, perhaps OpenMPI could provide an extension: a function call,
let's say 'ompi_load_dso()' or something like that, that can be called
before MPI_Init() for setting up the monster. What do you think about
this? Would it be hard for you?



(after much thinking...) Perhaps a better solution would be an MCA
parameter: if the logical "mca_do_dlopen_hackery" (or whatever) MCA
parameter is found to be true during the very beginning of MPI_INIT
(down in the depths of opal_init(), actually), then we will
lt_dlopen[_advise]("/libmpi").  For completeness, we'll do the
corresponding dlclose in opal_finalize().  I need to think about this
a bit more and run it by Brian Barrett... he's quite good at finding
holes in these kinds of complex scenarios.  :-)

This should hypothetically allow you to do a simple putenv() before
calling MPI_INIT and then the Right magic should occur.

--
Jeff Squyres
Cisco Systems



--
Jeff Squyres
Cisco Systems