[OMPI users] MPI_Gatherv error

2011-04-21 Thread Zhangping Wei
Dear all,
I am a beginner with MPI. Right now I am trying to use MPI_GATHERV in my code; the 
test code just gathers the values of array A and stores them in array B, but I get 
the error listed below:

Fatal error in MPI_Gatherv: Invalid count, error stack:
PMPI_Gatherv<398>: MPI_Gatherv failed 
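
For reference, here is a minimal sketch of a consistent MPI_GATHERV call in Fortran
(assuming integer data and a hypothetical layout where rank r contributes r+1
elements; the recvcounts/displs arrays on the root must describe exactly what each
rank sends, otherwise MPI reports this kind of "Invalid count" error):

  program gatherv_sketch
    use mpi
    implicit none
    integer :: ierr, rank, nprocs, i, mycount
    integer, allocatable :: a(:), b(:), recvcounts(:), displs(:)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

    mycount = rank + 1                      ! this rank sends rank+1 elements
    allocate(a(mycount))
    a = rank

    allocate(recvcounts(nprocs), displs(nprocs), b(nprocs*(nprocs+1)/2))
    do i = 1, nprocs
       recvcounts(i) = i                    ! must equal what rank i-1 sends
       displs(i) = (i-1)*i/2                ! offset of rank i-1's block in b
    end do

    call MPI_GATHERV(a, mycount, MPI_INTEGER, &
                     b, recvcounts, displs, MPI_INTEGER, &
                     0, MPI_COMM_WORLD, ierr)

    call MPI_FINALIZE(ierr)
  end program gatherv_sketch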

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Ralph Castain

On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:

> Given that part of our cluster is TCP only, openib wouldn't even start up on 
> those hosts

That is correct - it would have no impact on those hosts

> and this would be ignored on hosts with IB adaptors?  

Ummm...not sure I understand this one. The param -will- be used on hosts with 
IB adaptors because that is what it is controlling.

However, it -won't- have any impact on hosts without IB adaptors, which is what 
I suspect you meant to ask?


> 
> Just checking, thanks!
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:
> 
>> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
>> slower to establish QP's, but I don't think that matters much.
>> 
>> Over iWARP, rdmacm can cause connection storms as you scale to thousands of 
>> MPI processes.
>> 
>> 
>> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>> 
>>> We managed to have another user hit the bug that causes collectives (this 
>>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>>> 
>>> btl_openib_cpc_include rdmacm
>>> 
>>> My question is if we set this to the default on our system with an 
>>> environment variable does it introduce any performance or other issues we 
>>> should be aware of?
>>> 
>>> Is there a reason we should not use rdmacm?
>>> 
>>> Thanks!
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> 
>> 
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Brock Palen
Given that part of our cluster is TCP only, openib wouldn't even start up on 
those hosts, and this would be ignored on hosts with IB adaptors?

Just checking, thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:

> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
> slower to establish QP's, but I don't think that matters much.
> 
> Over iWARP, rdmacm can cause connection storms as you scale to thousands of 
> MPI processes.
> 
> 
> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
> 
>> We managed to have another user hit the bug that causes collectives (this 
>> time MPI_Bcast() ) to hang on IB that was fixed by setting:
>> 
>> btl_openib_cpc_include rdmacm
>> 
>> My question is if we set this to the default on our system with an 
>> environment variable does it introduce any performance or other issues we 
>> should be aware of?
>> 
>> Is there a reason we should not use rdmacm?
>> 
>> Thanks!
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> 
> 




Re: [OMPI users] Need help buiding OpenMPI with Intel v12.0 compilers on Linux

2011-04-21 Thread Jeff Squyres
On Apr 20, 2011, at 10:44 AM, Ormiston, Scott J. wrote:

> I originally thought the configure was fine, but now that I check through the 
> config.log, I see that it had errors:
> 
> conftest.c(49): error #2379: cannot open source file "ac_nonexistent.h"
>  #include <ac_nonexistent.h>

It's normal and expected for there to be lots of errors in config.log.  

There are a bunch of tests in configure that are designed to succeed on some 
systems and fail on others.  

So don't read anything into the failures that you see in config.log -- unless 
configure itself fails.  Then we generally go look at the *last* failures in 
config.log to start backtracking to figure out what went wrong.
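
For example, a quick way to jump to those last failures (a sketch only; adjust to
taste):

  grep -n error config.log | tail -20   # show the last compile/link errors configure hit
  less +G config.log                    # open config.log at the end and page backwards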

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] using openib and psm together

2011-04-21 Thread Jeff Squyres
I believe it was mainly a startup issue -- there's a complicated sequence of 
events that happens during MPI_INIT.  IIRC, the issue was that if OMPI had 
software support for PSM, it assumed that the lack of PSM hardware was 
effectively an error.

v1.5 made the startup sequence a little more flexible; the PSM bits in OMPI can 
say "Oh yes, we have PSM support, but I don't see any PSM hardware, so just 
ignore me... please move along... nothing to see here..."

OMPI's openib BTL has had this kind of support for a long time, but PSM and 
verbs are treated a little differently in the startup sequence because they're 
fundamentally different kinds of transports (abstraction-wise, anyway).
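
For context, explicit transport selection looks like this on the mpirun command
line (a sketch only; whether this is sufficient on the 1.4 series when PSM support
is compiled in but no PSM hardware is present is exactly the question here):

  # force the PSM path on the QLogic island:
  mpirun -mca pml cm -mca mtl psm ./a.out

  # force the verbs path on the Mellanox island:
  mpirun -mca pml ob1 -mca btl openib,sm,self ./a.out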



On Apr 21, 2011, at 6:01 AM, Dave Love wrote:

> We have an installation with both Mellanox and Qlogic IB adaptors (in
> distinct islands), so I built open-mpi 1.4.3 with openib and psm
> support.
> 
> Now I've just read this in the OFED source, but I can't see any relevant
> issue in the open-mpi tracker:
> 
>  OpenMPI support
>  ---
>  It is recommended to use the OpenMPI v1.5 development branch. Prior versions
>  of OpenMPI have an issue with supporting PSM network transports mixed with
>  standard Verbs transport (BTL openib). This prevents an OpenMPI installation with
>  network modules available for PSM and Verbs to work correctly on nodes with
>  no QLogic IB hardware. This has been fixed in the latest development branch
>  allowing a single OpenMPI installation to target IB hardware via PSM or Verbs
>  as well as alternate transports seamlessly.
> 
> Do I definitely need 1.5 (and is 1.5.3 good enough?) to have openib and
> psm working correctly?  Also what are the symptoms of it not working
> correctly?
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] huge VmRSS on rank 0 after MPI_Init when using "btl_openib_receive_queues" option

2011-04-21 Thread Jeff Squyres
Does it vary exactly according to your receive_queues specification?
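
For reference, the arithmetic behind the 2 GB figure in the quoted report below
(assuming one per-peer set of buffers is posted for each of the 128 peers):

  65536 bytes/buffer * 256 buffers/peer * 128 peers = 2,147,483,648 bytes ~ 2 GiB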

On Apr 19, 2011, at 9:03 AM, Eloi Gaudry wrote:

> hello,
> 
> i would like to get your input on this:
> when launching a parallel computation on 128 nodes using openib and the "-mca 
> btl_openib_receive_queues P,65536,256,192,128" option, i observe a rather 
> large resident memory consumption (2GB: 65536*256*128) on the process with 
> rank 0 (and only this process) just after a call to MPI_Init.
> 
> i'd like to know why the other processes don't behave the same way:
> - other processes located on the same node don't use that amount of memory
> - none of the other processes (i.e. those located on any other node) do either
> 
> i'm using OpenMPI-1.4.2, built with gcc-4.3.4 and '--enable-cxx-exceptions 
> --with-pic --with-threads=posix' options.
> 
> thanks for your help,
> éloi
> 
> -- 
> Eloi Gaudry
> Senior Product Development Engineer
> 
> Free Field Technologies
> Company Website: http://www.fft.be
> Direct Phone Number: +32 10 495 147
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Jeff Squyres
Over IB, I'm not sure there is much of a drawback.  It might be slightly slower 
to establish QP's, but I don't think that matters much.

Over iWARP, rdmacm can cause connection storms as you scale to thousands of MPI 
processes.
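
For reference, a sketch of the usual ways to make an MCA parameter the system-wide
default (the install prefix and file paths below are assumptions about a standard
Open MPI layout):

  # environment-variable form, e.g. in a module file or profile script:
  export OMPI_MCA_btl_openib_cpc_include=rdmacm

  # or add a line to <prefix>/etc/openmpi-mca-params.conf (or ~/.openmpi/mca-params.conf):
  btl_openib_cpc_include = rdmacm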


On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:

> We managed to have another user hit the bug that causes collectives (this 
> time MPI_Bcast() ) to hang on IB that was fixed by setting:
> 
> btl_openib_cpc_include rdmacm
> 
> My question is if we set this to the default on our system with an 
> environment variable does it introduce any performance or other issues we 
> should be aware of?
> 
> Is there a reason we should not use rdmacm?
> 
> Thanks!
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Removing Portals BTLs

2011-04-21 Thread Ralph Castain
Sure - instead of what you did, just add --without-portals to your original 
configure. The exact option depends on what portals you have installed.

Here is the relevant part of the "./configure -h" output:

  --with-portals=DIR  Specify the installation directory of PORTALS
  --with-portals-libs=LIBS
  Libraries to link with for portals
  --with-portals4(=DIR)   Build Portals4 support, optionally adding
  DIR/include, DIR/lib, and DIR/lib64 to the search
  path for headers and libraries
  --with-portals4-libdir=DIR
  Search for Portals4 libraries in DIR

Just do --without-portals or --without-portals4 (you don't need the matching 
libdir option), whichever matches what you have.
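
Putting that together with the configure line from the message quoted below, the
full command would look something like this (assuming it is the classic Portals 3
install that is being picked up):

  ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr \
      --with-openib-libdir=/usr/lib64 --disable-mpi-cxx --without-portals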




On Apr 21, 2011, at 11:34 AM, Paul Monday wrote:

> Hi,
> 
> I am trying to get rid of the following error message when I use mpirun.
> 
> mca: base: component_find: "mca_ess_portals_utcp" does not appear to be a 
> valid
> ess MCA dynamic component (ignored):
> /usr/local/lib/openmpi/mca_ess_portals_utcp.so: undefined symbol:
> mca_ess_portals_utcp_component
> 
> I am trying to remove the portals components altogether...here's why:
> 
> When I originally built openmpi, I used a simple configuration string:
> ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr
> --with-openib-libdir=/usr/lib64 --disable-mpi-cxx
> 
> This gives me an error while the make is running, most likely a problem with 
> my
> Portals installation.  So, I just want to skip Portals BTLs.
> /usr/bin/ld: /usr/local/lib/libp3api.a(libp3api_a-acl.o): relocation
> R_X86_64_32S against `p3_api_process' can not be used when making a shared
> object; recompile with -fPIC
> /usr/local/lib/libp3api.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> make[2]: *** [libmpi.la] Error 1
> make[2]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
> make: *** [all-recursive] Error 1
> 
> So I changed the configuration to:
> ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr
> --with-openib-libdir=/usr/lib64 --disable-mpi-cxx
> --enable-mca-no-build=btl-portals,ess-portals_utcp,common-portals,mtl-portals
> 
> This allowed OpenMPI to build, but then I receive the runtime error above.  Is
> there a way to stop the Portals pieces from even trying to build and run?
> 
> Paul Monday
> 




[OMPI users] Removing Portals BTLs

2011-04-21 Thread Paul Monday
Hi,

I am trying to get rid of the following error message when I use mpirun.

mca: base: component_find: "mca_ess_portals_utcp" does not appear to be a valid
ess MCA dynamic component (ignored):
/usr/local/lib/openmpi/mca_ess_portals_utcp.so: undefined symbol:
mca_ess_portals_utcp_component

I am trying to remove the portals components altogether...here's why:

When I originally built openmpi, I used a simple configuration string:
./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr
--with-openib-libdir=/usr/lib64 --disable-mpi-cxx

This gives me an error while the make is running, most likely a problem with my
Portals installation.  So, I just want to skip Portals BTLs.
/usr/bin/ld: /usr/local/lib/libp3api.a(libp3api_a-acl.o): relocation
R_X86_64_32S against `p3_api_process' can not be used when making a shared
object; recompile with -fPIC
/usr/local/lib/libp3api.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[2]: *** [libmpi.la] Error 1
make[2]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
make: *** [all-recursive] Error 1

So I changed the configuration to:
./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr
--with-openib-libdir=/usr/lib64 --disable-mpi-cxx
--enable-mca-no-build=btl-portals,ess-portals_utcp,common-portals,mtl-portals

This allowed OpenMPI to build, but then I receive the runtime error above.  Is
there a way to stop the Portals pieces from even trying to build and run?

Paul Monday



Re: [OMPI users] Bug in MPI_scatterv Fortran-90 implementation

2011-04-21 Thread Jeff Squyres
I do believe you found a bona-fide bug.

Could you try the attached patch?  (I think it should only affect f90 "large" 
builds)  You should be able to check it quickly via:

cd top_of_ompi_source_tree
patch -p0 < scatterv-f90.patch
cd ompi/mpi/f90
make clean
rm mpi_scatterv_f90.f90
make all install



On Apr 21, 2011, at 10:37 AM, Stanislav Sazykin wrote:

> Hello,
> 
> I came across what appears to be an error in implementation of
> MPI_scatterv Fortran-90 version. I am using OpenMPI 1.4.3 on Linux.
> This comes up when OpenMPI was configured with
> --with-mpi-f90-size=medium or --with-mpi-f90-size=large
> 
> The standard specifies that the interface is
> MPI_SCATTERV(SENDBUF, SENDCOUNTS, DISPLS, SENDTYPE, RECVBUF,
>     RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
>     <type> SENDBUF(*), RECVBUF(*)
>     INTEGER SENDCOUNTS(*), DISPLS(*), SENDTYPE
> 
> so that SENDCOUNTS and DISPLS are integer arrays. However, if
> I compile Fortran code with calls to MPI_scatterv with argument
> checking enabled, two Fortran compilers (Intel and Lahey)
> produce fatal errors saying there is no matching interface.
> 
> Looking in the source code of OpenMPI, I see that  in
> ompi/mpi/f90/scripts, the script mpi_scatterv_f90.f90.sh that
> is invoked when running "make" produces Fortran interfaces
> that list both SENDCOUNTS and DISPLS as
> 
> integer, intent(in) ::
> 
> This appears to be an error as it would be illegal to pass a scalar
> variable and receive it as an array in the subroutine. I have not
> figured out what happens in the code at this invocation (the code
> is complicated), but it seems like a segfault situation.
> 
> -- 
> Stan Sazykin


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


scatterv-f90.patch
Description: Binary data


[OMPI users] Bug in MPI_scatterv Fortran-90 implementation

2011-04-21 Thread Stanislav Sazykin

Hello,

I came across what appears to be an error in implementation of
MPI_scatterv Fortran-90 version. I am using OpenMPI 1.4.3 on Linux.
This comes up when OpenMPI was configured with
--with-mpi-f90-size=medium or --with-mpi-f90-size=large

The standard specifies that the interface is
MPI_SCATTERV(SENDBUF, SENDCOUNTS, DISPLS, SENDTYPE, RECVBUF,
    RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNTS(*), DISPLS(*), SENDTYPE

so that SENDCOUNTS and DISPLS are integer arrays. However, if
I compile Fortran code with calls to MPI_scatterv with argument
checking enabled, two Fortran compilers (Intel and Lahey)
produce fatal errors saying there is no matching interface.

Looking in the source code of OpenMPI, I see that  in
ompi/mpi/f90/scripts, the script mpi_scatterv_f90.f90.sh that
is invoked when running "make" produces Fortran interfaces
that list both SENDCOUNTS and DISPLS as

integer, intent(in) ::

This appears to be an error as it would be illegal to pass a scalar
variable and receive it as an array in the subroutine. I have not
figured out what happens in the code at this invocation (the code
is complicated), but it seems like a segfault situation.

--
Stan Sazykin
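
For comparison, a minimal sketch of what the generated interface needs to declare,
shown for one hypothetical specific instance (integer buffers; the real generated
bindings are overloaded over many types and array ranks, and the subroutine name
here is made up):

  module scatterv_interface_sketch
    interface MPI_Scatterv
       subroutine MPI_Scatterv_int(sendbuf, sendcounts, displs, sendtype, &
                                   recvbuf, recvcount, recvtype, root, comm, ierror)
         integer, intent(in)  :: sendbuf(*)
         integer, intent(in)  :: sendcounts(*), displs(*)   ! arrays, as the standard requires
         integer, intent(in)  :: sendtype
         integer, intent(out) :: recvbuf(*)
         integer, intent(in)  :: recvcount, recvtype, root, comm
         integer, intent(out) :: ierror
       end subroutine MPI_Scatterv_int
    end interface MPI_Scatterv
  end module scatterv_interface_sketch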


[OMPI users] using openib and psm together

2011-04-21 Thread Dave Love
We have an installation with both Mellanox and Qlogic IB adaptors (in
distinct islands), so I built open-mpi 1.4.3 with openib and psm
support.

Now I've just read this in the OFED source, but I can't see any relevant
issue in the open-mpi tracker:

  OpenMPI support
  ---
  It is recommended to use the OpenMPI v1.5 development branch. Prior versions
  of OpenMPI have an issue with supporting PSM network transports mixed with
  standard Verbs transport (BTL openib). This prevents an OpenMPI installation with
  network modules available for PSM and Verbs to work correctly on nodes with
  no QLogic IB hardware. This has been fixed in the latest development branch
  allowing a single OpenMPI installation to target IB hardware via PSM or Verbs
  as well as alternate transports seamlessly.

Do I definitely need 1.5 (and is 1.5.3 good enough?) to have openib and
psm working correctly?  Also what are the symptoms of it not working
correctly?