[OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-22 Thread Pablo Lopez Rios

Hi,

I'm having a bit of a problem with wrapping mpirun in a script. The 
script needs to run an MPI job in the background and tail -f the output. 
Pressing Ctrl+C should stop tail -f, and the MPI job should continue. 
However, mpirun seems to detect the SIGINT that was meant for tail, and 
kills the job immediately. I've tried workarounds involving nohup, 
disown, trap, subshells (including calling the script from within 
itself), etc, to no avail.
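To make the pattern concrete, the wrapper boils down to something like this (a stripped-down sketch; the real script and job are more involved, and "$SCRIPT"/"$out" are placeholders):

#!/bin/bash
# Minimal shape of the wrapper: start the MPI job in the background,
# capture its output in a file, and follow it until the user hits Ctrl+C.
out=output.log
mpirun -np 1 bash -c "$SCRIPT" >& "$out" &   # $SCRIPT is the real workload

tail -f "$out"   # Ctrl+C should only stop tail; instead mpirun
                 # sees the SIGINT and kills the whole job.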


The odd thing is that this doesn't happen if I run the command directly, 
without mpirun. Attached is a script that reproduces the problem. It 
runs a simple counting script in the background, which takes 10 seconds 
to complete, and tails the output. If called with "nompi" as the first 
argument, it simply runs bash -c "$SCRIPT" >& "$out" &, and with 
"mpi" it does the same with 'mpirun -np 1' prepended. The output I 
get is:



$ ./ompi_bug.sh mpi
mpi:
1
2
3
4
^C
$ ./ompi_bug.sh nompi
nompi:
1
2
3
4
^C
$ cat output.*
mpi:
1
2
3
4
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited 
on signal 0 (Unknown signal 0).

--------------------------------------------------------------------------
mpirun: clean termination accomplished

nompi:
1
2
3
4
5
6
7
8
9
10
Done


This convinces me that there is something strange with OpenMPI, since I 
expect no difference in signal handling when running a simple command 
with or without mpirun in the middle.


I've looked for options to change this behaviour, but I can't seem to 
find any. Is there one, preferably in the form of an environment 
variable? Or is this a bug?


I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also 
v1.2.8 as distributed with OpenSUSE 11.3.


Thanks,
Pablo


ompi_bug.sh.gz
Description: GNU Zip compressed data


Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3

2011-04-22 Thread Ralph Castain

On Apr 22, 2011, at 1:42 PM, ya...@adina.com wrote:

> Open MPI 1.4.3 + Intel Compilers V8.1 summary:
> (in case someone likes to refer to it later)
> 
> (1) To make all Open MPI executables statically linked and 
> independent of any dynamic libraries, the "--disable-shared" and 
> "--enable-static" options should BOTH be forwarded to configure, and 
> the "-i-static" option should be specified for the Intel compilers too.
> 
> (2) It is confirmed that environment variables such as $PATH and 
> $LD_LIBRARY_PATH can be forwarded to slave nodes by specifying 
> options to mpirun. However, mpirun will invoke an orted daemon on the 
> master and slave nodes.

This is not correct - mpirun will not invoke an orted daemon on the master 
node. mpirun itself acts as the local daemon.

> These environment variables passed to 
> slave nodes via mpirun options do not 
> take effect before orted is started.

This is not entirely correct. It depends on the launcher. For rsh/ssh 
launchers, we do indeed set the environment variables prior to executing the 
orted daemon. Some launch environments do not support that functionality.
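For the rsh/ssh case, a sketch of the two points summarized above: forwarding environment variables on the mpirun command line, and the static build recipe from (1). The hostfile and application names are placeholders, the compiler drivers are assumed to be icc/icpc/ifort, and the exact placement of "-i-static" may vary:

# Forward selected environment variables to the launched processes:
mpirun -x PATH -x LD_LIBRARY_PATH -np 8 --hostfile myhosts ./my_app

# Static build: both configure options plus the Intel "-i-static" flag:
./configure CC=icc CXX=icpc F77=ifort FC=ifort \
            LDFLAGS="-i-static" \
            --disable-shared --enable-static
make all install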


> So if the orted daemon needs 
> these environment variables to run,
> the only way is to set them in a shared 
> .bashrc or .profile file, visible to 
> both master and slave nodes, say, on a shared NFS partition. There 
> seems to be no other way to resolve this kind
> of dependency.
> 
> Regards,
> Yiguang
> 
> 




Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-22 Thread Brock Palen
On Apr 21, 2011, at 6:49 PM, Ralph Castain wrote:

> 
> On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
> 
>> Given that part of our cluster is TCP only, openib wouldn't even startup on 
>> those hosts
> 
> That is correct - it would have no impact on those hosts
> 
>> and this would be ignored on hosts with IB adaptors?  
> 
> Ummm...not sure I understand this one. The param -will- be used on hosts with 
> IB adaptors because that is what it is controlling.
> 
> However, it -won't- have any impact on hosts without IB adaptors, which is 
> what I suspect you meant to ask?

Correct, that was a typo. I am going to add the environment variable to our OpenMPI 
modules so that rdmacm is our default for now. Thanks!
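For reference, the site-wide default can be expressed either as an environment variable or in an MCA parameter file; a sketch (the file path shown is the usual default and may differ per install):

# What the OpenMPI module would effectively export:
export OMPI_MCA_btl_openib_cpc_include=rdmacm

# Alternatively, in a site-wide MCA parameter file such as
# <prefix>/etc/openmpi-mca-params.conf:
#   btl_openib_cpc_include = rdmacm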

> 
> 
>> 
>> Just checking thanks!
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:
>> 
>>> Over IB, I'm not sure there is much of a drawback.  It might be slightly 
>>> slower to establish QP's, but I don't think that matters much.
>>> 
>>> Over iWARP, rdmacm can cause connection storms as you scale to thousands of 
>>> MPI processes.
>>> 
>>> 
>>> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>>> 
>>>> We managed to have another user hit the bug that causes collectives (this
>>>> time MPI_Bcast()) to hang on IB, which was fixed by setting:
>>>> 
>>>> btl_openib_cpc_include rdmacm
>>>> 
>>>> My question is: if we set this as the default on our system with an
>>>> environment variable, does it introduce any performance or other issues we
>>>> should be aware of?
>>>> 
>>>> Is there a reason we should not use rdmacm?
>>>> 
>>>> Thanks!
>>>> 
>>>> Brock Palen
>>>> www.umich.edu/~brockp
>>>> Center for Advanced Computing
>>>> bro...@umich.edu
>>>> (734)936-1985
 
 
 
 
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> 




Re: [OMPI users] Bug in MPI_scatterv Fortran-90 implementation

2011-04-22 Thread Jeff Squyres
Oops!  Missed that; thanks.

I've committed the change to the trunk and filed CMRs to bring the fix to v1.4 
and v1.5.  

Thanks for reporting the issue.


On Apr 22, 2011, at 1:03 AM, Stanislav Sazykin wrote:

> Jeff,
> 
> No, the patch did not solve the problem. Looking more,
> there is another place where the interfaces come up, in
> mpi-f90-interfaces.h.sh in ompi/mpi/f90/scripts
> 
> If I manually change the two arguments to arrays from scalars
> in both scripts after running configure but before "make",
> then it works.
> 
> Stan Sazykin
> 
> 
> On 4/21/2011 11:07, Jeff Squyres wrote:
>> I do believe you found a bona-fide bug.
>> 
>> Could you try the attached patch?  (I think it should only affect f90 
>> "large" builds)  You should be able to check it quickly via:
>> 
>> cd top_of_ompi_source_tree
>> patch -p0 < scatterv-f90.patch
>> cd ompi/mpi/f90
>> make clean
>> rm mpi_scatterv_f90.f90
>> make all install
>> 
>> 
>> 
>> On Apr 21, 2011, at 10:37 AM, Stanislav Sazykin wrote:
>> 
>>> Hello,
>>> 
>>> I came across what appears to be an error in the implementation of the
>>> MPI_scatterv Fortran-90 version. I am using OpenMPI 1.4.3 on Linux.
>>> This comes up when OpenMPI is configured with
>>> --with-mpi-f90-size=medium or --with-mpi-f90-size=large.
>>> 
>>> The standard specifies that the interface is
>>> MPI_SCATTERV(SENDBUF, SENDCOUNTS, DISPLS, SENDTYPE, RECVBUF,
>>>              RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
>>>     <type> SENDBUF(*), RECVBUF(*)
>>>     INTEGER SENDCOUNTS(*), DISPLS(*), SENDTYPE
>>> 
>>> so that SENDCOUNTS and DISPLS are integer arrays. However, if
>>> I compile a Fortran code with calls to MPI_scatterv, with
>>> argument checking enabled, two Fortran compilers (Intel and Lahey)
>>> produce fatal errors saying there is no matching interface.
>>> 
>>> Looking in the source code of OpenMPI, I see that  in
>>> ompi/mpi/f90/scripts, the script mpi_scatterv_f90.f90.sh that
>>> is invoked when running "make" produces Fortran interfaces
>>> that list both SENDCOUNTS and DISPLS as
>>> 
>>> integer, intent(in) ::
>>> 
>>> This appears to be an error, as it would be illegal to pass a scalar
>>> variable and receive it as an array in the subroutine. I have not
>>> figured out what happens in the code at this invocation (the code
>>> is complicated), but it seems like a segfault situation.
>>> 
>>> --
>>> Stan Sazykin
>> 
>> 
>> 
>> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] MPI_Gatherv error

2011-04-22 Thread David Zhang
I wonder if this is related to the problem reported in
[OMPI users] Bug in MPI_scatterv Fortran-90 implementation

On Thu, Apr 21, 2011 at 7:19 PM, Zhangping Wei wrote:

> Dear all,
>
> I am a beginner with MPI. Right now I am trying to use MPI_GATHERV in my code;
> the test code just gathers the values of array A into array B, but I
> get the error listed below:
>
> 'Fatal error in MPI_Gatherv: Invalid count, error stack:
>
> PMPI_Gatherv<398>: MPI_Gatherv failed 

Re: [OMPI users] huge VmRSS on rank 0 after MPI_Init when using "btl_openib_receive_queues" option

2011-04-22 Thread Eloi Gaudry
it varies with the receive_queues specification *and* with the number of 
MPI processes:  memory_consumed = nb_mpi_process * nb_buffers * 
(buffer_size + low_buffer_count_watermark + credit_window_size)
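A quick arithmetic check of the figure quoted below (buffer size 65536, 256 buffers, 128 MPI processes): the dominant term alone already accounts for roughly the 2GB observed on rank 0:

# buffer_size * nb_buffers * nb_mpi_process, ignoring the smaller terms:
echo $((65536 * 256 * 128))    # 2147483648 bytes, i.e. ~2 GB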


éloi


On 04/22/2011 12:26 AM, Jeff Squyres wrote:

Does it vary exactly according to your receive_queues specification?

On Apr 19, 2011, at 9:03 AM, Eloi Gaudry wrote:


hello,

i would like to get your input on this:
when launching a parallel computation on 128 nodes using openib and the "-mca 
btl_openib_receive_queues P,65536,256,192,128" option, i observe a rather large 
resident memory consumption (2GB: 65536*256*128) on the process with rank 0 (and only 
this process) just after a call to MPI_Init.

i'd like to know why the other processes don't behave the same:
- other processes located on the same node don't use that amount of memory
- all other processes (i.e. those located on any other node) don't either

i'm using OpenMPI-1.4.2, built with gcc-4.3.4 and '--enable-cxx-exceptions 
--with-pic --with-threads=posix' options.

thanks for your help,
éloi

--
Eloi Gaudry
Senior Product Development Engineer

Free Field Technologies
Company Website: http://www.fft.be
Direct Phone Number: +32 10 495 147






--
Eloi Gaudry
Senior Product Development Engineer

Free Field Technologies
Company Website: http://www.fft.be
Direct Phone Number: +32 10 495 147



Re: [OMPI users] Bug in MPI_scatterv Fortran-90 implementation

2011-04-22 Thread Stanislav Sazykin

Jeff,

No, the patch did not solve the problem. Looking more,
there is another place where the interfaces come up, in
mpi-f90-interfaces.h.sh in ompi/mpi/f90/scripts

If I manually change the two arguments to arrays from scalars
in both scripts after running configure but before "make",
then it works.
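If it helps anyone else checking this, one way to verify the regenerated interfaces after the manual edit might be something like the following (the generated header name is inferred from mpi-f90-interfaces.h.sh; its exact location in the build tree may differ):

# After re-running "make" in ompi/mpi/f90, confirm that sendcounts/displs
# are now declared as arrays in the generated interfaces:
grep -in -A 4 "mpi_scatterv" ompi/mpi/f90/mpi-f90-interfaces.h | grep -i "sendcounts"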

Stan Sazykin


On 4/21/2011 11:07, Jeff Squyres wrote:

I do believe you found a bona-fide bug.

Could you try the attached patch?  (I think it should only affect f90 "large" 
builds)  You should be able to check it quickly via:

cd top_of_ompi_source_tree
patch -p0 < scatterv-f90.patch
cd ompi/mpi/f90
make clean
rm mpi_scatterv_f90.f90
make all install



On Apr 21, 2011, at 10:37 AM, Stanislav Sazykin wrote:


Hello,

I came across what appears to be an error in the implementation of the
MPI_scatterv Fortran-90 version. I am using OpenMPI 1.4.3 on Linux.
This comes up when OpenMPI is configured with
--with-mpi-f90-size=medium or --with-mpi-f90-size=large.

The standard specifies that the interface is
MPI_SCATTERV(SENDBUF, SENDCOUNTS, DISPLS, SENDTYPE, RECVBUF,
             RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNTS(*), DISPLS(*), SENDTYPE

so that SENDCOUNTS and DISPLS are integer arrays. However, if
I compile a Fortran code with calls to MPI_scatterv, with
argument checking enabled, two Fortran compilers (Intel and Lahey)
produce fatal errors saying there is no matching interface.

Looking in the source code of OpenMPI, I see that  in
ompi/mpi/f90/scripts, the script mpi_scatterv_f90.f90.sh that
is invoked when running "make" produces Fortran interfaces
that list both SENDCOUNTS and DISPLS as

integer, intent(in) ::

This appears to be an error, as it would be illegal to pass a scalar
variable and receive it as an array in the subroutine. I have not
figured out what happens in the code at this invocation (the code
is complicated), but it seems like a segfault situation.

--
Stan Sazykin




