Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-27 Thread Gilles Gouaillardet
Ralph,

On 2014/10/28 0:46, Ralph Castain wrote:
> Actually, I propose to also remove that issue. Simple enough to use a
> hash_table_32 to handle the jobids, and let that point to a
> hash_table_32 of vpids. Since we rarely have more than one jobid
> anyway, the memory overhead actually decreases with this model, and we
> get rid of that annoying need to memcpy everything. 
sounds good to me.
from an implementation/performance point of view, should we treat
the local jobid differently ?
(e.g. use a special variable for the hash_table_32 of the vpids of the
current jobid)
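
As an illustration of that idea, here is a toy sketch of the two-level
jobid -> vpid lookup with a cached table for the local jobid. The table type
and accessors below are placeholders invented for the example, not the real
opal_hash_table_t API:

/* toy_two_level.c -- illustration only, not Open MPI code */
#include <stdint.h>
#include <stdio.h>

#define BUCKETS 64                       /* toy table size */

typedef struct { uint32_t key; void *val; int used; } slot_t;
typedef struct { slot_t slots[BUCKETS]; } table32_t;   /* toy 32-bit-keyed table */

static void *t32_get(table32_t *t, uint32_t key) {
    for (uint32_t i = 0; i < BUCKETS; i++) {
        slot_t *s = &t->slots[(key + i) % BUCKETS];
        if (!s->used) return NULL;
        if (s->key == key) return s->val;
    }
    return NULL;
}

static int t32_put(table32_t *t, uint32_t key, void *val) {
    for (uint32_t i = 0; i < BUCKETS; i++) {
        slot_t *s = &t->slots[(key + i) % BUCKETS];
        if (!s->used || s->key == key) { s->used = 1; s->key = key; s->val = val; return 0; }
    }
    return -1;                           /* toy table full */
}

int main(void) {
    static table32_t jobids;             /* jobid -> table32_t* of vpids       */
    static table32_t local_vpids;        /* fast path: vpids of the local job  */
    uint32_t local_jobid = 4711, jobid = 4711, vpid = 0;

    t32_put(&jobids, local_jobid, &local_vpids);
    t32_put(&local_vpids, vpid, "proc data for (4711,0)");

    /* two 32-bit lookups, no packing of (jobid,vpid) into a 64-bit key and
       therefore no memcpy / alignment concerns */
    table32_t *vpids = (jobid == local_jobid)
                         ? &local_vpids
                         : (table32_t *)t32_get(&jobids, jobid);
    printf("%s\n", (char *)t32_get(vpids, vpid));
    return 0;
}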
>> as far as i am concerned, i am fine with your proposed suggestion to
>> dump opal_identifier_t.
>>
>> about the patch, did you mean you have something ready i can apply to my
>> PR ?
>> or do you expect me to do the changes (i am ok to do it if needed)
> Why don’t I grab your branch, create a separate repo based on it (just to 
> keep things clean), push it to my area and give you write access? We can then 
> collaborate on the changes and create a PR from there. This way, you don’t 
> need to give me write access to your entire repo.
>
> Make sense?
ok to work on another "somehow shared" repo for that issue.
i am not convinced you should grab my branch, since all the changes i
made will no longer be valid.
anyway, feel free to fork a repo from my branch or the master and i will
work from here.

Cheers,

Gilles



Re: [OMPI users] OpenMPI 1.8.3 configure fails, Mac OS X 10.9.5, Intel Compilers

2014-10-27 Thread Ralph Castain
FWIW: I just tested with the Intel 15 compilers on Mac 10.10 and it works fine, 
so apparently the problem has been fixed. You should be able to upgrade to the 
15 versions, so that might be the best solution


> On Oct 27, 2014, at 11:06 AM, Bosler, Peter Andrew  wrote:
> 
> Good morning,
> 
> I’m trying to build OpenMPI with the Intel 14.01 compilers with the following 
> configure line
> ./configure --prefix=/opt/openmpi-1.8.3/intel-14.01 CC=icc CXX=icpc FC=ifort
> On a 6-core 3.5 GHz Intel Xeon E5 Mac Pro running Mac OS X 10.9.5.  
> 
> Configure outputs a pthread error, complaining that different threads don’t 
> have the same PID.
> I also get the same error with OpenMPI 1.8.2 and the Intel compilers.   
> I was able to build OpenMPI 1.8.3 with both LLVM 5.1 and GCC 4.9 so something 
> is going wrong with the Intel compilers threading interface.  
> 
> Interestingly, OpenMPI 1.8.3 and the Intel 14.01 compilers work fine on my 
> Macbook pro : same OS, different CPU (2.8 Ghz Intel Core i7), same configure 
> line.
> 
> Is there an environment variable or configure option that I need to set to 
> avoid this error on the Mac Pro?
> 
> Thanks for your help.
> 
> Pete Bosler
> 
> P.S. The specific warnings and error from openmpi-1.8.3/configure are the 
> following (and the whole output file is attached):
> 
> … Lots of output …
> configure: WARNING: ulimit.h: present but cannot be compiled
> configure: WARNING: ulimit.h: check for missing prerequisite headers?
> configure: WARNING: ulimit.h: see the Autoconf documentation
> configure: WARNING: ulimit.h: section "Present But Cannot Be Compiled"
> configure: WARNING: ulimit.h: proceeding with the compiler's result
> configure: WARNING: ## 
> -- ##
> configure: WARNING: ## Report this to 
> http://www.open-mpi.org/community/help/ ##
> configure: WARNING: ## 
> -- ##
> … Lots more output …
> checking if threads have different pids (pthreads on linux)... yes
> configure: WARNING: This version of Open MPI only supports environments where
> configure: WARNING: threads have the same PID.  Please use an older version of
> configure: WARNING: Open MPI if you need support on systems with different
> configure: WARNING: PIDs for threads in the same process.  Open MPI 1.4.x
> configure: WARNING: supports such systems, as does at least some versions the
> configure: WARNING: Open MPI 1.5.x series.
> configure: error: Cannot continue
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25618.php



Re: [OMPI users] Problem with Yosemite

2014-10-27 Thread Guillaume Houzeaux

On 24/10/14 18:09, Ralph Castain wrote:

I was able to build and run the trunk without problem on Yosemite with:

gcc (MacPorts gcc49 4.9.1_0) 4.9.1
GNU Fortran (MacPorts gcc49 4.9.1_0) 4.9.1

Will test 1.8 branch now, though I believe the fortran support in 1.8 
is up-to-date



Dear all,

I reinstalled everything following your instructions and finally
everything works well.

Thanks a lot for your help,

now back to work,

Guillaume

--
The camembert, with its gamy scent, had overcome the duller odours of the
marolles and the limbourg; it spread out its exhalations, smothering the
other smells under a surprising abundance of spoiled breath. Yet in the
middle of this vigorous phrase the parmesan threw in, now and then, a thin
note of country flute, while the bries added the insipid sweetness of damp
tambourines. Then came a suffocating reprise from the livarot. And the
symphony held for a moment on a high, sharp note from the aniseed géromé,
drawn out like an organ point.

Emile Zola - Le Ventre de Paris


Guillaume Houzeaux
Team Leader
Dpt. Computer Applications in Science and Engineering
Barcelona Supercomputing Center (BSC-CNS)
Edificio NEXUS I, Office 204
c) Gran Capitan 2-4
08034 Barcelona, Spain

Tel: +34 93 405 4291
Fax: +34 93 413 7721
Skype user: guillaume_houzeaux_bsc
WWW: CASE department
ResearcherID: D-4950-2012
Scientific Profile:






[OMPI users] OpenMPI 1.8.3 configure fails, Mac OS X 10.9.5, Intel Compilers

2014-10-27 Thread Bosler, Peter Andrew
Good morning,

I'm trying to build OpenMPI with the Intel 14.01 compilers with the following 
configure line
./configure --prefix=/opt/openmpi-1.8.3/intel-14.01 CC=icc CXX=icpc FC=ifort
On a 6-core 3.5 GHz Intel Xeon E5 Mac Pro running Mac OS X 10.9.5.

Configure outputs a pthread error, complaining that different threads don't 
have the same PID.
I also get the same error with OpenMPI 1.8.2 and the Intel compilers.
I was able to build OpenMPI 1.8.3 with both LLVM 5.1 and GCC 4.9 so something 
is going wrong with the Intel compilers threading interface.

Interestingly, OpenMPI 1.8.3 and the Intel 14.01 compilers work fine on my 
Macbook pro : same OS, different CPU (2.8 Ghz Intel Core i7), same configure 
line.

Is there an environment variable or configure option that I need to set to 
avoid this error on the Mac Pro?

Thanks for your help.

Pete Bosler

P.S. The specific warnings and error from openmpi-1.8.3/configure are the 
following (and the whole output file is attached):

... Lots of output ...
configure: WARNING: ulimit.h: present but cannot be compiled
configure: WARNING: ulimit.h: check for missing prerequisite headers?
configure: WARNING: ulimit.h: see the Autoconf documentation
configure: WARNING: ulimit.h: section "Present But Cannot Be Compiled"
configure: WARNING: ulimit.h: proceeding with the compiler's result
configure: WARNING: ## 
-- ##
configure: WARNING: ## Report this to 
http://www.open-mpi.org/community/help/ ##
configure: WARNING: ## 
-- ##
... Lots more output ...
checking if threads have different pids (pthreads on linux)... yes
configure: WARNING: This version of Open MPI only supports environments where
configure: WARNING: threads have the same PID.  Please use an older version of
configure: WARNING: Open MPI if you need support on systems with different
configure: WARNING: PIDs for threads in the same process.  Open MPI 1.4.x
configure: WARNING: supports such systems, as does at least some versions the
configure: WARNING: Open MPI 1.5.x series.
configure: error: Cannot continue
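
For reference, the check that fails here essentially asks whether two
pthreads inside one process report the same getpid(). A rough stand-alone
approximation of that test (not the actual program configure compiles) is:

/* pidcheck.c -- build with e.g. "icc pidcheck.c -lpthread" or gcc */
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static void *worker(void *arg) {
    *(pid_t *)arg = getpid();            /* record the PID seen by the thread */
    return NULL;
}

int main(void) {
    pid_t main_pid = getpid(), thread_pid = 0;
    pthread_t t;

    pthread_create(&t, NULL, worker, &thread_pid);
    pthread_join(t, NULL);

    puts(thread_pid == main_pid ? "same PID" : "different PIDs");
    return thread_pid == main_pid ? 0 : 1;
}

On any modern Linux or OS X system this should print "same PID"; the
configure failure above suggests the Intel 14 toolchain miscompiles or
mislinks the test rather than that the OS really gives threads different PIDs.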




== Configuring Open MPI


*** Startup tests
checking build system type... x86_64-apple-darwin13.4.0
checking host system type... x86_64-apple-darwin13.4.0
checking target system type... x86_64-apple-darwin13.4.0
checking for gcc... icc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether icc accepts -g... yes
checking for icc option to accept ISO C89... none needed
checking how to run the C preprocessor... icc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... no
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... config/install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking for style of include used by make... GNU
checking how to create a ustar tar archive... none
checking dependency style of icc... gcc3
checking whether make supports nested variables... yes

*** Checking versions
checking Open MPI version... 1.8.3
checking Open MPI release date... Sep 25, 2014
checking Open MPI Subversion repository version... r32794
checking Open MPI Run-Time Environment version... 1.8.3
checking Open MPI Run-Time Environment release date... Sep 25, 2014
checking Open MPI Run-Time Environment Subversion repository version... r32794
checking Open SHMEM version... 1.8.3
checking Open SHMEM release date... Sep 25, 2014
checking Open SHMEM Subversion repository version... r32794
checking Open Portable Access Layer version... 1.8.3
checking Open Portable Access Layer release date... Sep 25, 2014
checking Open Portable Access Layer Subversion repository version... r32794
checking for bootstrap Autoconf version... 2.69
checking for bootstrap Automake version... 1.12
checking for boostrap Libtool version... 2.4.2

*** Initialization, setup
configure: builddir: /Users/pabosle/so

Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Ralph Castain
FWIW: the “better” solution is to move Hadoop to an HPC-like RM such as Slurm. 
We did this at Pivotal as well as at Intel, but in both cases business moves at 
the very end of the project (Greenplum becoming Pivotal, and Intel moving its 
Hadoop work into Cloudera) blocked its release. Frustrating, as all the work 
was done in both cases :-(



> On Oct 27, 2014, at 10:28 AM, Brock Palen  wrote:
> 
> Thanks this is good feedback.
> 
> I was worried with the dynamic nature of Yarn containers that it would be 
> hard to coordinate wire up, and you have confirmed that.
> 
> Thanks
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
>> On Oct 27, 2014, at 11:25 AM, Ralph Castain  wrote:
>> 
>> 
>>> On Oct 26, 2014, at 9:56 PM, Brock Palen  wrote:
>>> 
>>> We are starting to look at supporting MPI on our Hadoop/Spark YARN based 
>>> cluster.
>> 
>> You poor soul…
>> 
>>> I found a bunch of references to Hamster, but what I don't find is if it 
>>> was ever merged into regular OpenMPI, and if so is it just another RM 
>>> integration?  Or does it need more setup?
>> 
>> When I left Pivotal, it was based on a copy of the OMPI trunk that sat 
>> somewhere in the 1.7 series, I believe. Last contact I had indicated they 
>> were trying to update, but I’m not sure they were successful.
>> 
>>> 
>>> I found this:
>>> http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html
>> 
>> Didn’t know they had actually (finally) released it, so good to know. Just 
>> so you are aware, there are major problems running MPI under Yarn as it just 
>> isn’t designed for MPI support. What we did back then was add a JNI layer so 
>> that ORTE could run underneath it, and then added a PMI-like service to 
>> provide the wireup support (since Yarn couldn’t be used to exchange the info 
>> itself). You also have the issue that Yarn doesn’t understand the need for 
>> all the procs to be launched together, and so you have to modify Yarn so it 
>> will ensure that the MPI procs are all running or else you’ll hang in 
>> MPI_Init.
>> 
>>> 
>>> Which appears to imply extra setup required.  Is this documented anywhere 
>>> for OpenMPI?
>> 
>> I’m afraid you’ll just have to stick with the Pivotal-provided version as 
>> the integration is rather complicated. Don’t expect much in the way of 
>> performance! This was purely intended as a way for “casual” MPI users to 
>> make use of “free” time on their Hadoop cluster, not for any serious 
>> technical programming.
>> 
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/10/25593.php
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/10/25613.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25616.php



Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
Thanks this is good feedback.

I was worried with the dynamic nature of Yarn containers that it would be hard 
to coordinate wire up, and you have confirmed that.

Thanks

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



> On Oct 27, 2014, at 11:25 AM, Ralph Castain  wrote:
> 
> 
>> On Oct 26, 2014, at 9:56 PM, Brock Palen  wrote:
>> 
>> We are starting to look at supporting MPI on our Hadoop/Spark YARN based 
>> cluster.
> 
> You poor soul…
> 
>>  I found a bunch of references to Hamster, but what I don't find is if it 
>> was ever merged into regular OpenMPI, and if so is it just another RM 
>> integration?  Or does it need more setup?
> 
> When I left Pivotal, it was based on a copy of the OMPI trunk that sat 
> somewhere in the 1.7 series, I believe. Last contact I had indicated they 
> were trying to update, but I’m not sure they were successful.
> 
>> 
>> I found this:
>> http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html
> 
> Didn’t know they had actually (finally) released it, so good to know. Just so 
> you are aware, there are major problems running MPI under Yarn as it just 
> isn’t designed for MPI support. What we did back then was add a JNI layer so 
> that ORTE could run underneath it, and then added a PMI-like service to 
> provide the wireup support (since Yarn couldn’t be used to exchange the info 
> itself). You also have the issue that Yarn doesn’t understand the need for 
> all the procs to be launched together, and so you have to modify Yarn so it 
> will ensure that the MPI procs are all running or else you’ll hang in 
> MPI_Init.
> 
>> 
>> Which appears to imply extra setup required.  Is this documented anywhere 
>> for OpenMPI?
> 
> I’m afraid you’ll just have to stick with the Pivotal-provided version as the 
> integration is rather complicated. Don’t expect much in the way of 
> performance! This was purely intended as a way for “casual” MPI users to make 
> use of “free” time on their Hadoop cluster, not for any serious technical 
> programming.
> 
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/10/25593.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25613.php



Re: [OMPI users] MPI_Init seems to hang, but works after a minute or two

2014-10-27 Thread maxinator333



Hello,


After compiling and running a MPI program, it seems to hang at
MPI_Init(), but it eventually will work after a minute or two.

While the problem occurred on my notebook, it did not on my desktop PC.

It can be a timeout on a network interface.
I see a similar issue with wireless ON but not with wireless OFF
on my notebook.

In the past I saw something similar with a virtual network driver
installed by a telecom company's 3G software.


Both run on Win 7, cygwin 64 Bit, OpenMPI version 1.8.3 r32794
(ompi_info), g++ v 4.8.3. I actually synced the cygwin installations
later on, and it still didn't work, but it did for a short time after a
restart...

Regards
Marco


Thank you :)
Deactivating my WLAN did indeed do the trick!
It also fails if a LAN cable is plugged in, no matter whether I am
correctly connected (to the internet/gateway) or not (e.g. a wrong static
IP instead of the required DHCP address).

Again: deactivating the relevant LAN interface helps.
In contrast to LAN, for WLAN it seems to make a difference whether I'm
connected to some network or not: if not connected, it works without
deactivating the whole adapter.


Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-27 Thread Ralph Castain

> On Oct 26, 2014, at 11:12 PM, Gilles Gouaillardet 
>  wrote:
> 
> Ralph,
> 
> this is also a solution.
> the pro is it seems more lightweight than PR #249
> the two cons i can see are :
> - opal_process_name_t alignment goes from 64 to 32 bits
> - some functions (opal_hash_table_*) takes an uint64_t as argument so we
> still need to use memcpy in order to
>  * guarantee 64 bits alignment on some archs (such as sparc)
>  * avoid ugly cast such as uint64_t id = *(uint64_t *)&process_name;

Actually, I propose to also remove that issue. Simple enough to use a 
hash_table_32 to handle the jobids, and let that point to a hash_table_32 of 
vpids. Since we rarely have more than one jobid anyway, the memory overhead 
actually decreases with this model, and we get rid of that annoying need to 
memcpy everything.

> 
> as far as i am concerned, i am fine with your proposed suggestion to
> dump opal_identifier_t.
> 
> about the patch, did you mean you have something ready i can apply to my
> PR ?
> or do you expect me to do the changes (i am ok to do it if needed)

Why don’t I grab your branch, create a separate repo based on it (just to keep 
things clean), push it to my area and give you write access? We can then 
collaborate on the changes and create a PR from there. This way, you don’t need 
to give me write access to your entire repo.

Make sense?
Ralph

> 
> Cheers,
> 
> Gilles
> 
> On 2014/10/27 11:04, Ralph Castain wrote:
>> Just took a glance thru 249 and have a few suggestions on it - will pass 
>> them along tomorrow. I think the right solution is to (a) dump 
>> opal_identifier_t in favor of using opal_process_name_t everywhere in the 
>> opal layer, (b) typedef orte_process_name_t to opal_process_name_t, and (c) 
>> leave ompi_process_name_t as typedef’d to the RTE component in the MPI 
>> layer. This lets other RTEs decide for themselves how they want to handle it.
>> 
>> If you add changes to your branch, I can pass you a patch with my suggested 
>> alterations.
>> 
>>> On Oct 26, 2014, at 5:55 PM, Gilles Gouaillardet 
>>>  wrote:
>>> 
>>> No :-(
>>> I need to do some extra work to stop declaring orte_process_name_t and
>>> ompi_process_name_t variables.
>>> #249 will make things much easier.
>>> One option is to use opal_process_name_t everywhere, or typedef the orte and
>>> ompi types to the opal one.
>>> Another option (lightweight but error-prone, imho) is to change the variable
>>> declarations only.
>>> Any thoughts ?
>>> 
>>> Ralph Castain  wrote:
 Will PR#249 solve it? If so, we should just go with it as I suspect that 
 is the long-term solution.
 
> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet 
>  wrote:
> 
> It looks like we faced a similar issue :
> opal_process_name_t is 64-bit aligned whereas orte_process_name_t is 32-bit
> aligned. If you run on an alignment-sensitive cpu such as sparc and you
> are not lucky (so to speak) you can run into this issue.
> i will make a patch for this shortly
> 
> Ralph Castain  wrote:
>> Afraid this must be something about the Sparc - just ran on a Solaris 11 
>> x86 box and everything works fine.
>> 
>> 
>>> On Oct 26, 2014, at 8:22 AM, Siegmar Gross 
>>>  wrote:
>>> 
>>> Hi Gilles,
>>> 
>>> I wanted to explore which function is called, when I call MPI_Init
>>> in a C program, because this function should be called from a Java
>>> program as well. Unfortunately C programs break with a Bus Error
>>> once more for openmpi-dev-124-g91e9686 on Solaris. I assume that's
>>> the reason why I get no useful backtrace for my Java program.
>>> 
>>> tyr small_prog 117 mpicc -o init_finalize init_finalize.c
>>> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>> ...
>>> (gdb) run -np 1 init_finalize
>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 
>>> init_finalize
>>> [Thread debugging using libthread_db enabled]
>>> [New Thread 1 (LWP 1)]
>>> [New LWP2]
>>> [tyr:19240] *** Process received signal ***
>>> [tyr:19240] Signal: Bus Error (10)
>>> [tyr:19240] Signal code: Invalid address alignment (1)
>>> [tyr:19240] Failing at address: 7bd1c10c
>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdcc04
>>> /lib/sparcv9/libc.so.1:0xd8b98
>>> /lib/sparcv9/libc.so.1:0xcc70c
>>> /lib/sparcv9/libc.so.1:0xcc918
>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c
>>>  [ Signal 10 (BUS)]
>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103e8
>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc
>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c
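
For readers not familiar with the alignment problem being discussed in this
thread: on strict-alignment CPUs such as SPARC, loading a 64-bit value
through a pointer that is only 32-bit aligned raises SIGBUS, which is why the
cast mentioned above is dangerous and memcpy is used instead. A minimal
stand-alone illustration (not Open MPI code; the struct below is invented for
the example):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Two 32-bit fields: this struct only requires 4-byte alignment. */
typedef struct { uint32_t jobid; uint32_t vpid; } name_t;

/* Embedding it after a lone uint32_t places it at offset 4, so its address
   can be 4-byte but not 8-byte aligned. */
struct msg { uint32_t tag; name_t name; };

int main(void) {
    struct msg m = { 1, { 42, 7 } };

    /* Risky on SPARC and other strict-alignment CPUs: an 8-byte load through
       a pointer that may only be 4-byte aligned can raise SIGBUS:
           uint64_t id = *(uint64_t *)&m.name;    // don't do this         */

    /* Portable: memcpy makes no alignment assumptions. */
    uint64_t id;
    memcpy(&id, &m.name, sizeof id);
    printf("id = 0x%016llx\n", (unsigned long long)id);
    return 0;
}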

Re: [OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Ralph Castain

> On Oct 26, 2014, at 9:56 PM, Brock Palen  wrote:
> 
> We are starting to look at supporting MPI on our Hadoop/Spark YARN based 
> cluster.

You poor soul…

>  I found a bunch of references to Hamster, but what I don't find is if it was 
> ever merged into regular OpenMPI, and if so is it just another RM 
> integration?  Or does it need more setup?

When I left Pivotal, it was based on a copy of the OMPI trunk that sat 
somewhere in the 1.7 series, I believe. Last contact I had indicated they were 
trying to update, but I’m not sure they were successful.

> 
> I found this:
> http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html 
> 

Didn’t know they had actually (finally) released it, so good to know. Just so 
you are aware, there are major problems running MPI under Yarn as it just isn’t 
designed for MPI support. What we did back then was add a JNI layer so that 
ORTE could run underneath it, and then added a PMI-like service to provide the 
wireup support (since Yarn couldn’t be used to exchange the info itself). You 
also have the issue that Yarn doesn’t understand the need for all the procs to 
be launched together, and so you have to modify Yarn so it will ensure that the 
MPI procs are all running or else you’ll hang in MPI_Init.
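
For context, a PMI-like wireup service in this sense just lets every rank
publish its endpoint, synchronize, and then look up everybody else's. The toy
below only illustrates that exchange pattern; the function names and the
in-process "key/value store" are invented and are not Hamster's or Open MPI's
actual API:

#include <stdio.h>

#define MAX_RANKS 4

static char kvs[MAX_RANKS][64];          /* stand-in for the shared key/value store */

static void pmi_put(int rank, const char *endpoint) {
    snprintf(kvs[rank], sizeof kvs[rank], "%s", endpoint);
}
static void pmi_fence(void) { /* all ranks block here until every put is visible */ }
static const char *pmi_get(int rank) { return kvs[rank]; }

int main(void) {
    /* In a real job each rank runs this; here 4 ranks are simulated in one process. */
    for (int rank = 0; rank < MAX_RANKS; rank++) {
        char ep[64];
        snprintf(ep, sizeof ep, "tcp://10.0.0.%d:5000", rank + 1);
        pmi_put(rank, ep);               /* publish my endpoint */
    }
    pmi_fence();                         /* job-wide barrier: all endpoints known */
    for (int rank = 0; rank < MAX_RANKS; rank++)
        printf("rank %d can now connect to rank 0 at %s\n", rank, pmi_get(0));
    return 0;
}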

> 
> Which appears to imply extra setup required.  Is this documented anywhere for 
> OpenMPI?

I’m afraid you’ll just have to stick with the Pivotal-provided version as the 
integration is rather complicated. Don’t expect much in the way of performance! 
This was purely intended as a way for “casual” MPI users to make use of “free” 
time on their Hadoop cluster, not for any serious technical programming.

> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25593.php



Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Nathan Hjelm
On Mon, Oct 27, 2014 at 02:15:45PM +, michael.rach...@dlr.de wrote:
> Dear Gilles,
> 
> This is  the system response on the login node of cluster5:
> 
> cluster5:~/dat> mpirun -np 1 df -h
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sda31  228G  5.6G  211G   3% /
> udev 32G  232K   32G   1% /dev
> tmpfs32G 0   32G   0% /dev/shm
> /dev/sda11  291M   39M  237M  15% /boot
> /dev/gpfs10 495T  280T  216T  57% /gpfs10
> /dev/loop1  3.2G  3.2G 0 100% /media
> cluster5:~/dat> mpirun -np 1 df -hi
> Filesystem Inodes IUsed IFree IUse% Mounted on
> /dev/sda3115M  253K   15M2% /
> udev0 0 0 - /dev
> tmpfs7.9M 3  7.9M1% /dev/shm
> /dev/sda1176K41   76K1% /boot
> /dev/gpfs10  128M   67M   62M   53% /gpfs10
> /dev/loop1  0 0 0 - /media
> cluster5:~/dat>
> 
> 
> And this the system response on the compute node of cluster5:
> 
> rachner@r5i5n13:~>  mpirun -np 1 df -h
> Filesystem  Size  Used Avail Use% Mounted on
> tmpfs63G  1.4G   62G   3% /
> udev 63G   92K   63G   1% /dev
> tmpfs63G 0   63G   0% /dev/shm
> tmpfs   150M   12M  139M   8% /tmp

This is the problem right here. /tmp can only be used to back a total of
139M of shared memory. /dev/shm can back up to 63G so using that will
solve your problem.

Try adding -mca shmem_mmap_relocate_backing_file true to your
mpirun line, or add shmem_mmap_relocate_backing_file = true to your
installation's /etc/openmpi-mca-params.conf

-Nathan


pgpOl0hwQ3Qey.pgp
Description: PGP signature


Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner
Dear Gilles,

This is  the system response on the login node of cluster5:

cluster5:~/dat> mpirun -np 1 df -h
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda31  228G  5.6G  211G   3% /
udev 32G  232K   32G   1% /dev
tmpfs32G 0   32G   0% /dev/shm
/dev/sda11  291M   39M  237M  15% /boot
/dev/gpfs10 495T  280T  216T  57% /gpfs10
/dev/loop1  3.2G  3.2G 0 100% /media
cluster5:~/dat> mpirun -np 1 df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda3115M  253K   15M2% /
udev0 0 0 - /dev
tmpfs7.9M 3  7.9M1% /dev/shm
/dev/sda1176K41   76K1% /boot
/dev/gpfs10  128M   67M   62M   53% /gpfs10
/dev/loop1  0 0 0 - /media
cluster5:~/dat>


And this the system response on the compute node of cluster5:

rachner@r5i5n13:~>  mpirun -np 1 df -h
Filesystem  Size  Used Avail Use% Mounted on
tmpfs63G  1.4G   62G   3% /
udev 63G   92K   63G   1% /dev
tmpfs63G 0   63G   0% /dev/shm
tmpfs   150M   12M  139M   8% /tmp
/dev/gpfs10 495T  280T  216T  57% /gpfs10
rachner@r5i5n13:~>  mpirun -np 1 df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 16M   63K   16M1% /
udev0 0 0 - /dev
tmpfs 16M 3   16M1% /dev/shm
tmpfs 16M   183   16M1% /tmp
/dev/gpfs10  128M   67M   62M   53% /gpfs10
rachner@r5i5n13:~>

You wrote: 
"From the logs, the error message makes sense to me : there is not enough space 
in /tmp Since the compute nodes have a lot of memory, you might want to try 
using /dev/shm instead of /tmp for the backing files"

I do not understand that system output.  Is it required now to switch to   
/dev/shm  ?   And how can I do that?  Or must our operators change something 
(the cluster is very new)? 

Greetings
 Michael Rachner


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On behalf of Gilles Gouaillardet
Sent: Monday, 27 October 2014 14:49
To: Open MPI Users
Subject: Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Michael,

Could you please run
mpirun -np 1 df -h
mpirun -np 1 df -hi
on both compute and login nodes

Thanks

Gilles

michael.rach...@dlr.de wrote:
>Dear developers of OPENMPI,
>
>We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
>2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
>correctly on the login-node.
>As before the program aborts on the compute node, but now with a different 
>error message: 
>
>The following message appears when launching the program with 2 processes: 
>mpiexec -np 2 -bind-to core -tag-output ./a.out
>
>[1,0]: on nodemaster: iwin= 685 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>137.
>[ [1,0]: === allocation of shared window no. iwin= 686
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n
>13 (the shared-memory backing file). It is likely that your MPI job 
>will now either abort or experience performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204256 B
>  Space Available: 208896 B
>---
>--- [r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared 
>[r5i5n13:26917] *** reported by process [3154051073,140733193388032] 
>[r5i5n13:26917] *** on communicator MPI_COMM_WORLD [r5i5n13:26917] *** 
>MPI_ERR_INTERN: internal error [r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL 
>(processes in this communicator will now abort,
>[r5i5n13:26917] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>***
>*
>
>
>When I repeat the run using 24 processes (on same compute node) the same kind 
>of abort message occurs, but earlier:
>
>[1,0]: on nodemaster: iwin= 231 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>46.2
> [1,0]: === allocation of shared window no. iwin= 232
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n
>13 (the shared-memory backing file). It is likely

Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On behalf of Gilles Gouaillardet
Sent: Monday, 27 October 2014 14:49
To: Open MPI Users
Subject: Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Michael,

Could you please run
mpirun -np 1 df -h
mpirun -np 1 df -hi
on both compute and login nodes

Thanks

Gilles

michael.rach...@dlr.de wrote:
>Dear developers of OPENMPI,
>
>We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
>2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
>correctly on the login-node.
>As before the program aborts on the compute node, but now with a different 
>error message: 
>
>The following message appears when launching the program with 2 processes: 
>mpiexec -np 2 -bind-to core -tag-output ./a.out
>
>[1,0]: on nodemaster: iwin= 685 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>137.
>[ [1,0]: === allocation of shared window no. iwin= 686
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n
>13 (the shared-memory backing file). It is likely that your MPI job 
>will now either abort or experience performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204256 B
>  Space Available: 208896 B
>---
>--- [r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared 
>[r5i5n13:26917] *** reported by process [3154051073,140733193388032] 
>[r5i5n13:26917] *** on communicator MPI_COMM_WORLD [r5i5n13:26917] *** 
>MPI_ERR_INTERN: internal error [r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL 
>(processes in this communicator will now abort,
>[r5i5n13:26917] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>***
>*
>
>
>When I repeat the run using 24 processes (on same compute node) the same kind 
>of abort message occurs, but earlier:
>
>[1,0]: on nodemaster: iwin= 231 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>46.2
> [1,0]: === allocation of shared window no. iwin= 232
>[1,0]:  starting now with idim_1=   5
>---
>-- It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n
>13 (the shared-memory backing file). It is likely that your MPI job 
>will now either abort or experience performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204784 B
>  Space Available: 131072 B
>---
>--- [r5i5n13:26947] *** An error occurred in MPI_Win_allocate_shared 
>[r5i5n13:26947] *** reported by process [3147628545,140733193388032] 
>[r5i5n13:26947] *** on communicator MPI_COMM_WORLD [r5i5n13:26947] *** 
>MPI_ERR_INTERN: internal error [r5i5n13:26947] *** MPI_ERRORS_ARE_FATAL 
>(processes in this communicator will now abort,
>[r5i5n13:26947] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>***
>*
>
>So the problem is not yet resolved.
>
>Greetings
> Michael Rachner
>
>
>
>
>
>
>-----Original Message-----
>From: Rachner, Michael
>Sent: Monday, 27 October 2014 11:49
>To: 'Open MPI Users'
>Subject: AW: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in
>shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
>Dear Mr. Squyres.
>
>We will try to install your bug-fixed nightly tarball of 2014-10-24 on
>Cluster5 to see whether it works or not.
>The installation however will take some time. I will get back to you when I
>know more.
>
>Let me add the information that on the Laki each node has 16 GB of shared
>memory (there it worked), the login-node on Cluster 5 has 64 GB (there it
>worked too), whereas the compute nodes on Cluster5 have 128 GB (there it did
>not work).
>So possibly the bug might have something to do with the size of the physical 
>shared memory available on the node.
>
>Greetings
>Michael Rachner
>
>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On behalf of Jeff
>Squyres (jsquyres)
>Sent: Friday, 24 October 2014 22:45
>To: Open MPI User's List
>Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3:

Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Gilles Gouaillardet
Michael,

The available space must be greater than the requested size plus 5%
(e.g. in your first log, the 204256 B request needs about 214469 B, which is
more than the 208896 B available).

From the logs, the error message makes sense to me : there is not enough space 
in /tmp
Since the compute nodes have a lot of memory, you might want to try using 
/dev/shm instead of /tmp for the backing files

Cheers,

Gilles

michael.rach...@dlr.de wrote:
>Dear developers of OPENMPI,
>
>We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
>2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
>correctly on the login-node.
>As before the program aborts on the compute node, but now with a different 
>error message: 
>
>The following message appears when launching the program with 2 processes: 
>mpiexec -np 2 -bind-to core -tag-output ./a.out
>
>[1,0]: on nodemaster: iwin= 685 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>137.
>[ [1,0]: === allocation of shared window no. iwin= 686
>[1,0]:  starting now with idim_1=   5
>-
>It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n13 (the 
>shared-memory backing
>file). It is likely that your MPI job will now either abort or experience
>performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204256 B
>  Space Available: 208896 B
>--
>[r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared
>[r5i5n13:26917] *** reported by process [3154051073,140733193388032]
>[r5i5n13:26917] *** on communicator MPI_COMM_WORLD
>[r5i5n13:26917] *** MPI_ERR_INTERN: internal error
>[r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
>now abort,
>[r5i5n13:26917] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>
>
>
>When I repeat the run using 24 processes (on same compute node) the same kind 
>of abort message occurs, but earlier:
>
>[1,0]: on nodemaster: iwin= 231 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>46.2
> [1,0]: === allocation of shared window no. iwin= 232
>[1,0]:  starting now with idim_1=   5
>-
>It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n13 (the 
>shared-memory backing
>file). It is likely that your MPI job will now either abort or experience
>performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204784 B
>  Space Available: 131072 B
>--
>[r5i5n13:26947] *** An error occurred in MPI_Win_allocate_shared
>[r5i5n13:26947] *** reported by process [3147628545,140733193388032]
>[r5i5n13:26947] *** on communicator MPI_COMM_WORLD
>[r5i5n13:26947] *** MPI_ERR_INTERN: internal error
>[r5i5n13:26947] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
>now abort,
>[r5i5n13:26947] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>
>
>So the problem is not yet resolved.
>
>Greetings
> Michael Rachner
>
>
>
>
>
>
>-----Original Message-----
>From: Rachner, Michael
>Sent: Monday, 27 October 2014 11:49
>To: 'Open MPI Users'
>Subject: AW: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
>memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
>Dear Mr. Squyres.
>
>We will try to install your bug-fixed nightly tarball of 2014-10-24 on
>Cluster5 to see whether it works or not.
>The installation however will take some time. I will get back to you when I
>know more.
>
>Let me add the information that on the Laki each node has 16 GB of shared
>memory (there it worked), the login-node on Cluster 5 has 64 GB (there it
>worked too), whereas the compute nodes on Cluster5 have 128 GB (there it did
>not work).
>So possibly the bug might have something to do with the size of the physical 
>shared memory available on the node.
>
>Greetings
>Michael Rachner
>
>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On behalf of Jeff Squyres
>(jsquyres)
>Sent: Friday, 24 October 2014 22:45
>To: Open MPI User's List
>Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
>memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
>Nathan tells me that this may well be related to a fi

Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Gilles Gouaillardet
Michael,

Could you please run
mpirun -np 1 df -h
mpirun -np 1 df -hi
on both compute and login nodes

Thanks

Gilles

michael.rach...@dlr.de wrote:
>Dear developers of OPENMPI,
>
>We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
>2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
>As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
>correctly on the login-node.
>As before the program aborts on the compute node, but now with a different 
>error message: 
>
>The following message appears when launching the program with 2 processes: 
>mpiexec -np 2 -bind-to core -tag-output ./a.out
>
>[1,0]: on nodemaster: iwin= 685 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>137.
>[ [1,0]: === allocation of shared window no. iwin= 686
>[1,0]:  starting now with idim_1=   5
>-
>It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n13 (the 
>shared-memory backing
>file). It is likely that your MPI job will now either abort or experience
>performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204256 B
>  Space Available: 208896 B
>--
>[r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared
>[r5i5n13:26917] *** reported by process [3154051073,140733193388032]
>[r5i5n13:26917] *** on communicator MPI_COMM_WORLD
>[r5i5n13:26917] *** MPI_ERR_INTERN: internal error
>[r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
>now abort,
>[r5i5n13:26917] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>
>
>
>When I repeat the run using 24 processes (on same compute node) the same kind 
>of abort message occurs, but earlier:
>
>[1,0]: on nodemaster: iwin= 231 :
>[1,0]:  total storage [MByte] alloc. in shared windows so far:   
>46.2
> [1,0]: === allocation of shared window no. iwin= 232
>[1,0]:  starting now with idim_1=   5
>-
>It appears as if there is not enough space for 
>/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n13 (the 
>shared-memory backing
>file). It is likely that your MPI job will now either abort or experience
>performance degradation.
>
>  Local host:  r5i5n13
>  Space Requested: 204784 B
>  Space Available: 131072 B
>--
>[r5i5n13:26947] *** An error occurred in MPI_Win_allocate_shared
>[r5i5n13:26947] *** reported by process [3147628545,140733193388032]
>[r5i5n13:26947] *** on communicator MPI_COMM_WORLD
>[r5i5n13:26947] *** MPI_ERR_INTERN: internal error
>[r5i5n13:26947] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
>now abort,
>[r5i5n13:26947] ***and potentially your MPI job)
>rachner@r5i5n13:~/dat>
>
>
>So the problem is not yet resolved.
>
>Greetings
> Michael Rachner
>
>
>
>
>
>
>-----Original Message-----
>From: Rachner, Michael
>Sent: Monday, 27 October 2014 11:49
>To: 'Open MPI Users'
>Subject: AW: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
>memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
>Dear Mr. Squyres.
>
>We will try to install your bug-fixed nightly tarball of 2014-10-24 on
>Cluster5 to see whether it works or not.
>The installation however will take some time. I will get back to you when I
>know more.
>
>Let me add the information that on the Laki each node has 16 GB of shared
>memory (there it worked), the login-node on Cluster 5 has 64 GB (there it
>worked too), whereas the compute nodes on Cluster5 have 128 GB (there it did
>not work).
>So possibly the bug might have something to do with the size of the physical 
>shared memory available on the node.
>
>Greetings
>Michael Rachner
>
>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On behalf of Jeff Squyres
>(jsquyres)
>Sent: Friday, 24 October 2014 22:45
>To: Open MPI User's List
>Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
>memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
>Nathan tells me that this may well be related to a fix that was literally just 
>pulled into the v1.8 branch today:
>
>https://github.com/open-mpi/ompi-release/pull/56
>
>Would you mind testing any nightly tarball after tonight?  (i.e

[OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner
Dear developers of OPENMPI,

We have now installed and tested the bugfixed OPENMPI Nightly Tarball  of 
2014-10-24  (openmpi-dev-176-g9334abc.tar.gz) on Cluster5 .
As before (with OPENMPI-1.8.3 release version) the small Ftn-testprogram runs 
correctly on the login-node.
As before the program aborts on the compute node, but now with a different 
error message: 

The following message appears when launching the program with 2 processes: 
mpiexec -np 2 -bind-to core -tag-output ./a.out

[1,0]: on nodemaster: iwin= 685 :
[1,0]:  total storage [MByte] alloc. in shared windows so far:   
137.
[ [1,0]: === allocation of shared window no. iwin= 686
[1,0]:  starting now with idim_1=   5
-
It appears as if there is not enough space for 
/tmp/openmpi-sessions-rachner@r5i5n13_0/48127/1/shared_window_688.r5i5n13 (the 
shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

  Local host:  r5i5n13
  Space Requested: 204256 B
  Space Available: 208896 B
--
[r5i5n13:26917] *** An error occurred in MPI_Win_allocate_shared
[r5i5n13:26917] *** reported by process [3154051073,140733193388032]
[r5i5n13:26917] *** on communicator MPI_COMM_WORLD
[r5i5n13:26917] *** MPI_ERR_INTERN: internal error
[r5i5n13:26917] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[r5i5n13:26917] ***and potentially your MPI job)
rachner@r5i5n13:~/dat>



When I repeat the run using 24 processes (on same compute node) the same kind 
of abort message occurs, but earlier:

[1,0]: on nodemaster: iwin= 231 :
[1,0]:  total storage [MByte] alloc. in shared windows so far:   
46.2
 [1,0]: === allocation of shared window no. iwin= 232
[1,0]:  starting now with idim_1=   5
-
It appears as if there is not enough space for 
/tmp/openmpi-sessions-rachner@r5i5n13_0/48029/1/shared_window_234.r5i5n13 (the 
shared-memory backing
file). It is likely that your MPI job will now either abort or experience
performance degradation.

  Local host:  r5i5n13
  Space Requested: 204784 B
  Space Available: 131072 B
--
[r5i5n13:26947] *** An error occurred in MPI_Win_allocate_shared
[r5i5n13:26947] *** reported by process [3147628545,140733193388032]
[r5i5n13:26947] *** on communicator MPI_COMM_WORLD
[r5i5n13:26947] *** MPI_ERR_INTERN: internal error
[r5i5n13:26947] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[r5i5n13:26947] ***and potentially your MPI job)
rachner@r5i5n13:~/dat>


So the problem is not yet resolved.

Greetings
 Michael Rachner






-----Original Message-----
From: Rachner, Michael
Sent: Monday, 27 October 2014 11:49
To: 'Open MPI Users'
Subject: AW: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Dear Mr. Squyres.

We will try to install your bug-fixed nightly tarball of 2014-10-24 on Cluster5
to see whether it works or not.
The installation however will take some time. I will get back to you when I know more.

Let me add the information that on the Laki each node has 16 GB of shared
memory (there it worked), the login-node on Cluster 5 has 64 GB (there it
worked too), whereas the compute nodes on Cluster5 have 128 GB (there it did
not work).
So possibly the bug might have something to do with the size of the physical 
shared memory available on the node.

Greetings
Michael Rachner

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On behalf of Jeff Squyres
(jsquyres)
Sent: Friday, 24 October 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM,   
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small downsized Fortran-testprogram for sh

Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Gilles Gouaillardet
Thanks Marco,

I could reproduce the issue even with one node sending/receiving to itself.

I will investigate this tomorrow

Cheers,

Gilles

Marco Atzeri  wrote:
>
>
>On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote:
>> Hi,
>>
>> i tested on a RedHat 6 like linux server and could not observe any
>> memory leak.
>>
>> BTW, are you running 32 or 64 bits cygwin ? and what is your configure
>> command line ?
>>
>> Thanks,
>>
>> Gilles
>>
>
>the problem is present in both versions.
>
>cygwin 1.8.3-1 packages  are built with configure:
>
>  --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin 
>--libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var 
>--sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share 
>--docdir=/usr/share/doc/openmpi --htmldir=/usr/share/doc/openmpi/html -C 
>LDFLAGS=-Wl,--export-all-symbols --disable-mca-dso --disable-sysv-shmem 
>--enable-cxx-exceptions --with-threads=posix --without-cs-fs 
>--with-mpi-param_check=always --enable-contrib-no-build=vt,libompitrace 
>--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv
>
>Regards
>Marco
>
>___
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: 
>http://www.open-mpi.org/community/lists/users/2014/10/25604.php


Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Michael.Rachner
Dear Mr. Squyres.

We will try to install your bug-fixed nightly tarball of 2014-10-24 on Cluster5
to see whether it works or not.
The installation however will take some time. I will get back to you when I know more.

Let me add the information that on the Laki each node has 16 GB of shared
memory (there it worked),
the login-node on Cluster 5 has 64 GB (there it worked too), whereas the
compute nodes on Cluster5 have 128 GB (there it did not work).
So possibly the bug might have something to do with the size of the physical 
shared memory available on the node.

Greetings
Michael Rachner

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On behalf of Jeff Squyres
(jsquyres)
Sent: Friday, 24 October 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM,   
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small downsized Fortran-testprogram for shared memory 
> allocation (using MPI_WIN_ALLOCATE_SHARED and  MPI_WIN_SHARED_QUERY) )
> on only 1 node   of 2 different Linux-clusters with OPENMPI-1.8.3 and 
> Intel-14.0.4 /Intel-13.0.1, respectively.
>  
> The program simply allocates a sequence of shared data windows, each
> consisting of 1 integer*4 array.
> None of the windows is freed, so the amount of data allocated in shared
> windows grows during the course of the execution.
>
> That worked well on the 1st cluster (Laki, having 8 procs per node)
> when allocating even 1000 shared windows each having 50000 integer*4 array
> elements, i.e. a total of 200 MBytes.
> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the 
> login node, but it did NOT work on a compute node.
> In that error case, there occurs something like an internal storage limit of 
> ~ 140 MB for the total storage allocated in all shared windows.
> When that limit is reached, all later shared memory allocations fail (but 
> silently).
> So the first attempt to use such a bad shared data window results in a bus 
> error due to the bad storage address encountered.
>  
> That strange behavior could be observed in the small testprogram but also 
> with my large Fortran CFD-code.
> If the error occurs, then it occurs with both codes, and both at a storage 
> limit of  ~140 MB.
> I found that this storage limit depends only weakly on  the number of 
> processes (for np=2,4,8,16,24  it is: 144.4 , 144.0, 141.0, 137.0, 
> 132.2 MB)
>  
> Note that the shared memory storage available on both clusters was very large 
> (many GB of free memory).
>  
> Here is the error message when running with np=2 and an array
> dimension of idim_1=50000 for the integer*4 array allocated per shared
> window on the compute node of Cluster5:
> In that case, the error occurred at the 723rd shared window, which is the
> 1st badly allocated window in that case:
> (722 successfully allocated shared windows * 50000 array elements * 4
> Bytes/el. = 144.4 MB)
>  
>  
> [1,0]: on nodemaster: iwin= 722 :
> [1,0]:  total storage [MByte] alloc. in shared windows so far:   
> 144.4000
> [1,0]: === allocation of shared window no. iwin= 723
> [1,0]:  starting now with idim_1=   5
> [1,0]: on nodemaster for iwin= 723 : before writing 
> on shared mem
> [1,0]:[r5i5n13:12597] *** Process received signal *** 
> [1,0]:[r5i5n13:12597] Signal: Bus error (7) 
> [1,0]:[r5i5n13:12597] Signal code: Non-existant physical 
> address (2) [1,0]:[r5i5n13:12597] Failing at address: 
> 0x7fffe08da000 [1,0]:[r5i5n13:12597] [ 0] 
> [1,0]:/lib64/libpthread.so.0(+0xf800)[0x76d67800]
> [1,0]:[r5i5n13:12597] [ 1] ./a.out[0x408a8b] 
> [1,0]:[r5i5n13:12597] [ 2] ./a.out[0x40800c] 
> [1,0]:[r5i5n13:12597] [ 3] 
> [1,0]:/lib64/libc.so.6(__libc_start_main+0xe6)[0x769fec36]
> [1,0]:[r5i5n13:12597] [ 4] [1,0]:./a.out[0x407f09] 
> [1,0]:[r5i5n13:12597] *** End of error message ***
> [1,1]:forrtl: error (78): process killed (SIGTERM)
> [1,1]:Image  PCRoutineLine
> Source
> [1,1]:libopen-pal.so.6   74B74580  Unknown   
> Unknown  Unknown
> [1,1]:libmpi.so.177267F3E  Unknown   
> Unknown  Unknown
> [1,1]:libmpi.so.17733B555  Unknown   
> Unknown  Unknown
> [1,1]:libmpi.so.17727DFFD  Unknown   
> Unknown  Unknown
> [1,1]:libmpi_mpifh.so.2  7779BA03  Unknown   
> Unknown  Unknown
> [1,1]:a.
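
For anyone who wants to reproduce the kind of test described in the quoted
report, here is a minimal C sketch of the allocation loop (the original test
is Fortran and is not shown on the list; this is an approximation, not the
author's code, and assumes all ranks run on a single node so that
MPI_COMM_WORLD can back a shared window):

#include <mpi.h>
#include <stdio.h>

#define NWIN  1000       /* number of shared windows to allocate        */
#define IDIM1 50000      /* integer elements per window (4 bytes each)  */

int main(int argc, char **argv)
{
    int rank;
    MPI_Win wins[NWIN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int iwin = 0; iwin < NWIN; iwin++) {
        int *base = NULL;
        /* rank 0 contributes the whole array, the other ranks contribute nothing */
        MPI_Aint mysize = (rank == 0) ? (MPI_Aint)IDIM1 * sizeof(int) : 0;
        MPI_Win_allocate_shared(mysize, sizeof(int), MPI_INFO_NULL,
                                MPI_COMM_WORLD, &base, &wins[iwin]);

        /* every rank asks where rank 0's segment lives in its own address space */
        MPI_Aint qsize; int qdisp; int *shared = NULL;
        MPI_Win_shared_query(wins[iwin], 0, &qsize, &qdisp, &shared);

        if (rank == 0)
            for (int k = 0; k < IDIM1; k++) shared[k] = k;   /* touch the memory */

        if (rank == 0 && (iwin + 1) % 100 == 0)
            printf("allocated %d windows, about %.1f MByte total\n",
                   iwin + 1, (iwin + 1) * IDIM1 * 4.0 / 1.0e6);

        /* windows are intentionally never freed, so backing storage keeps growing */
    }

    MPI_Finalize();
    return 0;
}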

Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Marco Atzeri



On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote:

Hi,

i tested on a RedHat 6 like linux server and could not observe any
memory leak.

BTW, are you running 32 or 64 bits cygwin ? and what is your configure
command line ?

Thanks,

Gilles



the problem is present in both versions.

cygwin 1.8.3-1 packages  are built with configure:

 --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin 
--libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var 
--sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share 
--docdir=/usr/share/doc/openmpi --htmldir=/usr/share/doc/openmpi/html -C 
LDFLAGS=-Wl,--export-all-symbols --disable-mca-dso --disable-sysv-shmem 
--enable-cxx-exceptions --with-threads=posix --without-cs-fs 
--with-mpi-param_check=always --enable-contrib-no-build=vt,libompitrace 
--enable-mca-no-build=paffinity,installdirs-windows,timer-windows,shmem-sysv


Regards
Marco



Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Gilles Gouaillardet
Hi,

i tested on a RedHat 6 like linux server and could not observe any
memory leak.

BTW, are you running 32 or 64 bits cygwin ? and what is your configure
command line ?

Thanks,

Gilles

On 2014/10/27 18:26, Marco Atzeri wrote:
> On 10/27/2014 8:30 AM, maxinator333 wrote:
>> Hello,
>>
>> I noticed this weird behavior, because after a certain time of more than
>> one minute the transfer rates of MPI_Send and MPI_Recv dropped by a
>> factor of 100+. By chance I saw, that my program did allocate more and
>> more memory. I have the following minimal working example:
>>
>> #include <mpi.h>
>> #include <stdint.h>
>>
>> const uint32_t MSG_LENGTH = 256;
>>
>> int main(int argc, char* argv[]) {
>>  MPI_Init(NULL, NULL);
>>  int rank;
>>  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>  volatile char * msg  = (char*) malloc( sizeof(char) *
>> MSG_LENGTH );
>>
>>  for (uint64_t i = 0; i < 1e9; i++) {
>>  if ( rank == 1 ) {
>>  MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
>>rank-1, 0, MPI_COMM_WORLD,
>> MPI_STATUS_IGNORE);
>>  MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
>>rank-1, 0, MPI_COMM_WORLD);
>>  } else if ( rank == 0 ) {
>>  MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
>>rank+1, 0, MPI_COMM_WORLD);
>>  MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
>>rank+1, 0, MPI_COMM_WORLD,
>> MPI_STATUS_IGNORE);
>>  }
>>  MPI_Barrier( MPI_COMM_WORLD );
>>  for (uint32_t k = 0; k < MSG_LENGTH; k++)
>>  msg[k]++;
>>  }
>>
>>  MPI_Finalize();
>>  return 0;
>> }
>>
>>
>> I run this with mpirun -n 2 ./pingpong_memleak.exe
>>
>> The program does nothing more than send a message from rank 0 to rank 1,
>> then from rank 1 to rank 0 and so on in standard blocking mode, not even
>> asynchronous.
>>
>> Running the program will allocate roughly 30mb/s (Windows Task Manager)
>> until it stops at around 1.313.180kb. This is when the transfer rates
>> (not being measured in above snippet) drop significantly to maybe a
>> second per send instead of roughly 1µs.
>>
>> I use Cygwin with Windows 7 and 16Gb RAM. I haven't tested this minimal
>> working example on other setups.
>
> Can someone test on other platforms and confirm me that is a cygwin
> specific issue ?
>
> Regards
> Marco
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25602.php



Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Marco Atzeri

On 10/27/2014 8:30 AM, maxinator333 wrote:

Hello,

I noticed this weird behavior, because after a certain time of more than
one minute the transfer rates of MPI_Send and MPI_Recv dropped by a
factor of 100+. By chance I saw, that my program did allocate more and
more memory. I have the following minimal working example:

#include <mpi.h>
#include <stdint.h>

const uint32_t MSG_LENGTH = 256;

int main(int argc, char* argv[]) {
 MPI_Init(NULL, NULL);
 int rank;
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);

 volatile char * msg  = (char*) malloc( sizeof(char) * MSG_LENGTH );

 for (uint64_t i = 0; i < 1e9; i++) {
 if ( rank == 1 ) {
 MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
   rank-1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
   rank-1, 0, MPI_COMM_WORLD);
 } else if ( rank == 0 ) {
 MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
   rank+1, 0, MPI_COMM_WORLD);
 MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
   rank+1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 }
 MPI_Barrier( MPI_COMM_WORLD );
 for (uint32_t k = 0; k < MSG_LENGTH; k++)
 msg[k]++;
 }

 MPI_Finalize();
 return 0;
}


I run this with mpirun -n 2 ./pingpong_memleak.exe

The program does nothing more than send a message from rank 0 to rank 1,
then from rank 1 to rank 0 and so on in standard blocking mode, not even
asynchronous.

Running the program will allocate roughly 30mb/s (Windows Task Manager)
until it stops at around 1.313.180kb. This is when the transfer rates
(not being measured in above snippet) drop significantly to maybe a
second per send instead of roughly 1µs.

I use Cygwin with Windows 7 and 16Gb RAM. I haven't tested this minimal
working example on other setups.


Can someone test on other platforms and confirm that this is a
Cygwin-specific issue?

Regards
Marco


Re: [OMPI users] MPI_Init seems to hang, but works after a minute or two

2014-10-27 Thread Marco Atzeri

On 10/27/2014 8:32 AM, maxinator333 wrote:

Hello,


After compiling and running a MPI program, it seems to hang at
MPI_Init(), but it eventually will work after a minute or two.

While the problem occured on my Notebook it did not on my desktop PC.


It can be a timeout on a network interface.
I see a similar issue with wireless ON but not with wireless OFF
on my notebook.

In the past I saw a similar issue with a telecom company's virtual
driver for a 3G connection.
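
One hedged thing to try, if a network interface timeout is indeed the cause,
is to restrict Open MPI's TCP components to the loopback interface so a dead
or slow interface cannot stall startup. The MCA parameter names below exist
in the 1.8 series; the interface name "lo" and the program name are
placeholders that may differ on a Cygwin/Windows setup:

    mpirun --mca oob_tcp_if_include lo --mca btl_tcp_if_include lo -n 2 ./your_program.exe

CIDR notation (e.g. 127.0.0.0/8) may also be accepted in place of an
interface name.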


Both run on Win 7, cygwin 64 Bit, OpenMPI version 1.8.3 r32794
(ompi_info), g++ v 4.8.3. I actually synced the cygwin installations
later on, and it still didn't work, but it did for a short time after a
restart...


Regards
Marco


Re: [OMPI users] which info is needed for SIGSEGV in Java foropenmpi-dev-124-g91e9686on Solaris

2014-10-27 Thread Oscar Vega-Gisbert

Hi Takahiro, Gilles, Siegmar,

Thank you very much for all your fixes.
I didn't notice the problem of calling 'mca_base_var_register' before MPI_Init.
I'm sorry for the inconvenience.

Regards,
Oscar

On 27/10/14 07:16, Gilles Gouaillardet wrote:

Kawashima-san,

thanks a lot for the detailled explanation.
FWIW, i was previously testing on Solaris 11 that behaves like Linux : 
printf("%s", NULL) outputs '(null)'

vs a SIGSEGV on Solaris 10

i commited a16c1e44189366fbc8e967769e050f517a40f3f8 in order to fix 
this issue

(i moved the call to mca_base_var_register *after* MPI_Init)

regarding the BUS error reported by Siegmar, i also commited 
62bde1fcb554079143030bb305512c236672386f
in order to fix it (this is based on code review only, i have no 
sparc64 hardware to test it is enough)


Siegmar, --enable-heterogeneous is known to be broken on the trunk, 
and there are discussions on how to fix it.
in the mean time, you can either apply the attached minimal 
heterogeneous.diff patch or avoid the --enable-heterogeneous option
/* the attached patch "fixes" --enable-heterogeneous on homogeneous 
clusters *only* */


about attaching a process with gdb, i usually run
gdb none <pid>
on Linux and everything is fine
on Solaris, i had to do
gdb /usr/bin/java <pid>
in order to get the symbols loaded by gdb
and then
thread 11
f 3
set _dbg=0
/* but this is likely environment specific */

Cheers,

Gilles

On 2014/10/27 10:58, Ralph Castain wrote:

Oh yeah - that would indeed be very bad :-(



On Oct 26, 2014, at 6:06 PM, Kawashima, Takahiro  
wrote:

Siegmar, Oscar,

I suspect that the problem is calling mca_base_var_register
without initializing OPAL in JNI_OnLoad.

ompi/mpi/java/c/mpi_MPI.c:

jint JNI_OnLoad(JavaVM *vm, void *reserved)
{
libmpi = dlopen("libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);

if(libmpi == NULL)
{
fprintf(stderr, "Java bindings failed to load liboshmem.\n");
exit(1);
}

mca_base_var_register("ompi", "mpi", "java", "eager",
  "Java buffers eager size",
  MCA_BASE_VAR_TYPE_INT, NULL, 0, 0,
  OPAL_INFO_LVL_5,
  MCA_BASE_VAR_SCOPE_READONLY,
  &ompi_mpi_java_eager);

return JNI_VERSION_1_6;
}


I suppose JNI_OnLoad is the first function in the libmpi_java.so
which is called by JVM. So OPAL is not initialized yet.
As shown in Siegmar's JRE log, SEGV occurred in asprintf called
by mca_base_var_cache_files.

Siegmar's hs_err_pid13080.log:

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), 
si_addr=0x

Stack: [0x7b40,0x7b50],  sp=0x7b4fc730,  free 
space=1009k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.1+0x3c7f0]  strlen+0x50
C  [libc.so.1+0xaf640]  vsnprintf+0x84
C  [libc.so.1+0xaadb4]  vasprintf+0x20
C  [libc.so.1+0xaaf04]  asprintf+0x28
C  [libopen-pal.so.0.0.0+0xaf3cc]  mca_base_var_cache_files+0x160
C  [libopen-pal.so.0.0.0+0xaed90]  mca_base_var_init+0x4e8
C  [libopen-pal.so.0.0.0+0xb260c]  register_variable+0x214
C  [libopen-pal.so.0.0.0+0xb36a0]  mca_base_var_register+0x104
C  [libmpi_java.so.0.0.0+0x221e8]  JNI_OnLoad+0x128
C  [libjava.so+0x10860]  Java_java_lang_ClassLoader_00024NativeLibrary_load+0xb8
j  java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;Z)V+-665819
j  java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;Z)V+0
j  java.lang.ClassLoader.loadLibrary0(Ljava/lang/Class;Ljava/io/File;)Z+328
j  java.lang.ClassLoader.loadLibrary(Ljava/lang/Class;Ljava/lang/String;Z)V+290
j  java.lang.Runtime.loadLibrary0(Ljava/lang/Class;Ljava/lang/String;)V+54
j  java.lang.System.loadLibrary(Ljava/lang/String;)V+7
j  mpi.MPI.<clinit>()V+28


mca_base_var_cache_files passes opal_install_dirs.sysconfdir to
asprintf.

opal/mca/base/mca_base_var.c:

asprintf(&mca_base_var_files, "%s"OPAL_PATH_SEP".openmpi" OPAL_PATH_SEP
 "mca-params.conf%c%s" OPAL_PATH_SEP "openmpi-mca-params.conf",
 home, OPAL_ENV_SEP, opal_install_dirs.sysconfdir);


In this situation, opal_install_dirs.sysconfdir is still NULL.

I run a MPI Java program that only calls MPI.Init() and
MPI.Finalize() with MCA variable mpi_show_mca_params=1 on
Linux to confirm this. mca_base_param_files contains "(null)".

mpi_show_mca_params=1:

[ppc:12232] 
mca_base_param_files=/home/rivis/.openmpi/mca-params.conf:(null)/openmpi-mca-params.conf
 (default)
[ppc:12232] 
mca_param_files=/home/rivis/.openmpi/mca-params.conf:(

[OMPI users] MPI_Init seems to hang, but works after a minute or two

2014-10-27 Thread maxinator333

Hello,


After compiling and running an MPI program, it seems to hang at
MPI_Init(), but it eventually works after a minute or two.


While the problem occurred on my notebook, it did not on my desktop PC.
Both run on Win 7, cygwin 64 bit, OpenMPI version 1.8.3 r32794
(ompi_info), g++ v 4.8.3. I actually synced the cygwin installations
later on, and it still didn't work, but it did for a short time after a
restart...


When I started a program on my desktop PC, my firewall (Comodo 5.10)
notified me about orterun.exe (mpirun is only a symlink to orterun) and
myprogram.exe. After I permanently allowed these two programs, the
program still didn't resume execution. After canceling and restarting
it, the program ran without problems, meaning it ran fast. Because of
this, it seems to me that OpenMPI may have insufficient error handling
when it can't connect instantly. Maybe this is somehow related to the
problem?


The problem has existed for quite a few months on my notebook, so I DID
restart the notebook before without the problem being solved. I also have
Ubuntu on that notebook, and there I can compile and run MPI programs just
fine.


I'm using Comodo Firewall 5.10 on my desktop and absolutely no
firewall, not even the Windows Firewall (deactivated), on my notebook.
Installing Comodo 5.10 on my notebook didn't help either. But everything
worked after restarting, so maybe the firewall wasn't completely in
place yet? Then again, the installation program didn't prompt me to
restart and the firewall was already working, so ...


A version compiled on my desktop PC did run on my notebook with
mpirun... Alas, I couldn't replicate this, and I have tried for hours now.
Because of this I thought the error lay in the compiler or the
OpenMPI libraries, but it seems it only works at completely random times.


After syncing the cygwin installation on my notebook with the one on my
desktop, installing a firewall in which I allowed all affected programs,
and restarting my notebook, it also worked briefly, but after that it
didn't, so this "fix" may have coincided with a "good" time.


Other people have stories of their VPN software interfering with OpenMPI
and causing exactly these problems, but I don't have such software
running.


Things I also fruitlessly tried:

 * closing programs which could jam TCP-IP connections
 * ping 127.0.0.1 works <1ms
 * running bash as administrator
 * running orterun/mpirun from windows cmd instead of cygwin-bash
 * stopping Windows Firewall Service and Windows Defender completely
 * using MPI_Init(NULL,NULL); instead of MPI_Init(&argc,&argv);
 * compiling with gcc instead of g++
 * the program works fine if I don't start it with mpirun, but it also
   doesn't work if I start it with mpirun -n 1
 * update Windows
 * using safe mode (with network drivers)
 * trying to debug it (I can't get a useful backtrace to the MPI_Init call)



[OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread maxinator333

Hello,

I noticed this weird behavior because, after a certain time of more than
one minute, the transfer rates of MPI_Send and MPI_Recv dropped by a
factor of 100+. By chance I saw that my program allocated more and
more memory. I have the following minimal working example:


   #include <mpi.h>
   #include <stdint.h>

   const uint32_t MSG_LENGTH = 256;

   int main(int argc, char* argv[]) {
MPI_Init(NULL, NULL);
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

volatile char * msg  = (char*) malloc( sizeof(char) * MSG_LENGTH );

for (uint64_t i = 0; i < 1e9; i++) {
if ( rank == 1 ) {
MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
  rank-1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
  rank-1, 0, MPI_COMM_WORLD);
} else if ( rank == 0 ) {
MPI_Send( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
  rank+1, 0, MPI_COMM_WORLD);
MPI_Recv( const_cast<char*>(msg), MSG_LENGTH, MPI_CHAR,
  rank+1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
MPI_Barrier( MPI_COMM_WORLD );
for (uint32_t k = 0; k < MSG_LENGTH; k++)
msg[k]++;
}

MPI_Finalize();
return 0;
   }


I run this with mpirun -n 2 ./pingpong_memleak.exe

The program does nothing more than send a message from rank 0 to rank 1,
then from rank 1 to rank 0, and so on, in standard blocking mode, nothing
asynchronous.


Running the program allocates roughly 30 MB/s (per the Windows Task
Manager) until it levels off at around 1,313,180 KB. This is when the
transfer rates (not measured in the snippet above) drop significantly, to
maybe a second per send instead of roughly 1 µs.


I use Cygwin with Windows 7 and 16 GB RAM. I haven't tested this minimal
working example on other setups.


I understand that it's possible for MPI_Send to just store the message
in a buffer and then resume the program, even though it's supposed to be
blocking. But MPI_Recv should, in my understanding, be really 100%
blocking. This means that after each MPI_Recv the buffer should be
emptied in the above code, am I right?


Well, that's not the case, so I inserted the MPI_Barrier, thinking that at
least now both blocking operations should be 100% finished, but it only
dropped the 50 MB/s allocation to the mentioned 30 MB/s allocation.
Although I have to add: in my real code the MPI_Barrier mitigates the
problem to maybe 10 KB/s, which is acceptable if the program finishes in
under 2 days. But it still shouldn't happen.


I also modify the msg contents in each step in order to prevent
buffering and caching, but it didn't help either.


What is happening? Can I stop it from happening, e.g. with some kind of
MPI_Flush()?
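
There is no MPI_Flush for point-to-point messages. One hedged way to check
whether eager-protocol buffering is what keeps growing is to switch to
MPI_Ssend (synchronous send), which cannot complete before the matching
receive has started; a minimal C sketch of that variant, assuming the same
two-rank run (this is a diagnostic, not a claim about the cause of the
cygwin behaviour):

    #include <mpi.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Same ping-pong as above, but with MPI_Ssend: a synchronous send cannot
     * complete before the matching receive has started, so no eager-protocol
     * copies of the message can pile up on the receiving side. */
    int main(int argc, char **argv)
    {
        const int MSG_LENGTH = 256;
        char *msg;
        int rank;
        uint64_t i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        msg = malloc(MSG_LENGTH);

        for (i = 0; i < 1000000; i++) {
            if (rank == 1) {
                MPI_Recv(msg, MSG_LENGTH, MPI_CHAR, 0, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Ssend(msg, MSG_LENGTH, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            } else if (rank == 0) {
                MPI_Ssend(msg, MSG_LENGTH, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(msg, MSG_LENGTH, MPI_CHAR, 1, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
        }

        free(msg);
        MPI_Finalize();
        return 0;
    }

If the memory growth disappears with MPI_Ssend but not with MPI_Send, that
points to eager-protocol buffering rather than to the test program itself.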


Re: [OMPI users] which info is needed for SIGSEGV in Javaforopenmpi-dev-124-g91e9686on Solaris

2014-10-27 Thread Siegmar Gross
Hi Gilles, Oscar, Ralph, Takahiro

thank you very much for all your help and time investigating my
problems on Sparc systems.


> thanks a lot for the detailled explanation.
> FWIW, i was previously testing on Solaris 11 that behaves like Linux :
> printf("%s", NULL) outputs '(null)'
> vs a SIGSEGV on Solaris 10
> 
> i commited a16c1e44189366fbc8e967769e050f517a40f3f8 in order to fix this
> issue
> (i moved the call to mca_base_var_register *after* MPI_Init)
> 
> regarding the BUS error reported by Siegmar, i also commited
> 62bde1fcb554079143030bb305512c236672386f
> in order to fix it (this is based on code review only, i have no sparc64
> hardware to test it is enough)

I'll test it when a new nightly snapshot is available for the trunk.


Kind regards and thank you very much once more

Siegmar



Re: [OMPI users] which info is needed for SIGSEGV in Java foropenmpi-dev-124-g91e9686on Solaris

2014-10-27 Thread Gilles Gouaillardet
Kawashima-san,

thanks a lot for the detailed explanation.
FWIW, i was previously testing on Solaris 11, which behaves like Linux:
printf("%s", NULL) outputs '(null)',
vs a SIGSEGV on Solaris 10.
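
For reference, a self-contained C sketch of that difference, with nothing
Open MPI-specific in it (the path fragment is only illustrative):

    #define _GNU_SOURCE            /* for asprintf on glibc */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *sysconfdir = NULL;  /* stands in for opal_install_dirs.sysconfdir */
        char *files = NULL;

        /* Passing NULL to %s is undefined behaviour: glibc happens to print
         * "(null)", while the Solaris 10 libc dereferences the pointer and
         * crashes inside strlen/vsnprintf, as in the reported backtrace. */
        if (asprintf(&files, "%s/openmpi-mca-params.conf", sysconfdir) >= 0) {
            printf("%s\n", files);
            free(files);
        }
        return 0;
    }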

i committed a16c1e44189366fbc8e967769e050f517a40f3f8 in order to fix this
issue
(i moved the call to mca_base_var_register *after* MPI_Init)

regarding the BUS error reported by Siegmar, i also committed
62bde1fcb554079143030bb305512c236672386f
in order to fix it (this is based on code review only; i have no sparc64
hardware to verify that the fix is sufficient)

Siegmar, --enable-heterogeneous is known to be broken on the trunk, and
there are discussions on how to fix it.
in the meantime, you can either apply the attached minimal
heterogeneous.diff patch or avoid the --enable-heterogeneous option
/* the attached patch "fixes" --enable-heterogeneous on homogeneous
clusters *only* */

about attaching a process with gdb, i usually run
gdb none <pid>
on Linux and everything is fine
on Solaris, i had to do
gdb /usr/bin/java <pid>
in order to get the symbols loaded by gdb
and then
thread 11
f 3
set _dbg=0
/* but this is likely environment specific */

Cheers,

Gilles

On 2014/10/27 10:58, Ralph Castain wrote:
> Oh yeah - that would indeed be very bad :-(
>
>
>> On Oct 26, 2014, at 6:06 PM, Kawashima, Takahiro 
>>  wrote:
>>
>> Siegmar, Oscar,
>>
>> I suspect that the problem is calling mca_base_var_register
>> without initializing OPAL in JNI_OnLoad.
>>
>> ompi/mpi/java/c/mpi_MPI.c:
>> 
>> jint JNI_OnLoad(JavaVM *vm, void *reserved)
>> {
>>libmpi = dlopen("libmpi." OPAL_DYN_LIB_SUFFIX, RTLD_NOW | RTLD_GLOBAL);
>>
>>if(libmpi == NULL)
>>{
>>fprintf(stderr, "Java bindings failed to load liboshmem.\n");
>>exit(1);
>>}
>>
>>mca_base_var_register("ompi", "mpi", "java", "eager",
>>  "Java buffers eager size",
>>  MCA_BASE_VAR_TYPE_INT, NULL, 0, 0,
>>  OPAL_INFO_LVL_5,
>>  MCA_BASE_VAR_SCOPE_READONLY,
>>  &ompi_mpi_java_eager);
>>
>>return JNI_VERSION_1_6;
>> }
>> 
>>
>> I suppose JNI_OnLoad is the first function in the libmpi_java.so
>> which is called by JVM. So OPAL is not initialized yet.
>> As shown in Siegmar's JRE log, SEGV occurred in asprintf called
>> by mca_base_var_cache_files.
>>
>> Siegmar's hs_err_pid13080.log:
>> 
>> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), 
>> si_addr=0x
>>
>> Stack: [0x7b40,0x7b50],  sp=0x7b4fc730,  
>> free space=1009k
>> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
>> code)
>> C  [libc.so.1+0x3c7f0]  strlen+0x50
>> C  [libc.so.1+0xaf640]  vsnprintf+0x84
>> C  [libc.so.1+0xaadb4]  vasprintf+0x20
>> C  [libc.so.1+0xaaf04]  asprintf+0x28
>> C  [libopen-pal.so.0.0.0+0xaf3cc]  mca_base_var_cache_files+0x160
>> C  [libopen-pal.so.0.0.0+0xaed90]  mca_base_var_init+0x4e8
>> C  [libopen-pal.so.0.0.0+0xb260c]  register_variable+0x214
>> C  [libopen-pal.so.0.0.0+0xb36a0]  mca_base_var_register+0x104
>> C  [libmpi_java.so.0.0.0+0x221e8]  JNI_OnLoad+0x128
>> C  [libjava.so+0x10860]  
>> Java_java_lang_ClassLoader_00024NativeLibrary_load+0xb8
>> j  java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;Z)V+-665819
>> j  java.lang.ClassLoader$NativeLibrary.load(Ljava/lang/String;Z)V+0
>> j  java.lang.ClassLoader.loadLibrary0(Ljava/lang/Class;Ljava/io/File;)Z+328
>> j  
>> java.lang.ClassLoader.loadLibrary(Ljava/lang/Class;Ljava/lang/String;Z)V+290
>> j  java.lang.Runtime.loadLibrary0(Ljava/lang/Class;Ljava/lang/String;)V+54
>> j  java.lang.System.loadLibrary(Ljava/lang/String;)V+7
>> j  mpi.MPI.<clinit>()V+28
>> 
>>
>> mca_base_var_cache_files passes opal_install_dirs.sysconfdir to
>> asprintf.
>>
>> opal/mca/base/mca_base_var.c:
>> 
>>asprintf(&mca_base_var_files, "%s"OPAL_PATH_SEP".openmpi" OPAL_PATH_SEP
>> "mca-params.conf%c%s" OPAL_PATH_SEP "openmpi-mca-params.conf",
>> home, OPAL_ENV_SEP, opal_install_dirs.sysconfdir);
>> 
>>
>> In this situation, opal_install_dirs.sysconfdir is still NULL.
>>
>> I run a MPI Java program that only calls MPI.Init() and
>> MPI.Finalize() with MCA variable mpi_show_mca_params=1 on
>> Linux to confirm this. mca_base_param_files contains "(null)".
>>
>> mpi_show_mca_params=1:
>> 
>> [ppc:12232] 
>> mca_base_param_files=/home/rivis/.openmpi/mca-params.conf:(null)/openmpi-mca-params.conf
>>  (default)
>> [ppc:12232] 
>> mca_param_files=/home/rivis/.openmpi/mca-params.conf:(null)/ope

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV inJava foropenmpi-dev-124-g91e9686on Solaris

2014-10-27 Thread Gilles Gouaillardet
Ralph,

this is also a solution.
the pro is it seems more lightweight than PR #249
the two cons i can see are:
- opal_process_name_t alignment goes from 64 to 32 bits
- some functions (opal_hash_table_*) take a uint64_t as argument, so we
still need to use memcpy in order to
  * guarantee 64-bit alignment on some archs (such as sparc)
  * avoid an ugly cast such as uint64_t id = *(uint64_t *)&process_name;
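
A small C sketch of the two options being weighed here, using a hypothetical
32-bit-aligned name struct rather than the real opal_process_name_t
definition:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 32-bit-aligned process name; the real opal_process_name_t
     * layout may differ, this only illustrates the alignment question. */
    typedef struct {
        uint32_t jobid;
        uint32_t vpid;
    } name_t;

    /* memcpy makes no alignment assumption, so it is safe on strict-alignment
     * CPUs such as sparc64 even when 'name' is only 32-bit aligned */
    uint64_t name_to_key_memcpy(const name_t *name)
    {
        uint64_t key;
        memcpy(&key, name, sizeof(key));
        return key;
    }

    /* the "ugly cast" variant: it performs a 64-bit load and can raise SIGBUS
     * when 'name' is not 64-bit aligned */
    uint64_t name_to_key_cast(const name_t *name)
    {
        return *(const uint64_t *)name;
    }

Either helper then produces the uint64_t key expected by the
opal_hash_table_* calls mentioned above.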

as far as i am concerned, i am fine with your proposed suggestion to
dump opal_identifier_t.

about the patch, did you mean you have something ready i can apply to my
PR ?
or do you expect me to do the changes (i am ok to do it if needed)

Cheers,

Gilles

On 2014/10/27 11:04, Ralph Castain wrote:
> Just took a glance thru 249 and have a few suggestions on it - will pass them 
> along tomorrow. I think the right solution is to (a) dump opal_identifier_t 
> in favor of using opal_process_name_t everywhere in the opal layer, (b) 
> typedef orte_process_name_t to opal_process_name_t, and (c) leave 
> ompi_process_name_t as typedef’d to the RTE component in the MPI layer. This 
> lets other RTEs decide for themselves how they want to handle it.
>
> If you add changes to your branch, I can pass you a patch with my suggested 
> alterations.
>
>> On Oct 26, 2014, at 5:55 PM, Gilles Gouaillardet 
>>  wrote:
>>
>> No :-(
>> I need some extra work to stop declaring orte_process_name_t and 
>> ompi_process_name_t variables.
>> #249 will make things much easier.
>> One option is to use opal_process_name_t everywhere or typedef orte and ompi 
>> types to the opal one.
>> An other (lightweight but error prone imho) is to change variable 
>> declaration only.
>> Any thought ?
>>
>> Ralph Castain  wrote:
>>> Will PR#249 solve it? If so, we should just go with it as I suspect that is 
>>> the long-term solution.
>>>
 On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet 
  wrote:

 It looks like we faced a similar issue :
 opal_process_name_t is 64 bits aligned wheteas orte_process_name_t is 32 
 bits aligned. If you run an alignment sensitive cpu such as sparc and you 
 are not lucky (so to speak) you can run into this issue.
 i will make a patch for this shortly

 Ralph Castain  wrote:
> Afraid this must be something about the Sparc - just ran on a Solaris 11 
> x86 box and everything works fine.
>
>
>> On Oct 26, 2014, at 8:22 AM, Siegmar Gross 
>>  wrote:
>>
>> Hi Gilles,
>>
>> I wanted to explore which function is called, when I call MPI_Init
>> in a C program, because this function should be called from a Java
>> program as well. Unfortunately C programs break with a Bus Error
>> once more for openmpi-dev-124-g91e9686 on Solaris. I assume that's
>> the reason why I get no useful backtrace for my Java program.
>>
>> tyr small_prog 117 mpicc -o init_finalize init_finalize.c
>> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>> ...
>> (gdb) run -np 1 init_finalize
>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 
>> init_finalize
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP2]
>> [tyr:19240] *** Process received signal ***
>> [tyr:19240] Signal: Bus Error (10)
>> [tyr:19240] Signal code: Invalid address alignment (1)
>> [tyr:19240] Failing at address: 7bd1c10c
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdcc04
>> /lib/sparcv9/libc.so.1:0xd8b98
>> /lib/sparcv9/libc.so.1:0xcc70c
>> /lib/sparcv9/libc.so.1:0xcc918
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c
>>  [ Signal 10 (BUS)]
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103e8
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x374
>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8
>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:main+0x20
>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:_start+0x7c
>> [tyr:19240] *** End of error message ***
>> --
>> mpiexec noticed that process rank 0 with PID 0 on node tyr exited on 
>> signal 10 (Bus Error).
>> --
>> [LWP2 exited]
>> [New Thread 2]
>> [Switching to Thread 1 (LWP 1)]
>> sol_thread_fetch_r

[OMPI users] Java FAQ Page out of date

2014-10-27 Thread Brock Palen
I think a lot of the information on this page:

http://www.open-mpi.org/faq/?category=java

is out of date with the 1.8 release.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985





[OMPI users] HAMSTER MPI+Yarn

2014-10-27 Thread Brock Palen
We are starting to look at supporting MPI on our Hadoop/Spark YARN-based
cluster.  I found a bunch of references to Hamster, but what I can't find is
whether it was ever merged into regular OpenMPI, and if so, is it just another
RM integration?  Or does it need more setup?

I found this:
http://pivotalhd.docs.pivotal.io/doc/2100/Hamster.html

Which appears to imply that extra setup is required.  Is this documented
anywhere for OpenMPI?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985