Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-07-07 Thread Saliya Ekanayake
I've received SIGSEGV a few times for different reasons with OpenMPI Java,
and one of the most common reasons was the ulimit settings. You might want
to look at -l (max locked memory), -u (max user processes), and -n (open files).

Here's a snapshot of what we use in our clusters running OpenMPI and Java:

core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 515696
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1048576
pipe size   (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 196608
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited



On Wed, Jul 6, 2016 at 9:00 PM, Gilles Gouaillardet wrote:

> Gundram,
>
>
> fwiw, i cannot reproduce the issue on my box
>
> - centos 7
>
> - java version "1.8.0_71"
>   Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>
>
> i noticed on non zero rank saveMem is allocated at each iteration.
> ideally, the garbage collector can take care of that and this should not
> be an issue.
>
> would you mind giving the attached file a try ?
>
> Cheers,
>
> Gilles
>
>
> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>
> I will have a look at it today
>
> how did you configure OpenMPI ?
>
> Cheers,
>
> Gilles
>
> On Thursday, July 7, 2016, Gundram Leifert <
> gundram.leif...@uni-rostock.de> wrote:
>
>> Hello Gilles,
>>
>> thank you for your hints! I made 3 changes, but unfortunately the same error
>> occurs:
>>
>> update ompi:
>> commit ae8444682f0a7aa158caea08800542ce9874455e
>> Author: Ralph Castain 
>> Date:   Tue Jul 5 20:07:16 2016 -0700
>>
>> update java:
>> java version "1.8.0_92"
>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>
>> delete hashcode-lines.
>>
>> Now I get this error message reliably (100% of the time), after a varying
>> number of iterations (15-300):
>>
>>  0/ 3:length = 1
>>  0/ 3:bcast length done (length = 1)
>>  1/ 3:bcast length done (length = 1)
>>  2/ 3:bcast length done (length = 1)
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x2b3d022fcd24, pid=16578,
>> tid=0x2b3d29716700
>> #
>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build
>> 1.8.0_92-b14)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode
>> linux-amd64 compressed oops)
>> # Problematic frame:
>> # V  [libjvm.so+0x414d24]  ciEnv::get_field_by_index(ciInstanceKlass*,
>> int)+0x94
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable
>> core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log
>> #
>> # Compiler replay data is saved as:
>> # /home/gl069/ompi/bin/executor/replay_pid16578.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://bugreport.java.com/bugreport/crash.jsp
>> #
>> [titan01:16578] *** Process received signal ***
>> [titan01:16578] Signal: Aborted (6)
>> [titan01:16578] Signal code:  (-6)
>> [titan01:16578] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
>> [titan01:16578] [ 3]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
>> [titan01:16578] [ 4]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
>> [titan01:16578] [ 5]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
>> [titan01:16578] [ 6]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
>> [titan01:16578] [ 8]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
>> [titan01:16578] [ 9]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
>> [titan01:16578] [10]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
>> [titan01:16578] [11]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
>> [titan01:16578] [12]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
>> [titan01:16578] [13]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-07-07 Thread Gundram Leifert

Hello Gilles,

I tried your code and it crashes after 3-15 iterations (see (1)). It is 
always the same error (only the "94" varies).


Meanwhile I think Java and MPI use the same memory, because when I delete 
the hash call the program sometimes runs for more than 9k iterations.
When it crashes, there are different problematic frames (see (2) and (3), 
and the stripped-down sketch after them). The crashes also occur on rank 0.


# (1)#
# Problematic frame:
# J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I 
(42 bytes) @ 0x2b03242dc9c4 [0x2b03242dc860+0x164]


#(2)#
# Problematic frame:
# V  [libjvm.so+0x68d0f6] JavaCallWrapper::JavaCallWrapper(methodHandle, 
Handle, JavaValue*, Thread*)+0xb6


#(3)#
# Problematic frame:
# V  [libjvm.so+0x4183bf] 
ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
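
For reference, here is a boiled-down sketch of the pattern under test
(hypothetical names, not the actual TestSendBigFiles2 class; it assumes the
Open MPI Java bindings from mpi.jar). Rank 0 announces a length, every rank
receives the payload via bcast, and the buffer is then hashed. saveMem is
allocated anew on every iteration on the non-root ranks, which is the
allocation pattern Gilles pointed out; deleting the hashcode call is what
sometimes lets the program run past 9k iterations.

import mpi.*;

public class BcastSketch {

    // stand-in for the hashcode([BI)I method named in frame (1)
    static int hashcode(byte[] data, int length) {
        int h = 1;
        for (int i = 0; i < length; i++) {
            h = 31 * h + data[i];
        }
        return h;
    }

    public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();
        int size = MPI.COMM_WORLD.getSize();

        int[] length = new int[1];
        byte[] payload = null;
        if (rank == 0) {
            payload = new byte[100 * 1000 * 1000];   // the "big file" lives on the root
            length[0] = payload.length;
        }

        for (int iter = 0; iter < 10000; iter++) {
            // root broadcasts the payload length ...
            MPI.COMM_WORLD.bcast(length, 1, MPI.INT, 0);
            System.out.printf("%2d/%2d: bcast length done (length = %d)%n",
                    rank, size, length[0]);

            // ... every non-root rank allocates a fresh receive buffer ...
            byte[] saveMem = (rank == 0) ? payload : new byte[length[0]];

            // ... the payload is broadcast and then hashed
            MPI.COMM_WORLD.bcast(saveMem, length[0], MPI.BYTE, 0);
            hashcode(saveMem, saveMem.length);
        }

        MPI.Finalize();
    }
}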


Any more ideas?

On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:


Gundram,


fwiw, i cannot reproduce the issue on my box

- centos 7

- java version "1.8.0_71"
  Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
  Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)


i noticed on non zero rank saveMem is allocated at each iteration.
ideally, the garbage collector can take care of that and this should 
not be an issue.


would you mind giving the attached file a try ?

Cheers,

Gilles

On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:

I will have a look at it today

how did you configure OpenMPI ?

Cheers,

Gilles

On Thursday, July 7, 2016, Gundram Leifert wrote:


Hello Gilles,

thank you for your hints! I made 3 changes, but unfortunately the same
error occurs:

update ompi:
commit ae8444682f0a7aa158caea08800542ce9874455e
Author: Ralph Castain 

Date:   Tue Jul 5 20:07:16 2016 -0700

update java:
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)

delete hashcode-lines.

Now I get this error message reliably (100% of the time), after a varying
number of iterations (15-300):

 0/ 3:length = 1
 0/ 3:bcast length done (length = 1)
 1/ 3:bcast length done (length = 1)
 2/ 3:bcast length done (length = 1)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x2b3d022fcd24, pid=16578,
tid=0x2b3d29716700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_92-b14)
(build 1.8.0_92-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed
mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x414d24]
ciEnv::get_field_by_index(ciInstanceKlass*, int)+0x94
#
# Failed to write core dump. Core dumps have been disabled. To
enable core dumping, try "ulimit -c unlimited" before starting
Java again
#
# An error report file with more information is saved as:
# /home/gl069/ompi/bin/executor/hs_err_pid16578.log
#
# Compiler replay data is saved as:
# /home/gl069/ompi/bin/executor/replay_pid16578.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[titan01:16578] *** Process received signal ***
[titan01:16578] Signal: Aborted (6)
[titan01:16578] Signal code:  (-6)
[titan01:16578] [ 0]
/usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
[titan01:16578] [ 1]
/usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
[titan01:16578] [ 2]
/usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
[titan01:16578] [ 3]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
[titan01:16578] [ 4]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
[titan01:16578] [ 5]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
[titan01:16578] [ 6]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
[titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
[titan01:16578] [ 8]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
[titan01:16578] [ 9]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
[titan01:16578] [10]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
[titan01:16578] [11]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
[titan01:16578] [12]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x37091b)[0x2b3d0225891b]
[titan01:16578] [13]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x3712b6)[0x2b3d022592b6]
[titan01:16578] [14]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36d2cf)[0x2b3d022552cf]
[titan01:16578] [15]

/home/gl069/bin/

Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-07-07 Thread Gilles Gouaillardet

Gundram,


can you please provide more information on your environment :

- configure command line

- OS

- memory available

- ulimit -a

- number of nodes

- number of tasks used

- interconnect used (if any)

- batch manager (if any)


Cheers,


Gilles

On 7/7/2016 4:17 PM, Gundram Leifert wrote:

Hello Gilles,

I tried your code and it crashes after 3-15 iterations (see (1)). It is 
always the same error (only the "94" varies).


Meanwhile I think Java and MPI use the same memory, because when I 
delete the hash call the program sometimes runs for more than 9k iterations.
When it crashes, there are different problematic frames (see (2) and (3)). 
The crashes also occur on rank 0.


# (1)#
# Problematic frame:
# J 94 C2 
de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I (42 
bytes) @ 0x2b03242dc9c4 [0x2b03242dc860+0x164]


#(2)#
# Problematic frame:
# V  [libjvm.so+0x68d0f6] 
JavaCallWrapper::JavaCallWrapper(methodHandle, Handle, JavaValue*, 
Thread*)+0xb6


#(3)#
# Problematic frame:
# V  [libjvm.so+0x4183bf] 
ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f


Any more ideas?

On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:


Gundram,


fwiw, i cannot reproduce the issue on my box

- centos 7

- java version "1.8.0_71"
  Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
  Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)


i noticed on non zero rank saveMem is allocated at each iteration.
ideally, the garbage collector can take care of that and this should 
not be an issue.


would you mind giving the attached file a try ?

Cheers,

Gilles

On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:

I will have a look at it today

how did you configure OpenMPI ?

Cheers,

Gilles

On Thursday, July 7, 2016, Gundram Leifert wrote:


Hello Gilles,

thank you for your hints! I made 3 changes, but unfortunately the
same error occurs:

update ompi:
commit ae8444682f0a7aa158caea08800542ce9874455e
Author: Ralph Castain 

Date:   Tue Jul 5 20:07:16 2016 -0700

update java:
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)

delete hashcode-lines.

Now I get this error message reliably (100% of the time), after a
varying number of iterations (15-300):

 0/ 3:length = 1
 0/ 3:bcast length done (length = 1)
 1/ 3:bcast length done (length = 1)
 2/ 3:bcast length done (length = 1)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x2b3d022fcd24, pid=16578,
tid=0x2b3d29716700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_92-b14)
(build 1.8.0_92-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed
mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x414d24]
ciEnv::get_field_by_index(ciInstanceKlass*, int)+0x94
#
# Failed to write core dump. Core dumps have been disabled. To
enable core dumping, try "ulimit -c unlimited" before starting
Java again
#
# An error report file with more information is saved as:
# /home/gl069/ompi/bin/executor/hs_err_pid16578.log
#
# Compiler replay data is saved as:
# /home/gl069/ompi/bin/executor/replay_pid16578.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[titan01:16578] *** Process received signal ***
[titan01:16578] Signal: Aborted (6)
[titan01:16578] Signal code:  (-6)
[titan01:16578] [ 0]
/usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
[titan01:16578] [ 1]
/usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
[titan01:16578] [ 2]
/usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
[titan01:16578] [ 3]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
[titan01:16578] [ 4]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
[titan01:16578] [ 5]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
[titan01:16578] [ 6]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
[titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
[titan01:16578] [ 8]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
[titan01:16578] [ 9]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x43c5ae)[0x2b3d023245ae]
[titan01:16578] [10]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x369ade)[0x2b3d02251ade]
[titan01:16578] [11]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x36eda0)[0x2b3d02256da0]
[titan01:16578] [12]

/home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+

[OMPI users] Class information in OpenMPI

2016-07-07 Thread Emani, Murali
Hi all,

I want to know if there is a “class diagram” for the OpenMPI code base that shows 
the existing classes and their dependencies/associations. Are there any available 
tools to extract and visualize this information?


—
Murali



Re: [OMPI users] Java-OpenMPI returns with SIGSEGV

2016-07-07 Thread Nathaniel Graham
Hello Gundram,

I was also not able to reproduce the issue on my computer (OS X El Capitan).
I ran both your code and the one provided by Gilles with no issues.

I can try it on my Ubuntu machine when I get home.

-Nathan

On Thu, Jul 7, 2016 at 2:05 AM, Gilles Gouaillardet wrote:

> Gundram,
>
>
> can you please provide more information on your environment :
>
> - configure command line
>
> - OS
>
> - memory available
>
> - ulimit -a
>
> - number of nodes
>
> - number of tasks used
>
> - interconnect used (if any)
>
> - batch manager (if any)
>
>
> Cheers,
>
>
> Gilles
> On 7/7/2016 4:17 PM, Gundram Leifert wrote:
>
> Hello Gilles,
>
> I tried your code and it crashes after 3-15 iterations (see (1)). It is
> always the same error (only the "94" varies).
>
> Meanwhile I think Java and MPI use the same memory, because when I delete
> the hash call the program sometimes runs for more than 9k iterations.
> When it crashes, there are different problematic frames (see (2) and (3)).
> The crashes also occur on rank 0.
>
> # (1)#
> # Problematic frame:
> # J 94 C2 de.uros.citlab.executor.test.TestSendBigFiles2.hashcode([BI)I
> (42 bytes) @ 0x2b03242dc9c4 [0x2b03242dc860+0x164]
>
> #(2)#
> # Problematic frame:
> # V  [libjvm.so+0x68d0f6]  JavaCallWrapper::JavaCallWrapper(methodHandle,
> Handle, JavaValue*, Thread*)+0xb6
>
> #(3)#
> # Problematic frame:
> # V  [libjvm.so+0x4183bf]
> ThreadInVMfromNative::ThreadInVMfromNative(JavaThread*)+0x4f
>
> Any more ideas?
>
> On 07/07/2016 03:00 AM, Gilles Gouaillardet wrote:
>
> Gundram,
>
>
> fwiw, i cannot reproduce the issue on my box
>
> - centos 7
>
> - java version "1.8.0_71"
>   Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)
>
>
> i noticed on non zero rank saveMem is allocated at each iteration.
> ideally, the garbage collector can take care of that and this should not
> be an issue.
>
> would you mind giving the attached file a try ?
>
> Cheers,
>
> Gilles
>
> On 7/7/2016 7:41 AM, Gilles Gouaillardet wrote:
>
> I will have a look at it today
>
> how did you configure OpenMPI ?
>
> Cheers,
>
> Gilles
>
> On Thursday, July 7, 2016, Gundram Leifert <
> gundram.leif...@uni-rostock.de> wrote:
>
>> Hello Gilles,
>>
>> thank you for your hints! I made 3 changes, but unfortunately the same error
>> occurs:
>>
>> update ompi:
>> commit ae8444682f0a7aa158caea08800542ce9874455e
>> Author: Ralph Castain 
>> Date:   Tue Jul 5 20:07:16 2016 -0700
>>
>> update java:
>> java version "1.8.0_92"
>> Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
>> Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
>>
>> delete hashcode-lines.
>>
>> Now I get this error message reliably (100% of the time), after a varying
>> number of iterations (15-300):
>>
>>  0/ 3:length = 1
>>  0/ 3:bcast length done (length = 1)
>>  1/ 3:bcast length done (length = 1)
>>  2/ 3:bcast length done (length = 1)
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x2b3d022fcd24, pid=16578,
>> tid=0x2b3d29716700
>> #
>> # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build
>> 1.8.0_92-b14)
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode
>> linux-amd64 compressed oops)
>> # Problematic frame:
>> # V  [libjvm.so+0x414d24]  ciEnv::get_field_by_index(ciInstanceKlass*,
>> int)+0x94
>> #
>> # Failed to write core dump. Core dumps have been disabled. To enable
>> core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /home/gl069/ompi/bin/executor/hs_err_pid16578.log
>> #
>> # Compiler replay data is saved as:
>> # /home/gl069/ompi/bin/executor/replay_pid16578.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   http://bugreport.java.com/bugreport/crash.jsp
>> #
>> [titan01:16578] *** Process received signal ***
>> [titan01:16578] Signal: Aborted (6)
>> [titan01:16578] Signal code:  (-6)
>> [titan01:16578] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x2b3d01500100]
>> [titan01:16578] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x2b3d01b5c5f7]
>> [titan01:16578] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x2b3d01b5dce8]
>> [titan01:16578] [ 3]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91e605)[0x2b3d02806605]
>> [titan01:16578] [ 4]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0xabda63)[0x2b3d029a5a63]
>> [titan01:16578] [ 5]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(JVM_handle_linux_signal+0x14f)[0x2b3d0280be2f]
>> [titan01:16578] [ 6]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x91a5c3)[0x2b3d028025c3]
>> [titan01:16578] [ 7] /usr/lib64/libc.so.6(+0x35670)[0x2b3d01b5c670]
>> [titan01:16578] [ 8]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd64/server/libjvm.so(+0x414d24)[0x2b3d022fcd24]
>> [titan01:16578] [ 9]
>> /home/gl069/bin/jdk1.8.0_92/jre/lib/amd6

[OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Alberti, Andrea
Hi,

My name is Andrea and I am a new OpenMPI user.

I have a code compiled with:
intel/16.0.3
openmpi/1.6.5

--> When I try to run my code with:  mpirun -n N ./code.exe
  a) the code correctly runs and gives results if N<=25
  b) the code gives the following error if N>25:
  mpirun has exited due to process rank X with PID ...

--> This seems to be a pretty common problem when not all the processes are 
initialized or finalized.
  However, I do init and finalize the processes.
  Moreover, I do not understand why the problem does not occur when N<=25.

Could someone please help me out with that, or point me to some pages where 
the same problem is discussed/solved?
Thank you very much in advance for the help.

Andrea



Re: [OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Ralph Castain
Try running one of the OMPI example codes and verify that things run correctly 
if N > 25. I suspect you have an error in your code that causes it to fail if 
its rank is > 25.


> On Jul 7, 2016, at 2:49 PM, Alberti, Andrea  wrote:
> 
> Hi,
> 
> my name is Andrea and I am a new openMPI user.
> 
> I have a code compiled with:
> intel/16.0.3
> openmpi/1.6.5
> 
> --> When I try to run my code with:  mpirun -n N ./code.exe
>   a) the code correctly runs and gives results if N<=25
>   b) the code gives the following error if N>25:
>   mpirun has exited due to process rank X with PID ...
> 
> --> This seems to be a pretty common problem when not all the processes are 
> initialized or finalized.
>   However, I do init and finalize the processes.
>   And, moreover, I do not understand why the problem is not there when  
> N<=25
> 
> Could someone, please, help me out with that or point me to some pages where 
> the same problem is discussed/solved?
> Thank you very much in advance for the help.
> 
> Andrea
> 


Re: [OMPI users] Class information in OpenMPI

2016-07-07 Thread Ralph Castain
We used to have Doxygen support that would create what you are asking for, but 
I don’t think anyone has maintained it in a long time. I ran “doxygen” at the 
top-level directory and it did indeed generate a bunch of html, but I’m not 
sure it is all that helpful.

You might take a look and see if it helps enough to be useful. Could be that 
someone will contribute updated Doxygen support to make it better…


> On Jul 7, 2016, at 11:57 AM, Emani, Murali  wrote:
> 
> Hi all,
> 
> I want to know if there is a “class diagram” for the OpenMPI code base that shows 
> the existing classes and their dependencies/associations. Are there any available 
> tools to extract and visualize this information?
> 
> 
> — 
> Murali
> 



Re: [OMPI users] mpirun has exited due to process rank N

2016-07-07 Thread Gilles Gouaillardet

Andrea,


On top of what Ralph just wrote, you might want to upgrade OpenMPI to 
the latest stable version (1.10.3).


1.6.5 is pretty antique and is no longer maintained.


the message indicates that one process died, and many different things could 
cause a process to crash.

Since the crash occurs only with N > 25, the root cause could be an out of 
memory condition (just run dmesg and grep for OOM), a division by zero, your 
application calling exit(...) instead of MPI_Finalize()/MPI_Abort(...), or 
some other bug in your application.



Cheers,


Gilles


On 7/8/2016 7:12 AM, Ralph Castain wrote:
Try running one of the OMPI example codes and verify that things run 
correctly if N > 25. I suspect you have an error in your code that 
causes it to fail if its rank is > 25.



On Jul 7, 2016, at 2:49 PM, Alberti, Andrea wrote:


Hi,

my name is Andrea and I am a new openMPI user.

I have a code compiled with:
intel/16.0.3
openmpi/1.6.5

--> When I try to run my code with: mpirun -n N ./code.exe
a) the code correctly runs and gives results if N<=25
b) the code gives the following error if N>25:
mpirun has exited due to process rank X with PID ...

--> This seems to be a pretty common problem when not all the 
processes are initialized or finalized.

However, I do init and finalize the processes.
And, moreover, I do not understand why the problem is not there 
when N<=25


Could someone, please, help me out with that or point me to some 
pages where the same problem is discussed/solved?

Thank you very much in advance for the help.

Andrea
