Re: [OMPI users] memory per core/process

2013-04-03 Thread Ralph Castain
Here is a v1.6 port of what was committed to the trunk. Let me know if/how it 
works for you. The option you will want to use is:

mpirun -mca opal_set_max_sys_limits stacksize:unlimited

or whatever number you want to give (see ulimit for the units). Note that you 
won't see any impact if you run it with a non-OMPI executable like "sh ulimit" 
since it only gets called during MPI_Init.
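If that works out, the same MCA parameter can presumably also be placed in the per-user parameter file so every mpirun picks it up without typing the option (a sketch, assuming the option name above; adjust the value to your site):

$ echo "opal_set_max_sys_limits = stacksize:unlimited" >> ~/.openmpi/mca-params.conf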


On Apr 2, 2013, at 9:48 AM, Duke Nguyen  wrote:

> On 4/2/13 11:03 PM, Gus Correa wrote:
>> On 04/02/2013 11:40 AM, Duke Nguyen wrote:
>>> On 3/30/13 8:46 PM, Patrick Bégou wrote:
Ok, so your problem is identified as a stack size problem. I ran into
these limitations using Intel Fortran compilers on large data problems.

First, it seems you can increase your stack size, since "ulimit -s
unlimited" works (you haven't hit the system hard limit). The best
way is to put this setting in your .bashrc file so it works on
every node.
But setting it to unlimited may not be really safe. I.e., if you get into
a badly coded recursive function calling itself without a stop
condition, you can request all the system memory and crash the node. So
set a large but limited value; it's safer.
 
>>> 
>>> Now I feel the pain you mentioned :). With -s unlimited, some of our
>>> nodes easily go down (completely) and need to be hard reset!!!
>>> (whereas we never had any node go down like that before, even with
>>> killed or badly coded jobs).
>>> 
>>> Looking for a safer number of ulimit -s other than "unlimited" now... :(
>>> 
>> 
>> In my opinion this is a trade-off of who feels the pain.
>> It can be you (sys admin) feeling the pain of having
>> to power up offline nodes,
>> or it could be the user feeling the pain of having
>> her/his code killed by a segmentation fault due to the small amount
>> of memory available for the stack.
> 
> ... in case that user is at a large institute that promises to provide the best 
> service and unlimited resources/unlimited *everything* to end users. If not, the 
> user should really think about how to make the best use of the available resources. 
> Unfortunately many (most?) end users don't.
> 
>> There is only so much that can be done to make everybody happy.
> 
> So true... especially since HPC resources are still a luxury here in Vietnam, and we 
> have quite a small (and not-so-strong) cluster.
> 
>> If you share the nodes among jobs, you could set the
>> stack size limit to
>> some part of the physical_memory divided by the number_of_cores,
>> saving some memory for the OS etc beforehand.
>> However, this can be a straitjacket for jobs that could run with
>> a bit more memory, and won't because of this limit.
>> If you do not share the nodes, then you could make stacksize
>> closer to physical memory.
> 
> Great. Thanks for this advice Gus.
> 
>> 
>> Anyway, this is less of an OpenMPI than of a
>> resource manager / queuing system conversation.
> 
> Yeah, and I have learned a lot other than just openmpi stuff here :)
> 
>> 
>> Best,
>> Gus Correa
>> 
I'm managing a cluster and I always set a maximum value for the stack size.
I also limit the memory available per core for system stability.
If a user requests only one of the 12 cores of a node, he can only
access 1/12 of the node's memory. If he needs more memory he has
to request 2 cores, even if he runs a sequential code. This avoids
crashing the jobs of other users on the same node with large memory
requirements. But this is not configured on your node.
 
Duke Nguyen wrote:
> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>> I do not know about your code but:
>> 
>> 1) did you check stack limitations? Typically Intel Fortran codes
>> need a large amount of stack when the problem size increases.
>> Check ulimit -a
> 
> First time I heard of stack limitations. Anyway, ulimit -a gives
> 
> $ ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 127368
> max locked memory (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 10240
> cpu time (seconds, -t) unlimited
> max user processes (-u) 1024
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> 
> So stack size is 10MB??? Does this create a problem? How do I
> change this?
> 
>> 
>> 2) does your node use cpusets and memory limitation (like fake NUMA) to
>> set the maximum amount of memory available for a job?
> 
> I don't really understand (it's also the first time I've heard of fake NUMA), but I am
> pretty sure we do not have such things. The server I tried was a
> dedicated server with 2 x5420 and 16GB physical memory.

Re: [OMPI users] memory per core/process

2013-04-02 Thread Duke Nguyen

On 4/2/13 11:03 PM, Gus Correa wrote:

On 04/02/2013 11:40 AM, Duke Nguyen wrote:

On 3/30/13 8:46 PM, Patrick Bégou wrote:

Ok, so your problem is identified as a stack size problem. I ran into
these limitations using Intel Fortran compilers on large data problems.

First, it seems you can increase your stack size, since "ulimit -s
unlimited" works (you haven't hit the system hard limit). The best
way is to put this setting in your .bashrc file so it works on
every node.
But setting it to unlimited may not be really safe. I.e., if you get into
a badly coded recursive function calling itself without a stop
condition, you can request all the system memory and crash the node. So
set a large but limited value; it's safer.



Now I feel the pain you mentioned :). With -s unlimited, some of our
nodes easily go down (completely) and need to be hard reset!!!
(whereas we never had any node go down like that before, even with
killed or badly coded jobs).

Looking for a safer number of ulimit -s other than "unlimited" now... :(
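One bounded alternative is to pick a large but finite value and set it both per user and cluster-wide (a minimal sketch; the numbers are purely illustrative, not a recommendation):

# in ~/.bashrc, so every shell spawned on a node inherits it
ulimit -s 262144                  # 256 MB soft stack limit

# and/or cluster-wide in /etc/security/limits.conf
*   soft   stack   262144
*   hard   stack   524288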



In my opinion this is a trade-off of who feels the pain.
It can be you (sys admin) feeling the pain of having
to power up offline nodes,
or it could be the user feeling the pain of having
her/his code killed by a segmentation fault due to the small amount
of memory available for the stack.


... in case that user is at a large institute that promises to provide 
the best service and unlimited resources/unlimited *everything* to end 
users. If not, the user should really think about how to make the best 
use of the available resources. Unfortunately many (most?) end users don't.



There is only so much that can be done to make everybody happy.


So true... especially since HPC resources are still a luxury here in Vietnam, 
and we have quite a small (and not-so-strong) cluster.



If you share the nodes among jobs, you could set the
stack size limit to
some part of the physical_memory divided by the number_of_cores,
saving some memory for the OS etc beforehand.
However, this can be a straitjacket for jobs that could run with
a bit more memory, and won't because of this limit.
If you do not share the nodes, then you could make stacksize
closer to physical memory.


Great. Thanks for this advice Gus.



Anyway, this is less of an OpenMPI than of a
resource manager / queuing system conversation.


Yeah, and I have learned a lot other than just openmpi stuff here :)



Best,
Gus Correa


I'm managing a cluster and I always set a maximum value for the stack size.
I also limit the memory available per core for system stability.
If a user requests only one of the 12 cores of a node, he can only
access 1/12 of the node's memory. If he needs more memory he has
to request 2 cores, even if he runs a sequential code. This avoids
crashing the jobs of other users on the same node with large memory
requirements. But this is not configured on your node.

Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations? Typically Intel Fortran codes
need a large amount of stack when the problem size increases.
Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

So stack size is 10MB??? Does this create a problem? How do I
change this?



2) does your node use cpusets and memory limitation (like fake NUMA) to
set the maximum amount of memory available for a job?


I don't really understand (it's also the first time I've heard of fake NUMA), but I am
pretty sure we do not have such things. The server I tried was a
dedicated server with 2 x5420 and 16GB physical memory.



Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question has been asked before, but after ten
days of searching/working on the system, I surrender :(. We are trying to
use mpirun to run abinit (abinit.org), which in turn reads an
input file to run a simulation. The command to run is pretty
simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& 
output.log


We ran this command on a server with two quad-core X5420s and 16GB
of memory. I used only 4 cores, and I guess in theory each
core should be able to take up to 2GB.

In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more
than 718MB per core.

But I 

Re: [OMPI users] memory per core/process

2013-04-02 Thread Duke Nguyen

On 4/2/13 10:45 PM, Ralph Castain wrote:

Hmmm...tell you what. I'll add the ability for OMPI to set the limit to a 
user-specified level upon launch of each process. This will give you some 
protection and flexibility.


That would be excellent ;)



I forget, so please forgive the old man's fading memory - what version of OMPI 
are you using? I'll backport a patch for you.


It's openmpi-1.6.3-x86_64, if that helps...



On Apr 2, 2013, at 8:40 AM, Duke Nguyen  wrote:


On 3/30/13 8:46 PM, Patrick Bégou wrote:

Ok, so your problem is identified as a stack size problem. I went into these 
limitations using Intel fortran compilers on large data problems.

First, it seems you can increase your stack size as "ulimit -s unlimited" works 
(you didn't enforce the system hard limit). The best way  is to set this setting in your 
.bashrc file so it will works on every node.
But setting it to unlimited may not be really safe. IE, if you got in a badly 
coded recursive function calling itself without a stop condition you can 
request all the system memory and crash the node. So set a large but limited 
value, it's safer.


Now I feel the pain you mentioned :). With -s unlimited now some of our nodes 
are easily down (completely) and needed to be hard reset!!! (whereas we never 
had any node down like that before even with the killed or badly coded jobs).

Looking for a safer number of ulimit -s other than "unlimited" now... :(


I'm managing a cluster and I always set a maximum value to stack size. I also 
limit the memory available for each core for system stability. If a user 
request only one of the 12 cores of a node he can only access 1/12 of the node 
memory amount. If he needs more memory he has to request 2 cores, even if he 
uses a sequential code. This avoid crashing jobs of other users on the same 
node with memory requirements. But this is not configured on your node.

Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes needs large 
amount of stack when the problem size increase.
Check ulimit -a

First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change this?


2) did your node uses cpuset and memory limitation like fake numa to set the 
maximum amount of memory available for a job ?

Not really understand (also first time heard of fake numa), but I am pretty 
sure we do not have such things. The server I tried was a dedicated server with 
2 x5420 and 16GB physical memory.


Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten days of 
searching/working on the system, I surrender :(. We try to use mpirun to run 
abinit (abinit.org) which in turns will call an input file to run some 
simulation. The command to run is pretty simple

$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of memory. I 
called only 4 core, and I guess in theory each of the core should take up to 
2GB each.

In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes of memory.
  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more than 718MB 
each core.

But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos exited on 
signal 11 (Segmentation fault).

The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported mem per 
core is less than 512MB, then the job runs fine.

Please help,

Thanks,

D.




Re: [OMPI users] memory per core/process

2013-04-02 Thread Gus Correa

On 04/02/2013 11:40 AM, Duke Nguyen wrote:

On 3/30/13 8:46 PM, Patrick Bégou wrote:

Ok, so your problem is identified as a stack size problem. I went into
these limitations using Intel fortran compilers on large data problems.

First, it seems you can increase your stack size as "ulimit -s
unlimited" works (you didn't enforce the system hard limit). The best
way is to set this setting in your .bashrc file so it will works on
every node.
But setting it to unlimited may not be really safe. IE, if you got in
a badly coded recursive function calling itself without a stop
condition you can request all the system memory and crash the node. So
set a large but limited value, it's safer.



Now I feel the pain you mentioned :). With -s unlimited now some of our
nodes are easily down (completely) and needed to be hard reset!!!
(whereas we never had any node down like that before even with the
killed or badly coded jobs).

Looking for a safer number of ulimit -s other than "unlimited" now... :(



In my opinion this is a trade-off of who feels the pain.
It can be you (sys admin) feeling the pain of having
to power up offline nodes,
or it could be the user feeling the pain of having
her/his code killed by a segmentation fault due to the small amount
of memory available for the stack.
There is only so much that can be done to make everybody happy.
If you share the nodes among jobs, you could set the
stack size limit to
some part of the physical_memory divided by the number_of_cores,
saving some memory for the OS etc beforehand.
However, this can be a straitjacket for jobs that could run with
a bit more memory, and won't because of this limit.
If you do not share the nodes, then you could make stacksize
closer to physical memory.
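As a rough worked example for the 16GB, 8-core node discussed in this thread (purely illustrative numbers):

# (16 GB - 2 GB reserved for the OS) / 8 cores ~= 1.75 GB per core
# 1.75 GB in kbytes = 1.75 * 1024 * 1024 = 1835008
$ ulimit -s 1835008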

Anyway, this is less of an OpenMPI than of a
resource manager / queuing system conversation.

Best,
Gus Correa


I'm managing a cluster and I always set a maximum value to stack size.
I also limit the memory available for each core for system stability.
If a user request only one of the 12 cores of a node he can only
access 1/12 of the node memory amount. If he needs more memory he has
to request 2 cores, even if he uses a sequential code. This avoid
crashing jobs of other users on the same node with memory
requirements. But this is not configured on your node.

Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes
needs large amount of stack when the problem size increase.
Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I
change this?



2) did your node uses cpuset and memory limitation like fake numa to
set the maximum amount of memory available for a job ?


Not really understand (also first time heard of fake numa), but I am
pretty sure we do not have such things. The server I tried was a
dedicated server with 2 x5420 and 16GB physical memory.



Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten
days of searching/working on the system, I surrender :(. We try to
use mpirun to run abinit (abinit.org) which in turns will call an
input file to run some simulation. The command to run is pretty simple

$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB
of memory. I called only 4 core, and I guess in theory each of the
core should take up to 2GB each.

In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more
than 718MB each core.

But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos
exited on signal 11 (Segmentation fault).

The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported
mem per core is less than 512MB, then the job runs fine.

Please help,

Thanks,

D.



Re: [OMPI users] memory per core/process

2013-04-02 Thread Ralph Castain
Hmmm...tell you what. I'll add the ability for OMPI to set the limit to a 
user-specified level upon launch of each process. This will give you some 
protection and flexibility.

I forget, so please forgive the old man's fading memory - what version of OMPI 
are you using? I'll backport a patch for you.

On Apr 2, 2013, at 8:40 AM, Duke Nguyen  wrote:

> On 3/30/13 8:46 PM, Patrick Bégou wrote:
>> Ok, so your problem is identified as a stack size problem. I went into these 
>> limitations using Intel fortran compilers on large data problems.
>> 
>> First, it seems you can increase your stack size as "ulimit -s unlimited" 
>> works (you didn't enforce the system hard limit). The best way  is to set 
>> this setting in your .bashrc file so it will works on every node.
>> But setting it to unlimited may not be really safe. IE, if you got in a 
>> badly coded recursive function calling itself without a stop condition you 
>> can request all the system memory and crash the node. So set a large but 
>> limited value, it's safer.
>> 
> 
> Now I feel the pain you mentioned :). With -s unlimited now some of our nodes 
> are easily down (completely) and needed to be hard reset!!! (whereas we never 
> had any node down like that before even with the killed or badly coded jobs).
> 
> Looking for a safer number of ulimit -s other than "unlimited" now... :(
> 
>> I'm managing a cluster and I always set a maximum value to stack size. I 
>> also limit the memory available for each core for system stability. If a 
>> user request only one of the 12 cores of a node he can only access 1/12 of 
>> the node memory amount. If he needs more memory he has to request 2 cores, 
>> even if he uses a sequential code. This avoid crashing jobs of other users 
>> on the same node with memory requirements. But this is not configured on 
>> your node.
>> 
>> Duke Nguyen wrote:
>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
 I do not know about your code but:
 
 1) did you check stack limitations ? Typically intel fortran codes needs 
 large amount of stack when the problem size increase.
 Check ulimit -a
>>> 
>>> First time I heard of stack limitations. Anyway, ulimit -a gives
>>> 
>>> $ ulimit -a
>>> core file size  (blocks, -c) 0
>>> data seg size   (kbytes, -d) unlimited
>>> scheduling priority (-e) 0
>>> file size   (blocks, -f) unlimited
>>> pending signals (-i) 127368
>>> max locked memory   (kbytes, -l) unlimited
>>> max memory size (kbytes, -m) unlimited
>>> open files  (-n) 1024
>>> pipe size(512 bytes, -p) 8
>>> POSIX message queues (bytes, -q) 819200
>>> real-time priority  (-r) 0
>>> stack size  (kbytes, -s) 10240
>>> cpu time   (seconds, -t) unlimited
>>> max user processes  (-u) 1024
>>> virtual memory  (kbytes, -v) unlimited
>>> file locks  (-x) unlimited
>>> 
>>> So stack size is 10MB??? Does this one create problem? How do I change this?
>>> 
 
 2) did your node uses cpuset and memory limitation like fake numa to set 
 the maximum amount of memory available for a job ?
>>> 
>>> Not really understand (also first time heard of fake numa), but I am pretty 
>>> sure we do not have such things. The server I tried was a dedicated server 
>>> with 2 x5420 and 16GB physical memory.
>>> 
 
 Patrick
 
Duke Nguyen wrote:
> Hi folks,
> 
> I am sorry if this question had been asked before, but after ten days of 
> searching/working on the system, I surrender :(. We try to use mpirun to 
> run abinit (abinit.org) which in turns will call an input file to run 
> some simulation. The command to run is pretty simple
> 
> $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
> 
> We ran this command on a server with two quad core x5420 and 16GB of 
> memory. I called only 4 core, and I guess in theory each of the core 
> should take up to 2GB each.
> 
> In the output of the log, there is something about memory:
> 
> P This job should need less than 717.175 Mbytes of 
> memory.
>  Rough estimation (10% accuracy) of disk space for files :
>  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
> 
> So basically it reported that the above job should not take more than 
> 718MB each core.
> 
> But I still have the Segmentation Fault error:
> 
> mpirun noticed that process rank 0 with PID 16099 on node biobos exited 
> on signal 11 (Segmentation fault).
> 
> The system already has limits up to unlimited:
> 
> $ cat /etc/security/limits.conf | grep -v '#'
> * soft memlock unlimited
> * hard memlock unlimited
> 
> I also tried to run
> 
> $ ulimit -l unlimited
> 
> before the mpirun 

Re: [OMPI users] memory per core/process

2013-04-02 Thread Duke Nguyen

On 3/30/13 8:46 PM, Patrick Bégou wrote:
Ok, so your problem is identified as a stack size problem. I went into 
these limitations using Intel fortran compilers on large data problems.


First, it seems you can increase your stack size as "ulimit -s 
unlimited" works (you didn't enforce the system hard limit). The best 
way  is to set this setting in your .bashrc file so it will works on 
every node.
But setting it to unlimited may not be really safe. IE, if you got in 
a badly coded recursive function calling itself without a stop 
condition you can request all the system memory and crash the node. So 
set a large but limited value, it's safer.




Now I feel the pain you mentioned :). With -s unlimited now some of our 
nodes are easily down (completely) and needed to be hard reset!!! 
(whereas we never had any node down like that before even with the 
killed or badly coded jobs).


Looking for a safer number of ulimit -s other than "unlimited" now... :(

I'm managing a cluster and I always set a maximum value to stack size. 
I also limit the memory available for each core for system stability. 
If a user request only one of the 12 cores of a node he can only 
access 1/12 of the node memory amount. If he needs more memory he has 
to request 2 cores, even if he uses a sequential code. This avoid 
crashing jobs of other users on the same node with memory 
requirements. But this is not configured on your node.


Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes 
needs large amount of stack when the problem size increase.

Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I 
change this?




2) did your node uses cpuset and memory limitation like fake numa to 
set the maximum amount of memory available for a job ?


Not really understand (also first time heard of fake numa), but I am 
pretty sure we do not have such things. The server I tried was a 
dedicated server with 2 x5420 and 16GB physical memory.




Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten 
days of searching/working on the system, I surrender :(. We try to 
use mpirun to run abinit (abinit.org) which in turns will call an 
input file to run some simulation. The command to run is pretty simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB 
of memory. I called only 4 core, and I guess in theory each of the 
core should take up to 2GB each.


In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes 
of memory.

  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 
Mbytes.


So basically it reported that the above job should not take more 
than 718MB each core.


But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos 
exited on signal 11 (Segmentation fault).


The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported 
mem per core is less than 512MB, then the job runs fine.


Please help,

Thanks,

D.







Re: [OMPI users] memory per core/process

2013-04-02 Thread Duke Nguyen

On 4/2/13 6:50 PM, Reuti wrote:

Hi,

On 30.03.2013 at 14:46, Patrick Bégou wrote:


Ok, so your problem is identified as a stack size problem. I went into these 
limitations using Intel fortran compilers on large data problems.

First, it seems you can increase your stack size as "ulimit -s unlimited" works 
(you didn't enforce the system hard limit). The best way  is to set this setting in your 
.bashrc file so it will works on every node.
But setting it to unlimited may not be really safe. IE, if you got in a badly 
coded recursive function calling itself without a stop condition you can 
request all the system memory and crash the node. So set a large but limited 
value, it's safer.

I'm managing a cluster and I always set a maximum value to stack size. I also 
limit the memory available for each core for system stability. If a user 
request only one of the 12 cores of a node he can only access 1/12 of the node 
memory amount. If he needs more memory he has to request 2 cores, even if he 
uses a sequential code. This avoid crashing jobs of other users on the same 
node with memory requirements. But this is not configured on your node.

This is one way to implement memory limits as a policy - it's up to the user to 
request the correct number of cores then, even though he only wants to run a 
serial job. Personally I prefer that the user specifies the requested memory in 
such a case. It's then up to the queuing system to avoid scheduling additional 
jobs to a machine unless the remaining memory is sufficient for their 
execution.


We use Torque/Maui and I want to do something similar with it (still 
learning - those, together with OpenMPI, are new to me). Unfortunately 
posting to the Torque/Maui forums is somehow too difficult (my posts were 
moderated since I am a newcomer, but it seems nobody is managing those 
forums, so my posts never got through...). I wish they were 
as active as this forum...


D.



-- Reuti



>>> Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes needs large 
amount of stack when the problem size increase.
Check ulimit -a

First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change this?


2) did your node uses cpuset and memory limitation like fake numa to set the 
maximum amount of memory available for a job ?

Not really understand (also first time heard of fake numa), but I am pretty 
sure we do not have such things. The server I tried was a dedicated server with 
2 x5420 and 16GB physical memory.


Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten days of 
searching/working on the system, I surrender :(. We try to use mpirun to run 
abinit (abinit.org) which in turns will call an input file to run some 
simulation. The command to run is pretty simple

$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of memory. I 
called only 4 core, and I guess in theory each of the core should take up to 
2GB each.

In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes of memory.
  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more than 718MB 
each core.

But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos exited on 
signal 11 (Segmentation fault).

The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported mem per 
core is less than 512MB, then the job runs fine.

Please help,

Thanks,

D.



Re: [OMPI users] memory per core/process

2013-04-02 Thread Duke Nguyen

On 4/2/13 6:42 PM, Reuti wrote:

/usr/local/bin/mpirun -npernode 1 -tag-output  sh -c "ulimit -a"

You are right :)

$ /usr/local/bin/mpirun -npernode 1 -tag-output  sh -c "ulimit -a"
[1,0]:core file size  (blocks, -c) 0
[1,0]:data seg size   (kbytes, -d) unlimited
[1,0]:scheduling priority (-e) 0
[1,0]:file size   (blocks, -f) unlimited
[1,0]:pending signals (-i) 8271027
[1,0]:max locked memory   (kbytes, -l) unlimited
[1,0]:max memory size (kbytes, -m) unlimited
[1,0]:open files  (-n) 32768
[1,0]:pipe size(512 bytes, -p) 8
[1,0]:POSIX message queues (bytes, -q) 819200
[1,0]:real-time priority  (-r) 0
[1,0]:stack size  (kbytes, -s) unlimited
[1,0]:cpu time   (seconds, -t) unlimited
[1,0]:max user processes  (-u) 8192
[1,0]:virtual memory  (kbytes, -v) unlimited
[1,0]:file locks  (-x) unlimited
[1,1]:core file size  (blocks, -c) 0
[1,1]:data seg size   (kbytes, -d) unlimited
[1,1]:scheduling priority (-e) 0
[1,1]:file size   (blocks, -f) unlimited
[1,1]:pending signals (-i) 8271027
[1,1]:max locked memory   (kbytes, -l) unlimited
[1,1]:max memory size (kbytes, -m) unlimited
[1,1]:open files  (-n) 32768
[1,1]:pipe size(512 bytes, -p) 8
[1,1]:POSIX message queues (bytes, -q) 819200
[1,1]:real-time priority  (-r) 0
[1,1]:stack size  (kbytes, -s) unlimited
[1,1]:cpu time   (seconds, -t) unlimited
[1,1]:max user processes  (-u) 8192
[1,1]:virtual memory  (kbytes, -v) unlimited
[1,1]:file locks  (-x) unlimited
[1,2]:core file size  (blocks, -c) 0
[1,2]:data seg size   (kbytes, -d) unlimited
[1,2]:scheduling priority (-e) 0
[1,2]:file size   (blocks, -f) unlimited
[1,2]:pending signals (-i) 8271027
[1,2]:max locked memory   (kbytes, -l) unlimited
[1,2]:max memory size (kbytes, -m) unlimited
[1,2]:open files  (-n) 32768
[1,2]:pipe size(512 bytes, -p) 8
[1,2]:POSIX message queues (bytes, -q) 819200
[1,2]:real-time priority  (-r) 0
[1,2]:stack size  (kbytes, -s) unlimited
[1,2]:cpu time   (seconds, -t) unlimited
[1,2]:max user processes  (-u) 8192
[1,2]:virtual memory  (kbytes, -v) unlimited
[1,2]:file locks  (-x) unlimited
[1,3]:core file size  (blocks, -c) 0
[1,3]:data seg size   (kbytes, -d) unlimited
[1,3]:scheduling priority (-e) 0
[1,3]:file size   (blocks, -f) unlimited
[1,3]:pending signals (-i) 8271027
[1,3]:max locked memory   (kbytes, -l) unlimited
[1,3]:max memory size (kbytes, -m) unlimited
[1,3]:open files  (-n) 32768
[1,3]:pipe size(512 bytes, -p) 8
[1,3]:POSIX message queues (bytes, -q) 819200
[1,3]:real-time priority  (-r) 0
[1,3]:stack size  (kbytes, -s) unlimited
[1,3]:cpu time   (seconds, -t) unlimited
[1,3]:max user processes  (-u) 8192
[1,3]:virtual memory  (kbytes, -v) unlimited
[1,3]:file locks  (-x) unlimited


Re: [OMPI users] memory per core/process

2013-04-02 Thread Reuti
Hi,

On 30.03.2013 at 15:35, Gustavo Correa wrote:

> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:
> 
>> On 3/30/13 8:20 PM, Reuti wrote:
>>> On 30.03.2013 at 13:26, Tim Prince wrote:
>>> 
 On 03/30/2013 06:36 AM, Duke Nguyen wrote:
> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>> I do not know about your code but:
>>> 
>>> 1) did you check stack limitations ? Typically intel fortran codes 
>>> needs large amount of stack when the problem size increase.
>>> Check ulimit -a
>> First time I heard of stack limitations. Anyway, ulimit -a gives
>> 
>> $ ulimit -a
>> core file size  (blocks, -c) 0
>> data seg size   (kbytes, -d) unlimited
>> scheduling priority (-e) 0
>> file size   (blocks, -f) unlimited
>> pending signals (-i) 127368
>> max locked memory   (kbytes, -l) unlimited
>> max memory size (kbytes, -m) unlimited
>> open files  (-n) 1024
>> pipe size(512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> real-time priority  (-r) 0
>> stack size  (kbytes, -s) 10240
>> cpu time   (seconds, -t) unlimited
>> max user processes  (-u) 1024
>> virtual memory  (kbytes, -v) unlimited
>> file locks  (-x) unlimited
>> 
>> So stack size is 10MB??? Does this one create problem? How do I change 
>> this?
> I did $ ulimit -s unlimited to have stack size to be unlimited, and the 
> job ran fine!!! So it looks like stack limit is the problem. Questions 
> are:
> 
> * how do I set this automatically (and permanently)?
> * should I set all other ulimits to be unlimited?
> 
 In our environment, the only solution we found is to have mpirun run a 
 script on each node which sets ulimit (as well as environment variables 
 which are more convenient to set there than in the mpirun), before 
 starting the executable.  We had expert recommendations against this but 
 no other working solution.  It seems unlikely that you would want to 
 remove any limits which work at default.
 Stack size unlimited in reality is not unlimited; it may be limited by a 
 system limit or implementation.  As we run up to 120 threads per rank and 
 many applications have threadprivate data regions, ability to run without 
 considering stack limit is the exception rather than the rule.
>>> Even if I would be the only user on a cluster of machines, I would define 
>>> this in any queuingsystem to set the limits for the job.
>> 
>> Sorry if I dont get this correctly, but do you mean I should set this using 
>> Torque/Maui (our queuing manager) instead of the system itself 
>> (/etc/security/limits.conf and /etc/profile.d/)?

Yes, or per queue/job.


> Hi Duke
> 
> We do both.
> Set memlock and stacksize to unlimited, and increase the maximum number of
> open files  in the pbs_mom script in /etc/init.d, and do the same in 
> /etc/security/limits.conf.
> This may be an overzealous "belt and suspenders" policy, but it works.
> As everybody else said, a small stacksize is a common cause of segmentation 
> fault in
> large codes.
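A minimal /etc/security/limits.conf sketch along the lines Gus describes (illustrative values only):

*   soft   memlock   unlimited
*   hard   memlock   unlimited
*   soft   stack     unlimited
*   hard   stack     unlimited
*   soft   nofile    32768
*   hard   nofile    32768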

This way it would be fixed for the overall cluster and not per job - or? I have 
seen situations where, with limited virtual memory for a job, the stack size had 
to be set to a low value in the range of only a few tens of megabytes.

Whether such a request is possible depends on the queuing system though. In 
GridEngine it's possible; I'm not sure about Torque/PBS.
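For example, in GridEngine such a per-job request could look like this (illustrative values; h_stack and h_vmem are the standard SGE per-job limit resources):

$ qsub -l h_stack=512M,h_vmem=2G job.sh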

-- Reuti


> Basically all codes that we run here have this problem, with too many
> automatic arrays, structures, etc in functions and subroutines. 
> But also a small memlock is trouble for OFED/InfiniBand, and the small 
> (default) 
> max number of open file handles may hit the limit easily if many programs 
> (or poorly written  programs) are running in the same node.
> The default Linux distribution limits don't seem to be tailored for HPC, I 
> guess.
> 
> I hope this helps,
> Gus Correa 
> 
> 




Re: [OMPI users] memory per core/process

2013-04-02 Thread Reuti
Hi,

On 30.03.2013 at 14:46, Patrick Bégou wrote:

> Ok, so your problem is identified as a stack size problem. I went into these 
> limitations using Intel fortran compilers on large data problems.
> 
> First, it seems you can increase your stack size as "ulimit -s unlimited" 
> works (you didn't enforce the system hard limit). The best way  is to set 
> this setting in your .bashrc file so it will works on every node.
> But setting it to unlimited may not be really safe. IE, if you got in a badly 
> coded recursive function calling itself without a stop condition you can 
> request all the system memory and crash the node. So set a large but limited 
> value, it's safer.
> 
> I'm managing a cluster and I always set a maximum value to stack size. I also 
> limit the memory available for each core for system stability. If a user 
> request only one of the 12 cores of a node he can only access 1/12 of the 
> node memory amount. If he needs more memory he has to request 2 cores, even 
> if he uses a sequential code. This avoid crashing jobs of other users on the 
> same node with memory requirements. But this is not configured on your node.

This is one way to implement memory limits as a policy - it's up to the user to 
request the correct number of cores then, even though he only wants to run a 
serial job. Personally I prefer that the user specifies the requested memory in 
such a case. It's then up to the queuing system to avoid scheduling additional 
jobs to a machine unless the remaining memory is sufficient for their 
execution.
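In Torque, a per-process memory request of that kind might look like the following (illustrative; pmem is Torque's per-process memory resource):

$ qsub -l nodes=1:ppn=1,pmem=2gb job.sh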

-- Reuti


> Duke Nguyen wrote:
>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>> I do not know about your code but:
>>> 
>>> 1) did you check stack limitations ? Typically intel fortran codes needs 
>>> large amount of stack when the problem size increase.
>>> Check ulimit -a
>> 
>> First time I heard of stack limitations. Anyway, ulimit -a gives
>> 
>> $ ulimit -a
>> core file size  (blocks, -c) 0
>> data seg size   (kbytes, -d) unlimited
>> scheduling priority (-e) 0
>> file size   (blocks, -f) unlimited
>> pending signals (-i) 127368
>> max locked memory   (kbytes, -l) unlimited
>> max memory size (kbytes, -m) unlimited
>> open files  (-n) 1024
>> pipe size(512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> real-time priority  (-r) 0
>> stack size  (kbytes, -s) 10240
>> cpu time   (seconds, -t) unlimited
>> max user processes  (-u) 1024
>> virtual memory  (kbytes, -v) unlimited
>> file locks  (-x) unlimited
>> 
>> So stack size is 10MB??? Does this one create problem? How do I change this?
>> 
>>> 
>>> 2) did your node uses cpuset and memory limitation like fake numa to set 
>>> the maximum amount of memory available for a job ?
>> 
>> Not really understand (also first time heard of fake numa), but I am pretty 
>> sure we do not have such things. The server I tried was a dedicated server 
>> with 2 x5420 and 16GB physical memory.
>> 
>>> 
>>> Patrick
>>> 
>>> Duke Nguyen wrote:
 Hi folks,
 
 I am sorry if this question had been asked before, but after ten days of 
 searching/working on the system, I surrender :(. We try to use mpirun to 
 run abinit (abinit.org) which in turns will call an input file to run some 
 simulation. The command to run is pretty simple
 
 $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
 
 We ran this command on a server with two quad core x5420 and 16GB of 
 memory. I called only 4 core, and I guess in theory each of the core 
 should take up to 2GB each.
 
 In the output of the log, there is something about memory:
 
 P This job should need less than 717.175 Mbytes of 
 memory.
  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
 
 So basically it reported that the above job should not take more than 
 718MB each core.
 
 But I still have the Segmentation Fault error:
 
 mpirun noticed that process rank 0 with PID 16099 on node biobos exited on 
 signal 11 (Segmentation fault).
 
 The system already has limits up to unlimited:
 
 $ cat /etc/security/limits.conf | grep -v '#'
 * soft memlock unlimited
 * hard memlock unlimited
 
 I also tried to run
 
 $ ulimit -l unlimited
 
 before the mpirun command above, but it did not help at all.
 
 If we adjust the parameters of the input.files to give the reported mem 
 per core is less than 512MB, then the job runs fine.
 
 Please help,
 
 Thanks,
 
 D.
 
 

Re: [OMPI users] memory per core/process

2013-04-02 Thread Reuti
Hi,

On 02.04.2013 at 13:22, Duke Nguyen wrote:

> On 4/1/13 9:20 PM, Ralph Castain wrote:
>> It's probably the same problem - try running 'mpirun -npernode 1 -tag-output 
>> ulimit -a"  on the remote nodes and see what it says. I suspect you'll find 
>> that they aren't correct.
> 
> Somehow I could not run your advised CMD:
> 
> $ qsub -l nodes=4:ppn=8 -I
> qsub: waiting for job 481.biobos to start
> qsub: job 481.biobos ready
> 
> $ /usr/local/bin/mpirun -npernode 1 -tag-output ulimit -a
> --
> mpirun was unable to launch the specified application as it could not find an 
> executable:

`ulimit` is a shell builtin:

$ type ulimit
ulimit is a shell builtin

It should work with:

$ /usr/local/bin/mpirun -npernode 1 -tag-output  sh -c "ulimit -a"
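A quicker per-node check of just the stack limit works the same way, e.g.:

$ /usr/local/bin/mpirun -npernode 1 -tag-output  sh -c "ulimit -s"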

-- Reuti


> Executable: ulimit
> Node: node0108.biobos
> 
> while attempting to start process rank 0.
> --
> 4 total processes failed to start
> 
> But anyway, I figured out the reason. Yes, it is the cluster nodes that did 
> not update ulimit settings (our system is a diskless node with warewulf so 
> basically we have to update the vnfs and reboot all nodes before the nodes 
> can run with new settings).
> 
> Thanks for all the helps :)
> 
> D.
> 
>> 
>> BTW: the "-tag-output'" option marks each line of output with the rank of 
>> the process. Since all the outputs will be interleaved, this will help you 
>> identify what came from each node.
>> 
>> 
>> On Mar 31, 2013, at 11:30 PM, Duke Nguyen  wrote:
>> 
>>> On 3/31/13 12:20 AM, Duke Nguyen wrote:
 I should really have asked earlier. Thanks for all the helps.
>>> I think I was excited too soon :). Increasing stacksize does help if I run 
>>> a job in a dedicated server. Today I tried to modify the cluster 
>>> (/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a 
>>> different job with 4 nodes/8 core each (nodes=4:ppn=8), but I still get the 
>>> mpirun error. My ulimit now reads:
>>> 
>>> $ ulimit -a
>>> core file size  (blocks, -c) 0
>>> data seg size   (kbytes, -d) unlimited
>>> scheduling priority (-e) 0
>>> file size   (blocks, -f) unlimited
>>> pending signals (-i) 8271027
>>> max locked memory   (kbytes, -l) unlimited
>>> max memory size (kbytes, -m) unlimited
>>> open files  (-n) 32768
>>> pipe size(512 bytes, -p) 8
>>> POSIX message queues (bytes, -q) 819200
>>> real-time priority  (-r) 0
>>> stack size  (kbytes, -s) unlimited
>>> cpu time   (seconds, -t) unlimited
>>> max user processes  (-u) 8192
>>> virtual memory  (kbytes, -v) unlimited
>>> file locks  (-x) unlimited
>>> 
>>> Any other advice???
>>> 
 On 3/30/13 10:28 PM, Ralph Castain wrote:
> FWIW: there is an MCA param that helps with such problems:
> 
> opal_set_max_sys_limits
>  "Set to non-zero to automatically set any system-imposed 
> limits to the maximum allowed",
> 
> At the moment, it only sets the limits on number of files open, and max 
> size of a file we can create. Easy enough to add the stack size, though 
> as someone pointed out, it has some negatives as well.
> 
> 
> On Mar 30, 2013, at 7:35 AM, Gustavo Correa  
> wrote:
> 
>> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:
>> 
>>> On 3/30/13 8:20 PM, Reuti wrote:
On 30.03.2013 at 13:26, Tim Prince wrote:
 
> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
 I do not know about your code but:
 
 1) did you check stack limitations ? Typically intel fortran codes 
 needs large amount of stack when the problem size increase.
 Check ulimit -a
>>> First time I heard of stack limitations. Anyway, ulimit -a gives
>>> 
>>> $ ulimit -a
>>> core file size  (blocks, -c) 0
>>> data seg size   (kbytes, -d) unlimited
>>> scheduling priority (-e) 0
>>> file size   (blocks, -f) unlimited
>>> pending signals (-i) 127368
>>> max locked memory   (kbytes, -l) unlimited
>>> max memory size (kbytes, -m) unlimited
>>> open files  (-n) 1024
>>> pipe size(512 bytes, -p) 8
>>> POSIX message queues (bytes, -q) 819200
>>> real-time priority  (-r) 0
>>> stack size  (kbytes, -s) 10240
>>> cpu time   (seconds, -t) unlimited
>>> max user processes  (-u) 1024

Re: [OMPI users] memory per core/process

2013-04-02 Thread Duke Nguyen

On 4/1/13 9:20 PM, Ralph Castain wrote:

It's probably the same problem - try running 'mpirun -npernode 1 -tag-output ulimit 
-a"  on the remote nodes and see what it says. I suspect you'll find that they 
aren't correct.


Somehow I could not run your advised CMD:

$ qsub -l nodes=4:ppn=8 -I
qsub: waiting for job 481.biobos to start
qsub: job 481.biobos ready

$ /usr/local/bin/mpirun -npernode 1 -tag-output ulimit -a
--
mpirun was unable to launch the specified application as it could not 
find an executable:


Executable: ulimit
Node: node0108.biobos

while attempting to start process rank 0.
--
4 total processes failed to start

But anyway, I figured out the reason. Yes, it is the cluster nodes that 
did not update the ulimit settings (our system uses diskless nodes with 
Warewulf, so basically we have to update the VNFS and reboot all nodes 
before they can run with the new settings).


Thanks for all the helps :)

D.



BTW: the "-tag-output'" option marks each line of output with the rank of the 
process. Since all the outputs will be interleaved, this will help you identify what came 
from each node.


On Mar 31, 2013, at 11:30 PM, Duke Nguyen  wrote:


On 3/31/13 12:20 AM, Duke Nguyen wrote:

I should really have asked earlier. Thanks for all the helps.

I think I was excited too soon :). Increasing stacksize does help if I run a 
job in a dedicated server. Today I tried to modify the cluster 
(/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a different 
job with 4 nodes/8 core each (nodes=4:ppn=8), but I still get the mpirun error. 
My ulimit now reads:

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 8271027
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 32768
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) unlimited
cpu time   (seconds, -t) unlimited
max user processes  (-u) 8192
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Any other advice???


On 3/30/13 10:28 PM, Ralph Castain wrote:

FWIW: there is an MCA param that helps with such problems:

 opal_set_max_sys_limits
  "Set to non-zero to automatically set any system-imposed limits to 
the maximum allowed",

At the moment, it only sets the limits on number of files open, and max size of 
a file we can create. Easy enough to add the stack size, though as someone 
pointed out, it has some negatives as well.


On Mar 30, 2013, at 7:35 AM, Gustavo Correa  wrote:


On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:


On 3/30/13 8:20 PM, Reuti wrote:

On 30.03.2013 at 13:26, Tim Prince wrote:


On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes needs large 
amount of stack when the problem size increase.
Check ulimit -a

First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change this?

I did $ ulimit -s unlimited to have stack size to be unlimited, and the job ran 
fine!!! So it looks like stack limit is the problem. Questions are:

* how do I set this automatically (and permanently)?
* should I set all other ulimits to be unlimited?


In our environment, the only solution we found is to have mpirun run a script 
on each node which sets ulimit (as well as environment variables which are more 
convenient to set there than in the mpirun), before starting the executable.  
We had expert recommendations against this but no other working solution.  It 
seems unlikely that you would want to remove any limits which work at default.
Stack size 

Re: [OMPI users] memory per core/process

2013-04-01 Thread Duke Nguyen

On 3/31/13 12:20 AM, Duke Nguyen wrote:

I should really have asked earlier. Thanks for all the helps.


I think I was excited too soon :). Increasing stacksize does help if I 
run a job in a dedicated server. Today I tried to modify the cluster 
(/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a 
different job with 4 nodes/8 core each (nodes=4:ppn=8), but I still get 
the mpirun error. My ulimit now reads:


$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 8271027
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 32768
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) unlimited
cpu time   (seconds, -t) unlimited
max user processes  (-u) 8192
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Any other advice???



On 3/30/13 10:28 PM, Ralph Castain wrote:

FWIW: there is an MCA param that helps with such problems:

 opal_set_max_sys_limits
  "Set to non-zero to automatically set any 
system-imposed limits to the maximum allowed",


At the moment, it only sets the limits on number of files open, and 
max size of a file we can create. Easy enough to add the stack size, 
though as someone pointed out, it has some negatives as well.



On Mar 30, 2013, at 7:35 AM, Gustavo Correa  
wrote:



On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:


On 3/30/13 8:20 PM, Reuti wrote:

On 30.03.2013 at 13:26, Tim Prince wrote:


On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran 
codes needs large amount of stack when the problem size increase.

Check ulimit -a

First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I 
change this?
I did $ ulimit -s unlimited to have stack size to be unlimited, 
and the job ran fine!!! So it looks like stack limit is the 
problem. Questions are:


* how do I set this automatically (and permanently)?
* should I set all other ulimits to be unlimited?

In our environment, the only solution we found is to have mpirun 
run a script on each node which sets ulimit (as well as 
environment variables which are more convenient to set there than 
in the mpirun), before starting the executable.  We had expert 
recommendations against this but no other working solution.  It 
seems unlikely that you would want to remove any limits which 
work at default.
Stack size unlimited in reality is not unlimited; it may be 
limited by a system limit or implementation.  As we run up to 120 
threads per rank and many applications have threadprivate data 
regions, ability to run without considering stack limit is the 
exception rather than the rule.
Even if I would be the only user on a cluster of machines, I would 
define this in any queuing system to set the limits for the job.
Sorry if I don't get this correctly, but do you mean I should set 
this using Torque/Maui (our queuing manager) instead of the system 
itself (/etc/security/limits.conf and /etc/profile.d/)?

Hi Duke

We do both.
Set memlock and stacksize to unlimited, and increase the maximum 
number of
open files  in the pbs_mom script in /etc/init.d, and do the same in 
/etc/security/limits.conf.

This may be an overzealous "belt and suspenders" policy, but it works.
As everybody else said, a small stacksize is a common cause of segmentation 
fault in large codes.
Basically all codes that we run here have this problem, with too many
automatic arrays, structures, etc in functions and subroutines.
But also a small memlock is trouble for OFED/InfiniBand, and the small (default)
max number of open file handles may hit the limit easily if many programs
(or poorly written programs) are running in the same node.
The default Linux distribution limits don't seem to be tailored for HPC, I guess.
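
A rough sketch of the "both places" setup described here; the values and the exact 
placement inside the init script are illustrative assumptions, since Torque 
packagings differ:

# /etc/security/limits.conf  (covers interactive/SSH logins via PAM)
*   soft   memlock   unlimited
*   hard   memlock   unlimited
*   soft   stack     unlimited
*   hard   stack     unlimited
*   soft   nofile    32768
*   hard   nofile    32768

# /etc/init.d/pbs_mom  (so batch jobs inherit the same limits; add near the
# top of the start section and restart pbs_mom afterwards)
ulimit -l unlimited
ulimit -s unlimited
ulimit -n 32768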

Re: [OMPI users] memory per core/process

2013-03-30 Thread Duke Nguyen

I should really have asked earlier. Thanks for all the helps.

D.

On 3/30/13 10:28 PM, Ralph Castain wrote:

FWIW: there is an MCA param that helps with such problems:

 opal_set_max_sys_limits
  "Set to non-zero to automatically set any system-imposed limits to 
the maximum allowed",

At the moment, it only sets the limits on number of files open, and max size of 
a file we can create. Easy enough to add the stack size, though as someone 
pointed out, it has some negatives as well.


On Mar 30, 2013, at 7:35 AM, Gustavo Correa  wrote:


On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:


On 3/30/13 8:20 PM, Reuti wrote:

Am 30.03.2013 um 13:26 schrieb Tim Prince:


On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes needs large 
amount of stack when the problem size increase.
Check ulimit -a

First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change this?

I did $ ulimit -s unlimited to have stack size to be unlimited, and the job ran 
fine!!! So it looks like stack limit is the problem. Questions are:

* how do I set this automatically (and permanently)?
* should I set all other ulimits to be unlimited?


In our environment, the only solution we found is to have mpirun run a script 
on each node which sets ulimit (as well as environment variables which are more 
convenient to set there than in the mpirun), before starting the executable.  
We had expert recommendations against this but no other working solution.  It 
seems unlikely that you would want to remove any limits which work at default.
Stack size unlimited in reality is not unlimited; it may be limited by a system 
limit or implementation.  As we run up to 120 threads per rank and many 
applications have threadprivate data regions, ability to run without 
considering stack limit is the exception rather than the rule.

Even if I would be the only user on a cluster of machines, I would define this 
in any queuing system to set the limits for the job.

Sorry if I don't get this correctly, but do you mean I should set this using 
Torque/Maui (our queuing manager) instead of the system itself 
(/etc/security/limits.conf and /etc/profile.d/)?

Hi Duke

We do both.
Set memlock and stacksize to unlimited, and increase the maximum number of
open files  in the pbs_mom script in /etc/init.d, and do the same in 
/etc/security/limits.conf.
This may be an overzealous "belt and suspenders" policy, but it works.
As everybody else said, a small stacksize is a common cause of segmentation 
fault in large codes.
Basically all codes that we run here have this problem, with too many
automatic arrays, structures, etc in functions and subroutines.
But also a small memlock is trouble for OFED/InfiniBand, and the small (default)
max number of open file handles may hit the limit easily if many programs
(or poorly written  programs) are running in the same node.
The default Linux distribution limits don't seem to be tailored for HPC, I 
guess.

I hope this helps,
Gus Correa


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] memory per core/process

2013-03-30 Thread Ralph Castain
FWIW: there is an MCA param that helps with such problems:

opal_set_max_sys_limits
 "Set to non-zero to automatically set any system-imposed 
limits to the maximum allowed",

At the moment, it only sets the limits on number of files open, and max size of 
a file we can create. Easy enough to add the stack size, though as someone 
pointed out, it has some negatives as well.


On Mar 30, 2013, at 7:35 AM, Gustavo Correa  wrote:

> 
> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:
> 
>> On 3/30/13 8:20 PM, Reuti wrote:
>>> Am 30.03.2013 um 13:26 schrieb Tim Prince:
>>> 
 On 03/30/2013 06:36 AM, Duke Nguyen wrote:
> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>> I do not know about your code but:
>>> 
>>> 1) did you check stack limitations ? Typically intel fortran codes 
>>> needs large amount of stack when the problem size increase.
>>> Check ulimit -a
>> First time I heard of stack limitations. Anyway, ulimit -a gives
>> 
>> $ ulimit -a
>> core file size  (blocks, -c) 0
>> data seg size   (kbytes, -d) unlimited
>> scheduling priority (-e) 0
>> file size   (blocks, -f) unlimited
>> pending signals (-i) 127368
>> max locked memory   (kbytes, -l) unlimited
>> max memory size (kbytes, -m) unlimited
>> open files  (-n) 1024
>> pipe size(512 bytes, -p) 8
>> POSIX message queues (bytes, -q) 819200
>> real-time priority  (-r) 0
>> stack size  (kbytes, -s) 10240
>> cpu time   (seconds, -t) unlimited
>> max user processes  (-u) 1024
>> virtual memory  (kbytes, -v) unlimited
>> file locks  (-x) unlimited
>> 
>> So stack size is 10MB??? Does this one create problem? How do I change 
>> this?
> I did $ ulimit -s unlimited to have stack size to be unlimited, and the 
> job ran fine!!! So it looks like stack limit is the problem. Questions 
> are:
> 
> * how do I set this automatically (and permanently)?
> * should I set all other ulimits to be unlimited?
> 
 In our environment, the only solution we found is to have mpirun run a 
 script on each node which sets ulimit (as well as environment variables 
 which are more convenient to set there than in the mpirun), before 
 starting the executable.  We had expert recommendations against this but 
 no other working solution.  It seems unlikely that you would want to 
 remove any limits which work at default.
 Stack size unlimited in reality is not unlimited; it may be limited by a 
 system limit or implementation.  As we run up to 120 threads per rank and 
 many applications have threadprivate data regions, ability to run without 
 considering stack limit is the exception rather than the rule.
>>> Even if I would be the only user on a cluster of machines, I would define 
>>> this in any queuing system to set the limits for the job.
>> 
>> Sorry if I don't get this correctly, but do you mean I should set this using 
>> Torque/Maui (our queuing manager) instead of the system itself 
>> (/etc/security/limits.conf and /etc/profile.d/)?
> 
> Hi Duke
> 
> We do both.
> Set memlock and stacksize to unlimited, and increase the maximum number of
> open files  in the pbs_mom script in /etc/init.d, and do the same in 
> /etc/security/limits.conf.
> This may be an overzealous "belt and suspenders" policy, but it works.
> As everybody else said, a small stacksize is a common cause of segmentation 
> fault in large codes.
> Basically all codes that we run here have this problem, with too many
> automatic arrays, structures, etc in functions and subroutines. 
> But also a small memlock is trouble for OFED/InfiniBand, and the small 
> (default) 
> max number of open file handles may hit the limit easily if many programs 
> (or poorly written  programs) are running in the same node.
> The default Linux distribution limits don't seem to be tailored for HPC, I 
> guess.
> 
> I hope this helps,
> Gus Correa 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] memory per core/process

2013-03-30 Thread Gustavo Correa

On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:

> On 3/30/13 8:20 PM, Reuti wrote:
>> Am 30.03.2013 um 13:26 schrieb Tim Prince:
>> 
>>> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
 On 3/30/13 5:22 PM, Duke Nguyen wrote:
> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>> I do not know about your code but:
>> 
>> 1) did you check stack limitations ? Typically intel fortran codes needs 
>> large amount of stack when the problem size increase.
>> Check ulimit -a
> First time I heard of stack limitations. Anyway, ulimit -a gives
> 
> $ ulimit -a
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 127368
> max locked memory   (kbytes, -l) unlimited
> max memory size (kbytes, -m) unlimited
> open files  (-n) 1024
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 10240
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 1024
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
> 
> So stack size is 10MB??? Does this one create problem? How do I change 
> this?
 I did $ ulimit -s unlimited to have stack size to be unlimited, and the 
 job ran fine!!! So it looks like stack limit is the problem. Questions are:
 
 * how do I set this automatically (and permanently)?
 * should I set all other ulimits to be unlimited?
 
>>> In our environment, the only solution we found is to have mpirun run a 
>>> script on each node which sets ulimit (as well as environment variables 
>>> which are more convenient to set there than in the mpirun), before starting 
>>> the executable.  We had expert recommendations against this but no other 
>>> working solution.  It seems unlikely that you would want to remove any 
>>> limits which work at default.
>>> Stack size unlimited in reality is not unlimited; it may be limited by a 
>>> system limit or implementation.  As we run up to 120 threads per rank and 
>>> many applications have threadprivate data regions, ability to run without 
>>> considering stack limit is the exception rather than the rule.
>> Even if I would be the only user on a cluster of machines, I would define 
>> this in any queuing system to set the limits for the job.
> 
> Sorry if I don't get this correctly, but do you mean I should set this using 
> Torque/Maui (our queuing manager) instead of the system itself 
> (/etc/security/limits.conf and /etc/profile.d/)?

Hi Duke

We do both.
Set memlock and stacksize to unlimited, and increase the maximum number of
open files  in the pbs_mom script in /etc/init.d, and do the same in 
/etc/security/limits.conf.
This may be an overzealous "belt and suspenders" policy, but it works.
As everybody else said, a small stacksize is a common cause of segmentation 
fault in large codes.
Basically all codes that we run here have this problem, with too many
automatic arrays, structures, etc in functions and subroutines. 
But also a small memlock is trouble for OFED/InfiniBand, and the small (default) 
max number of open file handles may hit the limit easily if many programs 
(or poorly written  programs) are running in the same node.
The default Linux distribution limits don't seem to be tailored for HPC, I 
guess.

I hope this helps,
Gus Correa 




Re: [OMPI users] memory per core/process

2013-03-30 Thread Duke Nguyen

On 3/30/13 8:20 PM, Reuti wrote:

Am 30.03.2013 um 13:26 schrieb Tim Prince:


On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes needs large 
amount of stack when the problem size increase.
Check ulimit -a

First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change this?

I did $ ulimit -s unlimited to have stack size to be unlimited, and the job ran 
fine!!! So it looks like stack limit is the problem. Questions are:

* how do I set this automatically (and permanently)?
* should I set all other ulimits to be unlimited?


In our environment, the only solution we found is to have mpirun run a script 
on each node which sets ulimit (as well as environment variables which are more 
convenient to set there than in the mpirun), before starting the executable.  
We had expert recommendations against this but no other working solution.  It 
seems unlikely that you would want to remove any limits which work at default.
Stack size unlimited in reality is not unlimited; it may be limited by a system 
limit or implementation.  As we run up to 120 threads per rank and many 
applications have threadprivate data regions, ability to run without 
considering stack limit is the exception rather than the rule.

Even if I would be the only user on a cluster of machines, I would define this 
in any queuing system to set the limits for the job.


Sorry if I don't get this correctly, but do you mean I should set this 
using Torque/Maui (our queuing manager) instead of the system itself 
(/etc/security/limits.conf and /etc/profile.d/)?




Re: [OMPI users] memory per core/process

2013-03-30 Thread Patrick Bégou
Ok, so your problem is identified as a stack size problem. I went into 
these limitations using Intel fortran compilers on large data problems.


First, it seems you can increase your stack size as "ulimit -s 
unlimited" works (you didn't enforce the system hard limit). The best 
way is to set this in your .bashrc file so it will work on 
every node.
But setting it to unlimited may not be really safe. IE, if you got in a 
badly coded recursive function calling itself without a stop condition 
you can request all the system memory and crash the node. So set a large 
but limited value, it's safer.
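
A minimal illustration of that advice; the 64 MB figure below is only an example 
value, not a recommendation from this thread:

# ~/.bashrc on every node
ulimit -s 65536    # stack size in kB: large, but not unlimited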


I'm managing a cluster and I always set a maximum value to stack size. I 
also limit the memory available for each core for system stability. If a 
user requests only one of the 12 cores of a node, he can only access 1/12 
of the node memory amount. If he needs more memory he has to request 2 
cores, even if he uses a sequential code. This avoids crashing jobs of 
other users on the same node with memory requirements. But this is not 
configured on your node.
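
From the user's side, this kind of policy might look like the following under 
Torque/Maui; the resource names and values are illustrative assumptions:

# ask for 2 cores on one node to get roughly 2 GB per process,
# even if the code itself is sequential
$ qsub -l nodes=1:ppn=2 -l pmem=2gb job.sh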


Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes 
needs large amount of stack when the problem size increase.

Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change 
this?




2) did your node uses cpuset and memory limitation like fake numa to 
set the maximum amount of memory available for a job ?


Not really understand (also first time heard of fake numa), but I am 
pretty sure we do not have such things. The server I tried was a 
dedicated server with 2 x5420 and 16GB physical memory.




Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten 
days of searching/working on the system, I surrender :(. We try to 
use mpirun to run abinit (abinit.org) which in turns will call an 
input file to run some simulation. The command to run is pretty simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of 
memory. I called only 4 core, and I guess in theory each of the core 
should take up to 2GB each.


In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes 
of memory.

  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 
Mbytes.


So basically it reported that the above job should not take more 
than 718MB each core.


But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos 
exited on signal 11 (Segmentation fault).


The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported 
mem per core is less than 512MB, then the job runs fine.


Please help,

Thanks,

D.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] memory per core/process

2013-03-30 Thread Reuti
Am 30.03.2013 um 13:26 schrieb Tim Prince:

> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
 I do not know about your code but:
 
 1) did you check stack limitations ? Typically intel fortran codes needs 
 large amount of stack when the problem size increase.
 Check ulimit -a
>>> 
>>> First time I heard of stack limitations. Anyway, ulimit -a gives
>>> 
>>> $ ulimit -a
>>> core file size  (blocks, -c) 0
>>> data seg size   (kbytes, -d) unlimited
>>> scheduling priority (-e) 0
>>> file size   (blocks, -f) unlimited
>>> pending signals (-i) 127368
>>> max locked memory   (kbytes, -l) unlimited
>>> max memory size (kbytes, -m) unlimited
>>> open files  (-n) 1024
>>> pipe size(512 bytes, -p) 8
>>> POSIX message queues (bytes, -q) 819200
>>> real-time priority  (-r) 0
>>> stack size  (kbytes, -s) 10240
>>> cpu time   (seconds, -t) unlimited
>>> max user processes  (-u) 1024
>>> virtual memory  (kbytes, -v) unlimited
>>> file locks  (-x) unlimited
>>> 
>>> So stack size is 10MB??? Does this one create problem? How do I change this?
>> 
>> I did $ ulimit -s unlimited to have stack size to be unlimited, and the job 
>> ran fine!!! So it looks like stack limit is the problem. Questions are:
>> 
>> * how do I set this automatically (and permanently)?
>> * should I set all other ulimits to be unlimited?
>> 
> In our environment, the only solution we found is to have mpirun run a script 
> on each node which sets ulimit (as well as environment variables which are 
> more convenient to set there than in the mpirun), before starting the 
> executable.  We had expert recommendations against this but no other working 
> solution.  It seems unlikely that you would want to remove any limits which 
> work at default.
> Stack size unlimited in reality is not unlimited; it may be limited by a 
> system limit or implementation.  As we run up to 120 threads per rank and 
> many applications have threadprivate data regions, ability to run without 
> considering stack limit is the exception rather than the rule.

Even if I would be the only user on a cluster of machines, I would define this 
in any queuing system to set the limits for the job.

-- Reuti


> -- 
> Tim Prince
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 




Re: [OMPI users] memory per core/process

2013-03-30 Thread Tim Prince

On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes 
needs large amount of stack when the problem size increase.

Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I 
change this?


I did $ ulimit -s unlimited to have stack size to be unlimited, and 
the job ran fine!!! So it looks like stack limit is the problem. 
Questions are:


 * how do I set this automatically (and permanently)?
 * should I set all other ulimits to be unlimited?

In our environment, the only solution we found is to have mpirun run a 
script on each node which sets ulimit (as well as environment variables 
which are more convenient to set there than in the mpirun), before 
starting the executable.  We had expert recommendations against this but 
no other working solution.  It seems unlikely that you would want to 
remove any limits which work at default.
Stack size unlimited in reality is not unlimited; it may be limited by a 
system limit or implementation.  As we run up to 120 threads per rank 
and many applications have threadprivate data regions, ability to run 
without considering stack limit is the exception rather than the rule.


--
Tim Prince



Re: [OMPI users] memory per core/process

2013-03-30 Thread Duke Nguyen

On 3/30/13 5:22 PM, Duke Nguyen wrote:

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes 
needs large amount of stack when the problem size increase.

Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change 
this?


I did $ ulimit -s unlimited to have stack size to be unlimited, and the 
job ran fine!!! So it looks like stack limit is the problem. Questions are:


 * how do I set this automatically (and permanently)?
 * should I set all other ulimits to be unlimited?

Thanks,

D.





2) did your node uses cpuset and memory limitation like fake numa to 
set the maximum amount of memory available for a job ?


Not really understand (also first time heard of fake numa), but I am 
pretty sure we do not have such things. The server I tried was a 
dedicated server with 2 x5420 and 16GB physical memory.




Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten 
days of searching/working on the system, I surrender :(. We try to 
use mpirun to run abinit (abinit.org) which in turns will call an 
input file to run some simulation. The command to run is pretty simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of 
memory. I called only 4 core, and I guess in theory each of the core 
should take up to 2GB each.


In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes 
of memory.

  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 
Mbytes.


So basically it reported that the above job should not take more 
than 718MB each core.


But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos 
exited on signal 11 (Segmentation fault).


The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported 
mem per core is less than 512MB, then the job runs fine.


Please help,

Thanks,

D.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users







Re: [OMPI users] memory per core/process

2013-03-30 Thread Reuti
Am 30.03.2013 um 05:21 schrieb Duke Nguyen:

> Hi folks,
> 
> I am sorry if this question had been asked before, but after ten days of 
> searching/working on the system, I surrender :(. We try to use mpirun to run 
> abinit (abinit.org) which in turns will call an input file to run some 
> simulation. The command to run is pretty simple
> 
> $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
> 
> We ran this command on a server with two quad core x5420 and 16GB of memory. 
> I called only 4 core, and I guess in theory each of the core should take up 
> to 2GB each.
> 
> In the output of the log, there is something about memory:
> 
> P This job should need less than 717.175 Mbytes of memory.
>   Rough estimation (10% accuracy) of disk space for files :
>   WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
> 
> So basically it reported that the above job should not take more than 718MB 
> each core.
> 
> But I still have the Segmentation Fault error:

It might also be a programming error in abinit. Did you compile abinit with the 
compiler version they suggest, and was Open MPI compiled with the same version? 
Does it run fine in serial mode? Did the `make check` of abinit succeed?

-- Reuti


> mpirun noticed that process rank 0 with PID 16099 on node biobos exited on 
> signal 11 (Segmentation fault).
> 
> The system already has limits up to unlimited:
> 
> $ cat /etc/security/limits.conf | grep -v '#'
> * soft memlock unlimited
> * hard memlock unlimited
> 
> I also tried to run
> 
> $ ulimit -l unlimited
> 
> before the mpirun command above, but it did not help at all.
> 
> If we adjust the parameters of the input.files to give the reported mem per 
> core is less than 512MB, then the job runs fine.
> 
> Please help,
> 
> Thanks,
> 
> D.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] memory per core/process

2013-03-30 Thread Duke Nguyen

On 3/30/13 3:13 PM, Patrick Bégou wrote:

I do not know about your code but:

1) did you check stack limitations ? Typically intel fortran codes 
needs large amount of stack when the problem size increase.

Check ulimit -a


First time I heard of stack limitations. Anyway, ulimit -a gives

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

So stack size is 10MB??? Does this one create problem? How do I change this?



2) did your node uses cpuset and memory limitation like fake numa to 
set the maximum amount of memory available for a job ?


I don't really understand (also the first time I've heard of fake numa), but I am 
pretty sure we do not have such things. The server I tried was a 
dedicated server with 2 x5420 and 16GB physical memory.




Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten days 
of searching/working on the system, I surrender :(. We try to use 
mpirun to run abinit (abinit.org) which in turns will call an input 
file to run some simulation. The command to run is pretty simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of 
memory. I called only 4 core, and I guess in theory each of the core 
should take up to 2GB each.


In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes 
of memory.

  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 
Mbytes.


So basically it reported that the above job should not take more than 
718MB each core.


But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos 
exited on signal 11 (Segmentation fault).


The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported 
mem per core is less than 512MB, then the job runs fine.


Please help,

Thanks,

D.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] memory per core/process

2013-03-30 Thread Patrick Bégou

I do not know about your code but:

1) did you check stack limitations? Typically Intel Fortran codes need a 
large amount of stack when the problem size increases.

Check ulimit -a

2) does your node use cpusets and memory limitations like fake numa to set 
the maximum amount of memory available for a job?


Patrick

Duke Nguyen wrote:

Hi folks,

I am sorry if this question had been asked before, but after ten days 
of searching/working on the system, I surrender :(. We try to use 
mpirun to run abinit (abinit.org) which in turns will call an input 
file to run some simulation. The command to run is pretty simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of 
memory. I called only 4 core, and I guess in theory each of the core 
should take up to 2GB each.


In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes of 
memory.

  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more than 
718MB each core.


But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos 
exited on signal 11 (Segmentation fault).


The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files to give the reported 
mem per core is less than 512MB, then the job runs fine.


Please help,

Thanks,

D.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] memory per core/process

2013-03-30 Thread Duke Nguyen

Hi folks,

I am sorry if this question had been asked before, but after ten days of 
searching/working on the system, I surrender :(. We try to use mpirun to 
run abinit (abinit.org) which in turns will call an input file to run 
some simulation. The command to run is pretty simple


$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad core x5420 and 16GB of 
memory. I called only 4 cores, and I guess in theory each core 
should be able to take up to 2GB.


In the output of the log, there is something about memory:

P This job should need less than 717.175 Mbytes of 
memory.

  Rough estimation (10% accuracy) of disk space for files :
  WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more than 
718MB per core.


But I still have the Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos exited 
on signal 11 (Segmentation fault).


The system already has limits up to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried to run

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of the input.files so that the reported mem 
per core is less than 512MB, then the job runs fine.


Please help,

Thanks,

D.