Re: [OMPI users] memory per core/process
Here is a v1.6 port of what was committed to the trunk. Let me know if/how it works for you. The option you will want to use is:

  mpirun -mca opal_set_max_sys_limits stacksize:unlimited

or whatever number you want to give (see ulimit for the units). Note that you won't see any impact if you run it with a non-OMPI executable like "sh ulimit", since the limit is only set during MPI_Init.

On Apr 2, 2013, at 9:48 AM, Duke Nguyen wrote:

> On 4/2/13 11:03 PM, Gus Correa wrote:
>> On 04/02/2013 11:40 AM, Duke Nguyen wrote:
>>> On 3/30/13 8:46 PM, Patrick Bégou wrote:
>>>> Ok, so your problem is identified as a stack size problem. I ran into these limitations using Intel Fortran compilers on large data problems.
>>>>
>>>> First, it seems you can increase your stack size, as "ulimit -s unlimited" works (you didn't enforce the system hard limit). The best way is to put this setting in your .bashrc file so it works on every node. But setting it to unlimited may not be really safe: if a badly coded recursive function calls itself without a stop condition, it can request all the system memory and crash the node. So set a large but limited value; it's safer.
>>>
>>> Now I feel the pain you mentioned :). With -s unlimited, some of our nodes now easily go down (completely) and need to be hard reset!!! (We never had any node go down like that before, even with killed or badly coded jobs.)
>>>
>>> Looking for a safer value of ulimit -s other than "unlimited" now... :(
>>
>> In my opinion this is a trade-off between who feels the pain. It can be you (the sysadmin) feeling the pain of having to power up offline nodes, or it can be the user feeling the pain of having his/her code killed by a segmentation fault because too little memory is available for the stack.
>
> ... in case that user is at a large institute that promises to provide the best service and unlimited resources/unlimited *everything* to end users. If not, the user should really think about how to make the best use of the available resources. Unfortunately many (most?) end users don't.
>
>> There is only so much that can be done to make everybody happy.
>
> So true... especially since HPC resources are still a luxury here in Vietnam, and we have a quite small (and not-so-strong) cluster.
>
>> If you share the nodes among jobs, you could set the stack size limit to some part of the physical memory divided by the number of cores, saving some memory for the OS etc. beforehand. However, this can be a straitjacket for jobs that could run with a bit more memory and won't because of this limit. If you do not share the nodes, then you could make the stack size closer to physical memory.
>
> Great. Thanks for this advice, Gus.
>
>> Anyway, this is less of an OpenMPI conversation than a resource manager / queuing system one.
>
> Yeah, and I have learned a lot here besides just OpenMPI stuff :)
>
>> Best,
>> Gus Correa
>
>>>> I'm managing a cluster and I always set a maximum value for the stack size. I also limit the memory available per core, for system stability. If a user requests only one of the 12 cores of a node, he can only access 1/12 of the node's memory. If he needs more memory he has to request 2 cores, even if he runs a sequential code. This avoids crashing the jobs of other users on the same node through memory requirements. But this is not configured on your node.
>>>>
>>>> Duke Nguyen wrote:
>>>>> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>>>>>> I do not know about your code, but:
>>>>>>
>>>>>> 1) did you check stack limitations? Typically Intel Fortran codes need a large amount of stack when the problem size increases. Check ulimit -a.
>>>>>
>>>>> First time I've heard of stack limitations. Anyway, ulimit -a gives
>>>>>
>>>>> $ ulimit -a
>>>>> core file size          (blocks, -c) 0
>>>>> data seg size           (kbytes, -d) unlimited
>>>>> scheduling priority             (-e) 0
>>>>> file size               (blocks, -f) unlimited
>>>>> pending signals                 (-i) 127368
>>>>> max locked memory       (kbytes, -l) unlimited
>>>>> max memory size         (kbytes, -m) unlimited
>>>>> open files                      (-n) 1024
>>>>> pipe size            (512 bytes, -p) 8
>>>>> POSIX message queues     (bytes, -q) 819200
>>>>> real-time priority              (-r) 0
>>>>> stack size              (kbytes, -s) 10240
>>>>> cpu time               (seconds, -t) unlimited
>>>>> max user processes              (-u) 1024
>>>>> virtual memory          (kbytes, -v) unlimited
>>>>> file locks                      (-x) unlimited
>>>>>
>>>>> So stack size is 10MB??? Does this one create a problem? How do I change it?
>>>>>
>>>>>> 2) does your node use cpusets and memory limitation, like fake NUMA, to set the maximum amount of memory available for a job?
>>>>>
>>>>> I don't really understand (this is also the first time I've heard of fake NUMA), but I am pretty sure we do not have such things. The server I tried was a dedicated server with 2 x5420 and 16GB
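For readers who want to see how the soft and hard stack limits discussed above interact outside the shell, here is a minimal sketch using Python's standard resource module. This is an illustration only (the thread itself is about shell ulimit settings and the OMPI MCA option); the function name is made up:

```python
import resource

MB = 1024 * 1024

def raise_soft_stack_limit(target_bytes):
    """Move the soft stack limit toward target_bytes, capped at the
    hard limit -- roughly what 'ulimit -s <value>' does in a shell.
    Only raising the hard limit itself requires privileges."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if hard != resource.RLIM_INFINITY:
        target_bytes = min(target_bytes, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (target_bytes, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)[0]

# A "large but limited" value, as Patrick suggests, instead of unlimited:
new_soft = raise_soft_stack_limit(64 * MB)
print("soft stack limit is now", new_soft, "bytes")
```

Note that, as Ralph points out later in the thread, the limit set via mpirun's MCA option only takes effect during MPI_Init; an in-process adjustment like this sketch would likewise only affect the process that runs it.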
Re: [OMPI users] memory per core/process
On 4/2/13 11:03 PM, Gus Correa wrote:
> On 04/02/2013 11:40 AM, Duke Nguyen wrote:
>> Looking for a safer value of ulimit -s other than "unlimited" now... :(
>
> In my opinion this is a trade-off between who feels the pain. It can be you (the sysadmin) feeling the pain of having to power up offline nodes, or it can be the user feeling the pain of having his/her code killed by a segmentation fault because too little memory is available for the stack.

... in case that user is at a large institute that promises to provide the best service and unlimited resources/unlimited *everything* to end users. If not, the user should really think about how to make the best use of the available resources. Unfortunately many (most?) end users don't.

> There is only so much that can be done to make everybody happy.

So true... especially since HPC resources are still a luxury here in Vietnam, and we have a quite small (and not-so-strong) cluster.

> If you share the nodes among jobs, you could set the stack size limit to some part of the physical memory divided by the number of cores, saving some memory for the OS etc. beforehand. However, this can be a straitjacket for jobs that could run with a bit more memory and won't because of this limit. If you do not share the nodes, then you could make the stack size closer to physical memory.

Great. Thanks for this advice, Gus.

> Anyway, this is less of an OpenMPI conversation than a resource manager / queuing system one.

Yeah, and I have learned a lot here besides just OpenMPI stuff :)

> Best,
> Gus Correa
Re: [OMPI users] memory per core/process
On 4/2/13 10:45 PM, Ralph Castain wrote:
> Hmmm...tell you what. I'll add the ability for OMPI to set the limit to a user-specified level upon launch of each process. This will give you some protection and flexibility.

That would be excellent ;)

> I forget, so please forgive the old man's fading memory - what version of OMPI are you using? I'll backport a patch for you.

It's openmpi-1.6.3-x86_64, if that helps...

> On Apr 2, 2013, at 8:40 AM, Duke Nguyen wrote:
>> Duke Nguyen wrote:
>>> Hi folks,
>>>
>>> I am sorry if this question has been asked before, but after ten days of searching/working on the system, I surrender :(. We are trying to use mpirun to run abinit (abinit.org), which in turn reads an input file to run a simulation. The command is pretty simple:
>>>
>>> $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log
>>>
>>> We ran this command on a server with two quad-core x5420s and 16GB of memory. I used only 4 cores, so I guess in theory each core should be able to take up to 2GB. In the output log, there is something about memory:
>>>
>>> P This job should need less than 717.175 Mbytes of memory.
>>> Rough estimation (10% accuracy) of disk space for files :
>>> WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.
>>>
>>> So basically it reported that the job should not take more than 718MB per core. But I still get the Segmentation Fault error:
>>>
>>> mpirun noticed that process rank 0 with PID 16099 on node biobos exited on signal 11 (Segmentation fault).
>>>
>>> The system already has limits set to unlimited:
>>>
>>> $ cat /etc/security/limits.conf | grep -v '#'
>>> * soft memlock unlimited
>>> * hard memlock unlimited
>>>
>>> I also tried running
>>>
>>> $ ulimit -l unlimited
>>>
>>> before the mpirun command above, but it did not help at all. If we adjust the parameters of input.files so that the reported memory per core is less than 512MB, then the job runs fine.
>>>
>>> Please help,
>>>
>>> Thanks,
>>>
>>> D.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] memory per core/process
On 04/02/2013 11:40 AM, Duke Nguyen wrote:
> Now I feel the pain you mentioned :). With -s unlimited, some of our nodes now easily go down (completely) and need to be hard reset!!! (We never had any node go down like that before, even with killed or badly coded jobs.)
>
> Looking for a safer value of ulimit -s other than "unlimited" now... :(

In my opinion this is a trade-off between who feels the pain. It can be you (the sysadmin) feeling the pain of having to power up offline nodes, or it can be the user feeling the pain of having his/her code killed by a segmentation fault because too little memory is available for the stack.

There is only so much that can be done to make everybody happy.

If you share the nodes among jobs, you could set the stack size limit to some part of the physical memory divided by the number of cores, saving some memory for the OS etc. beforehand. However, this can be a straitjacket for jobs that could run with a bit more memory and won't because of this limit. If you do not share the nodes, then you could make the stack size closer to physical memory.

Anyway, this is less of an OpenMPI conversation than a resource manager / queuing system one.

Best,
Gus Correa
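Gus's rule of thumb above - reserve some memory for the OS, then divide the remaining physical memory evenly among the cores - is simple arithmetic. A quick sketch (the function name and the 2 GB OS reserve are assumptions for illustration, not values from the thread):

```python
def per_core_stack_limit_kb(phys_mem_gb, n_cores, os_reserve_gb=2):
    """Divide what is left after an OS reserve evenly among the cores.
    Returns a value in kbytes, the unit 'ulimit -s' expects."""
    usable_gb = phys_mem_gb - os_reserve_gb
    if usable_gb <= 0:
        raise ValueError("OS reserve exceeds physical memory")
    return usable_gb * 1024 * 1024 // n_cores

# The node in this thread: two quad-core x5420s (8 cores), 16 GB RAM.
limit_kb = per_core_stack_limit_kb(16, 8)
print(limit_kb, "kB per core")  # 14 GB usable / 8 cores = 1835008 kB (~1.75 GB)
```

With shared nodes this gives a "large but limited" ulimit -s; for dedicated nodes, Gus suggests setting it closer to the full physical memory instead.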
Re: [OMPI users] memory per core/process
Hmmm...tell you what. I'll add the ability for OMPI to set the limit to a user-specified level upon launch of each process. This will give you some protection and flexibility.

I forget, so please forgive the old man's fading memory - what version of OMPI are you using? I'll backport a patch for you.

On Apr 2, 2013, at 8:40 AM, Duke Nguyen wrote:

> Now I feel the pain you mentioned :). With -s unlimited, some of our nodes now easily go down (completely) and need to be hard reset!!! (We never had any node go down like that before, even with killed or badly coded jobs.)
>
> Looking for a safer value of ulimit -s other than "unlimited" now... :(
Re: [OMPI users] memory per core/process
On 3/30/13 8:46 PM, Patrick Bégou wrote:
> Ok, so your problem is identified as a stack size problem. I ran into these limitations using Intel Fortran compilers on large data problems.
>
> First, it seems you can increase your stack size, as "ulimit -s unlimited" works (you didn't enforce the system hard limit). The best way is to put this setting in your .bashrc file so it works on every node. But setting it to unlimited may not be really safe: if a badly coded recursive function calls itself without a stop condition, it can request all the system memory and crash the node. So set a large but limited value; it's safer.

Now I feel the pain you mentioned :). With -s unlimited, some of our nodes now easily go down (completely) and need to be hard reset!!! (We never had any node go down like that before, even with killed or badly coded jobs.)

Looking for a safer value of ulimit -s other than "unlimited" now... :(
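One point worth making explicit: the limits.conf entries and "ulimit -l unlimited" tried earlier in the thread raise the memlock limit, which is independent of the stack limit that is actually behind the segfault. A quick check with Python's resource module (illustration only, not part of the thread's tooling):

```python
import resource

# memlock (ulimit -l, RLIMIT_MEMLOCK) and stack (ulimit -s, RLIMIT_STACK)
# are separate kernel limits: raising one leaves the other untouched,
# which is why unlimited memlock did not help here.
memlock = resource.getrlimit(resource.RLIMIT_MEMLOCK)
stack = resource.getrlimit(resource.RLIMIT_STACK)
print("RLIMIT_MEMLOCK (soft, hard):", memlock)
print("RLIMIT_STACK   (soft, hard):", stack)
```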
Re: [OMPI users] memory per core/process
On 4/2/13 6:50 PM, Reuti wrote: Hi, Am 30.03.2013 um 14:46 schrieb Patrick Bégou: Ok, so your problem is identified as a stack size problem. I went into these limitations using Intel fortran compilers on large data problems. First, it seems you can increase your stack size as "ulimit -s unlimited" works (you didn't enforce the system hard limit). The best way is to set this setting in your .bashrc file so it will works on every node. But setting it to unlimited may not be really safe. IE, if you got in a badly coded recursive function calling itself without a stop condition you can request all the system memory and crash the node. So set a large but limited value, it's safer. I'm managing a cluster and I always set a maximum value to stack size. I also limit the memory available for each core for system stability. If a user request only one of the 12 cores of a node he can only access 1/12 of the node memory amount. If he needs more memory he has to request 2 cores, even if he uses a sequential code. This avoid crashing jobs of other users on the same node with memory requirements. But this is not configured on your node. This is one way to implement memory limits as a policy - it's up to the user to request the correct number of cores then although he wants to run a serial job only. Personally I prefer that the user specifies the requested memory in such a case. It's up to the queuingsystem then to avoid that additional jobs are scheduled to a machine unless the remaining memory is sufficient for their execution in such a situation. We use Torque/Maui and I want to do similar with Torque/Maui (still learning - those, together with openmpi are new to me). Unfortunately posting to Torque/Maui forums are somehow too difficult (my posts were moderated since I am the newcomer, but it seems nobody is managing those forums so my posts were never able to get through...). I wish they were as active as this forum... D. 
-- Reuti Duke Nguyen a écrit : On 3/30/13 3:13 PM, Patrick Bégou wrote: I do not know about your code but: 1) did you check stack limitations ? Typically intel fortran codes needs large amount of stack when the problem size increase. Check ulimit -a First time I heard of stack limitations. Anyway, ulimit -a gives $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127368 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited So stack size is 10MB??? Does this one create problem? How do I change this? 2) did your node uses cpuset and memory limitation like fake numa to set the maximum amount of memory available for a job ? Not really understand (also first time heard of fake numa), but I am pretty sure we do not have such things. The server I tried was a dedicated server with 2 x5420 and 16GB physical memory. Patrick Duke Nguyen a écrit : Hi folks, I am sorry if this question had been asked before, but after ten days of searching/working on the system, I surrender :(. We try to use mpirun to run abinit (abinit.org) which in turns will call an input file to run some simulation. The command to run is pretty simple $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log We ran this command on a server with two quad core x5420 and 16GB of memory. I called only 4 core, and I guess in theory each of the core should take up to 2GB each. In the output of the log, there is something about memory: P This job should need less than 717.175 Mbytes of memory. 
Rough estimation (10% accuracy) of disk space for files : WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes. So basically it reported that the above job should not take more than 718MB each core. But I still have the Segmentation Fault error: mpirun noticed that process rank 0 with PID 16099 on node biobos exited on signal 11 (Segmentation fault). The system already has limits up to unlimited: $ cat /etc/security/limits.conf | grep -v '#' * soft memlock unlimited * hard memlock unlimited I also tried to run $ ulimit -l unlimited before the mpirun command above, but it did not help at all. If we adjust the parameters of the input.files to give the reported mem per core is less than 512MB, then the job runs fine. Please help, Thanks, D. ___ users mailing list us...@open-mpi.org
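The messages above pin the suspect: a 10240 kB soft stack limit against a job that needs several hundred megabytes. The soft limit is per shell and inherited by child processes, so it can be checked and adjusted before launching anything; a small illustration (the 4096 value is arbitrary):

```shell
# Query the current soft stack limit, in kB:
ulimit -s

# A subshell can adjust its own soft limit without touching the parent;
# lowering always succeeds, raising works only up to the hard limit:
sh -c 'ulimit -s 4096; ulimit -s'    # prints 4096

# Raise it for this shell and everything launched from it (e.g. mpirun):
ulimit -s unlimited 2>/dev/null || echo "hard limit is lower than requested"
```

Anything started from that shell afterwards, including mpirun and the processes it spawns locally, inherits the new limit.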
Re: [OMPI users] memory per core/process
On 4/2/13 6:42 PM, Reuti wrote: /usr/local/bin/mpirun -npernode 1 -tag-output sh -c "ulimit -a" You are right :) $ /usr/local/bin/mpirun -npernode 1 -tag-output sh -c "ulimit -a" [1,0]:core file size (blocks, -c) 0 [1,0]:data seg size (kbytes, -d) unlimited [1,0]:scheduling priority (-e) 0 [1,0]:file size (blocks, -f) unlimited [1,0]:pending signals (-i) 8271027 [1,0]:max locked memory (kbytes, -l) unlimited [1,0]:max memory size (kbytes, -m) unlimited [1,0]:open files (-n) 32768 [1,0]:pipe size(512 bytes, -p) 8 [1,0]:POSIX message queues (bytes, -q) 819200 [1,0]:real-time priority (-r) 0 [1,0]:stack size (kbytes, -s) unlimited [1,0]:cpu time (seconds, -t) unlimited [1,0]:max user processes (-u) 8192 [1,0]:virtual memory (kbytes, -v) unlimited [1,0]:file locks (-x) unlimited [1,1]:core file size (blocks, -c) 0 [1,1]:data seg size (kbytes, -d) unlimited [1,1]:scheduling priority (-e) 0 [1,1]:file size (blocks, -f) unlimited [1,1]:pending signals (-i) 8271027 [1,1]:max locked memory (kbytes, -l) unlimited [1,1]:max memory size (kbytes, -m) unlimited [1,1]:open files (-n) 32768 [1,1]:pipe size(512 bytes, -p) 8 [1,1]:POSIX message queues (bytes, -q) 819200 [1,1]:real-time priority (-r) 0 [1,1]:stack size (kbytes, -s) unlimited [1,1]:cpu time (seconds, -t) unlimited [1,1]:max user processes (-u) 8192 [1,1]:virtual memory (kbytes, -v) unlimited [1,1]:file locks (-x) unlimited [1,2]:core file size (blocks, -c) 0 [1,2]:data seg size (kbytes, -d) unlimited [1,2]:scheduling priority (-e) 0 [1,2]:file size (blocks, -f) unlimited [1,2]:pending signals (-i) 8271027 [1,2]:max locked memory (kbytes, -l) unlimited [1,2]:max memory size (kbytes, -m) unlimited [1,2]:open files (-n) 32768 [1,2]:pipe size(512 bytes, -p) 8 [1,2]:POSIX message queues (bytes, -q) 819200 [1,2]:real-time priority (-r) 0 [1,2]:stack size (kbytes, -s) unlimited [1,2]:cpu time (seconds, -t) unlimited [1,2]:max user processes (-u) 8192 [1,2]:virtual memory (kbytes, -v) unlimited [1,2]:file locks (-x) unlimited 
[1,3]:core file size (blocks, -c) 0 [1,3]:data seg size (kbytes, -d) unlimited [1,3]:scheduling priority (-e) 0 [1,3]:file size (blocks, -f) unlimited [1,3]:pending signals (-i) 8271027 [1,3]:max locked memory (kbytes, -l) unlimited [1,3]:max memory size (kbytes, -m) unlimited [1,3]:open files (-n) 32768 [1,3]:pipe size(512 bytes, -p) 8 [1,3]:POSIX message queues (bytes, -q) 819200 [1,3]:real-time priority (-r) 0 [1,3]:stack size (kbytes, -s) unlimited [1,3]:cpu time (seconds, -t) unlimited [1,3]:max user processes (-u) 8192 [1,3]:virtual memory (kbytes, -v) unlimited [1,3]:file locks (-x) unlimited
Re: [OMPI users] memory per core/process
Hi, Am 30.03.2013 um 15:35 schrieb Gustavo Correa: > On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote: > >> On 3/30/13 8:20 PM, Reuti wrote: >>> Am 30.03.2013 um 13:26 schrieb Tim Prince: >>> On 03/30/2013 06:36 AM, Duke Nguyen wrote: > On 3/30/13 5:22 PM, Duke Nguyen wrote: >> On 3/30/13 3:13 PM, Patrick Bégou wrote: >>> I do not know about your code but: >>> >>> 1) did you check stack limitations ? Typically intel fortran codes >>> needs large amount of stack when the problem size increase. >>> Check ulimit -a >> First time I heard of stack limitations. Anyway, ulimit -a gives >> >> $ ulimit -a >> core file size (blocks, -c) 0 >> data seg size (kbytes, -d) unlimited >> scheduling priority (-e) 0 >> file size (blocks, -f) unlimited >> pending signals (-i) 127368 >> max locked memory (kbytes, -l) unlimited >> max memory size (kbytes, -m) unlimited >> open files (-n) 1024 >> pipe size(512 bytes, -p) 8 >> POSIX message queues (bytes, -q) 819200 >> real-time priority (-r) 0 >> stack size (kbytes, -s) 10240 >> cpu time (seconds, -t) unlimited >> max user processes (-u) 1024 >> virtual memory (kbytes, -v) unlimited >> file locks (-x) unlimited >> >> So stack size is 10MB??? Does this one create problem? How do I change >> this? > I did $ ulimit -s unlimited to have stack size to be unlimited, and the > job ran fine!!! So it looks like stack limit is the problem. Questions > are: > > * how do I set this automatically (and permanently)? > * should I set all other ulimits to be unlimited? > In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables which are more convenient to set there than in the mpirun), before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at default. 
Stack size unlimited in reality is not unlimited; it may be limited by a system limit or implementation. As we run up to 120 threads per rank and many applications have threadprivate data regions, the ability to run without considering the stack limit is the exception rather than the rule. >>> Even if I were the only user on a cluster of machines, I would define >>> this in any queuing system to set the limits for the job. >> >> Sorry if I don't get this correctly, but do you mean I should set this using >> Torque/Maui (our queuing manager) instead of the system itself >> (/etc/security/limits.conf and /etc/profile.d/)? Yes, or per queue/job. > Hi Duke > > We do both. > Set memlock and stacksize to unlimited, and increase the maximum number of > open files in the pbs_mom script in /etc/init.d, and do the same in > /etc/security/limits.conf. > This may be an overzealous "belt and suspenders" policy, but it works. > As everybody else said, a small stacksize is a common cause of segmentation > fault in large codes. This way it would be fixed in the overall cluster and not per job - or? I have seen situations where, with limited virtual memory for a job, the stack size had to be set to a low value in the range of only a few tens of megabytes. Whether such a request is possible depends on the queuing system though. In GridEngine it's possible; I'm not sure about Torque/PBS. -- Reuti > Basically all codes that we run here have this problem, with too many > automatic arrays, structures, etc. in functions and subroutines. > But also a small memlock is trouble for OFED/InfiniBand, and the small > (default) > max number of open file handles may hit the limit easily if many programs > (or poorly written programs) are running on the same node. > The default Linux distribution limits don't seem to be tailored for HPC, I > guess. > > I hope this helps, > Gus Correa > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
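The per-node wrapper approach Tim Prince describes above (mpirun starts a script that sets ulimit and environment variables, then starts the executable) can be sketched in a few lines; the script name and the 102400 kB value are hypothetical, following the "large but limited" advice earlier in the thread:

```shell
#!/bin/sh
# set_limits.sh -- hypothetical per-node wrapper: raise the soft stack
# limit, then exec the real program so the MPI rank inherits the limit.
# 102400 kB is ~100 MB: large but finite, per the thread's advice.
ulimit -s 102400 2>/dev/null || true
exec "$@"
```

It would be launched as, e.g., `mpirun -np 4 ./set_limits.sh ./my_app` (application name hypothetical); any environment variables that are awkward to pass on the mpirun command line can be exported from the same script before the exec.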
Re: [OMPI users] memory per core/process
Hi, Am 30.03.2013 um 14:46 schrieb Patrick Bégou: > Ok, so your problem is identified as a stack size problem. I ran into these > limitations using Intel Fortran compilers on large data problems. > > First, it seems you can increase your stack size, as "ulimit -s unlimited" > works (you didn't enforce the system hard limit). The best way is to put > this setting in your .bashrc file so it works on every node. > But setting it to unlimited may not be really safe. I.e., if you get into a badly > coded recursive function calling itself without a stop condition, you can > request all the system memory and crash the node. So set a large but limited > value; it's safer. > > I'm managing a cluster and I always set a maximum value for stack size. I also > limit the memory available per core for system stability. If a user > requests only one of the 12 cores of a node, he can only access 1/12 of the > node's memory. If he needs more memory he has to request 2 cores, even > if he runs a sequential code. This avoids crashing other users' jobs on the > same node through excessive memory use. But this is not configured on your node. This is one way to implement memory limits as a policy - the user then has to request the correct number of cores even though he only wants to run a serial job. Personally I prefer that the user specifies the requested memory in such a case. It is then up to the queuing system to avoid scheduling additional jobs onto a machine unless the remaining memory is sufficient for them. -- Reuti > Duke Nguyen a écrit : >> On 3/30/13 3:13 PM, Patrick Bégou wrote: >>> I do not know about your code, but: >>> >>> 1) did you check stack limitations? Typically Intel Fortran codes need a >>> large amount of stack when the problem size increases. >>> Check ulimit -a >> >> First time I heard of stack limitations. 
Anyway, ulimit -a gives >> >> $ ulimit -a >> core file size (blocks, -c) 0 >> data seg size (kbytes, -d) unlimited >> scheduling priority (-e) 0 >> file size (blocks, -f) unlimited >> pending signals (-i) 127368 >> max locked memory (kbytes, -l) unlimited >> max memory size (kbytes, -m) unlimited >> open files (-n) 1024 >> pipe size(512 bytes, -p) 8 >> POSIX message queues (bytes, -q) 819200 >> real-time priority (-r) 0 >> stack size (kbytes, -s) 10240 >> cpu time (seconds, -t) unlimited >> max user processes (-u) 1024 >> virtual memory (kbytes, -v) unlimited >> file locks (-x) unlimited >> >> So stack size is 10MB??? Does this one create problem? How do I change this? >> >>> >>> 2) did your node uses cpuset and memory limitation like fake numa to set >>> the maximum amount of memory available for a job ? >> >> Not really understand (also first time heard of fake numa), but I am pretty >> sure we do not have such things. The server I tried was a dedicated server >> with 2 x5420 and 16GB physical memory. >> >>> >>> Patrick >>> >>> Duke Nguyen a écrit : Hi folks, I am sorry if this question had been asked before, but after ten days of searching/working on the system, I surrender :(. We try to use mpirun to run abinit (abinit.org) which in turns will call an input file to run some simulation. The command to run is pretty simple $ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log We ran this command on a server with two quad core x5420 and 16GB of memory. I called only 4 core, and I guess in theory each of the core should take up to 2GB each. In the output of the log, there is something about memory: P This job should need less than 717.175 Mbytes of memory. Rough estimation (10% accuracy) of disk space for files : WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes. So basically it reported that the above job should not take more than 718MB each core. 
But I still have the Segmentation Fault error: mpirun noticed that process rank 0 with PID 16099 on node biobos exited on signal 11 (Segmentation fault). The system already has limits up to unlimited: $ cat /etc/security/limits.conf | grep -v '#' * soft memlock unlimited * hard memlock unlimited I also tried to run $ ulimit -l unlimited before the mpirun command above, but it did not help at all. If we adjust the parameters of the input.files to give the reported mem per core is less than 512MB, then the job runs fine. Please help, Thanks, D. ___ users mailing list us...@open-mpi.org
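Patrick's "large but limited" advice could be applied right next to the memlock lines already present in the /etc/security/limits.conf shown above; a sketch, with the stack values purely illustrative:

```
# sketch of /etc/security/limits.conf entries; stack values are illustrative
*  soft  memlock  unlimited
*  hard  memlock  unlimited
# ~100 MB soft stack cap; users can still raise it with "ulimit -s"
# up to the hard limit
*  soft  stack    102400
*  hard  stack    unlimited
```

This caps runaway recursion by default while letting a user who genuinely needs more stack raise the soft limit per job.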
Re: [OMPI users] memory per core/process
Hi, Am 02.04.2013 um 13:22 schrieb Duke Nguyen: > On 4/1/13 9:20 PM, Ralph Castain wrote: >> It's probably the same problem - try running "mpirun -npernode 1 -tag-output >> ulimit -a" on the remote nodes and see what it says. I suspect you'll find >> that they aren't correct. > > Somehow I could not run the command you advised: > > $ qsub -l nodes=4:ppn=8 -I > qsub: waiting for job 481.biobos to start > qsub: job 481.biobos ready > > $ /usr/local/bin/mpirun -npernode 1 -tag-output ulimit -a > -- > mpirun was unable to launch the specified application as it could not find an > executable: `ulimit` is a shell builtin: $ type ulimit ulimit is a shell builtin It should work with: $ /usr/local/bin/mpirun -npernode 1 -tag-output sh -c "ulimit -a" -- Reuti > Executable: ulimit > Node: node0108.biobos > > while attempting to start process rank 0. > -- > 4 total processes failed to start > > But anyway, I figured out the reason. Yes, it is the cluster nodes that did > not update the ulimit settings (our system uses diskless nodes with warewulf, so > basically we have to update the vnfs and reboot all nodes before the nodes > can run with new settings). > > Thanks for all the help :) > > D. > >> >> BTW: the "-tag-output" option marks each line of output with the rank of >> the process. Since all the outputs will be interleaved, this will help you >> identify what came from each node. >> >> >> On Mar 31, 2013, at 11:30 PM, Duke Nguyen wrote: >> >>> On 3/31/13 12:20 AM, Duke Nguyen wrote: I should really have asked earlier. Thanks for all the help. >>> I think I was excited too soon :). Increasing stacksize does help if I run >>> a job in a dedicated server. Today I tried to modify the cluster >>> (/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a >>> different job with 4 nodes/8 cores each (nodes=4:ppn=8), but I still get the >>> mpirun error. 
My ulimit now reads: >>> >>> $ ulimit -a >>> core file size (blocks, -c) 0 >>> data seg size (kbytes, -d) unlimited >>> scheduling priority (-e) 0 >>> file size (blocks, -f) unlimited >>> pending signals (-i) 8271027 >>> max locked memory (kbytes, -l) unlimited >>> max memory size (kbytes, -m) unlimited >>> open files (-n) 32768 >>> pipe size(512 bytes, -p) 8 >>> POSIX message queues (bytes, -q) 819200 >>> real-time priority (-r) 0 >>> stack size (kbytes, -s) unlimited >>> cpu time (seconds, -t) unlimited >>> max user processes (-u) 8192 >>> virtual memory (kbytes, -v) unlimited >>> file locks (-x) unlimited >>> >>> Any other advice??? >>> On 3/30/13 10:28 PM, Ralph Castain wrote: > FWIW: there is an MCA param that helps with such problems: > > opal_set_max_sys_limits > "Set to non-zero to automatically set any system-imposed > limits to the maximum allowed", > > At the moment, it only sets the limits on number of files open, and max > size of a file we can create. Easy enough to add the stack size, though > as someone pointed out, it has some negatives as well. > > > On Mar 30, 2013, at 7:35 AM, Gustavo Correa > wrote: > >> On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote: >> >>> On 3/30/13 8:20 PM, Reuti wrote: Am 30.03.2013 um 13:26 schrieb Tim Prince: > On 03/30/2013 06:36 AM, Duke Nguyen wrote: >> On 3/30/13 5:22 PM, Duke Nguyen wrote: >>> On 3/30/13 3:13 PM, Patrick Bégou wrote: I do not know about your code but: 1) did you check stack limitations ? Typically intel fortran codes needs large amount of stack when the problem size increase. Check ulimit -a >>> First time I heard of stack limitations. 
Anyway, ulimit -a gives >>> >>> $ ulimit -a >>> core file size (blocks, -c) 0 >>> data seg size (kbytes, -d) unlimited >>> scheduling priority (-e) 0 >>> file size (blocks, -f) unlimited >>> pending signals (-i) 127368 >>> max locked memory (kbytes, -l) unlimited >>> max memory size (kbytes, -m) unlimited >>> open files (-n) 1024 >>> pipe size(512 bytes, -p) 8 >>> POSIX message queues (bytes, -q) 819200 >>> real-time priority (-r) 0 >>> stack size (kbytes, -s) 10240 >>> cpu time (seconds, -t) unlimited >>> max user processes (-u) 1024
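Reuti's fix above works because ulimit is a shell builtin rather than a binary on disk, so mpirun has nothing to exec; wrapping it in sh -c gives each launched process a shell in which to run the builtin. A quick local illustration:

```shell
# There is no /usr/bin/ulimit to exec; the command lives inside the shell:
sh -c 'type ulimit'     # reports it as a shell builtin

# Hence the remote form from the thread wraps it in a shell:
#   mpirun -npernode 1 -tag-output sh -c "ulimit -a"
sh -c 'ulimit -a'       # the same wrapped form, run locally
```

The same wrapping trick applies to any shell builtin or shell syntax (pipes, redirections) that one wants mpirun to run on each node.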
Re: [OMPI users] memory per core/process
On 4/1/13 9:20 PM, Ralph Castain wrote: It's probably the same problem - try running 'mpirun -npernode 1 -tag-output ulimit -a" on the remote nodes and see what it says. I suspect you'll find that they aren't correct. Somehow I could not run your advised CMD: $ qsub -l nodes=4:ppn=8 -I qsub: waiting for job 481.biobos to start qsub: job 481.biobos ready $ /usr/local/bin/mpirun -npernode 1 -tag-output ulimit -a -- mpirun was unable to launch the specified application as it could not find an executable: Executable: ulimit Node: node0108.biobos while attempting to start process rank 0. -- 4 total processes failed to start But anyway, I figured out the reason. Yes, it is the cluster nodes that did not update ulimit settings (our system is a diskless node with warewulf so basically we have to update the vnfs and reboot all nodes before the nodes can run with new settings). Thanks for all the helps :) D. BTW: the "-tag-output'" option marks each line of output with the rank of the process. Since all the outputs will be interleaved, this will help you identify what came from each node. On Mar 31, 2013, at 11:30 PM, Duke Nguyenwrote: On 3/31/13 12:20 AM, Duke Nguyen wrote: I should really have asked earlier. Thanks for all the helps. I think I was excited too soon :). Increasing stacksize does help if I run a job in a dedicated server. Today I tried to modify the cluster (/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a different job with 4 nodes/8 core each (nodes=4:ppn=8), but I still get the mpirun error. 
My ulimit now reads: $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 8271027 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 32768 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) unlimited cpu time (seconds, -t) unlimited max user processes (-u) 8192 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Any other advice??? On 3/30/13 10:28 PM, Ralph Castain wrote: FWIW: there is an MCA param that helps with such problems: opal_set_max_sys_limits "Set to non-zero to automatically set any system-imposed limits to the maximum allowed", At the moment, it only sets the limits on number of files open, and max size of a file we can create. Easy enough to add the stack size, though as someone pointed out, it has some negatives as well. On Mar 30, 2013, at 7:35 AM, Gustavo Correa wrote: On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote: On 3/30/13 8:20 PM, Reuti wrote: Am 30.03.2013 um 13:26 schrieb Tim Prince: On 03/30/2013 06:36 AM, Duke Nguyen wrote: On 3/30/13 5:22 PM, Duke Nguyen wrote: On 3/30/13 3:13 PM, Patrick Bégou wrote: I do not know about your code but: 1) did you check stack limitations ? Typically intel fortran codes needs large amount of stack when the problem size increase. Check ulimit -a First time I heard of stack limitations. 
Anyway, ulimit -a gives $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127368 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited So stack size is 10MB??? Does this one create problem? How do I change this? I did $ ulimit -s unlimited to have stack size to be unlimited, and the job ran fine!!! So it looks like stack limit is the problem. Questions are: * how do I set this automatically (and permanently)? * should I set all other ulimits to be unlimited? In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables which are more convenient to set there than in the mpirun), before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at default. Stack size
Re: [OMPI users] memory per core/process
On 3/31/13 12:20 AM, Duke Nguyen wrote: I should really have asked earlier. Thanks for all the helps. I think I was excited too soon :). Increasing stacksize does help if I run a job in a dedicated server. Today I tried to modify the cluster (/etc/security/limits.conf, /etc/init.d/pbs_mom) and tried to run a different job with 4 nodes/8 core each (nodes=4:ppn=8), but I still get the mpirun error. My ulimit now reads: $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 8271027 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 32768 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) unlimited cpu time (seconds, -t) unlimited max user processes (-u) 8192 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Any other advice??? On 3/30/13 10:28 PM, Ralph Castain wrote: FWIW: there is an MCA param that helps with such problems: opal_set_max_sys_limits "Set to non-zero to automatically set any system-imposed limits to the maximum allowed", At the moment, it only sets the limits on number of files open, and max size of a file we can create. Easy enough to add the stack size, though as someone pointed out, it has some negatives as well. On Mar 30, 2013, at 7:35 AM, Gustavo Correawrote: On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote: On 3/30/13 8:20 PM, Reuti wrote: Am 30.03.2013 um 13:26 schrieb Tim Prince: On 03/30/2013 06:36 AM, Duke Nguyen wrote: On 3/30/13 5:22 PM, Duke Nguyen wrote: On 3/30/13 3:13 PM, Patrick Bégou wrote: I do not know about your code but: 1) did you check stack limitations ? Typically intel fortran codes needs large amount of stack when the problem size increase. Check ulimit -a First time I heard of stack limitations. 
Anyway, ulimit -a gives $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127368 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited So stack size is 10MB??? Does this one create problem? How do I change this? I did $ ulimit -s unlimited to have stack size to be unlimited, and the job ran fine!!! So it looks like stack limit is the problem. Questions are: * how do I set this automatically (and permanently)? * should I set all other ulimits to be unlimited? In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables which are more convenient to set there than in the mpirun), before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at default. Stack size unlimited in reality is not unlimited; it may be limited by a system limit or implementation. As we run up to 120 threads per rank and many applications have threadprivate data regions, ability to run without considering stack limit is the exception rather than the rule. Even if I would be the only user on a cluster of machines, I would define this in any queuingsystem to set the limits for the job. Sorry if I dont get this correctly, but do you mean I should set this using Torque/Maui (our queuing manager) instead of the system itself (/etc/security/limits.conf and /etc/profile.d/)? Hi Duke We do both. 
Set memlock and stacksize to unlimited, and increase the maximum number of open files in the pbs_mom script in /etc/init.d, and do the same in /etc/security/limits.conf. This maybe an overzealous "belt and suspenders" policy, but it works. As everybody else said, a small stacksize is a common cause of segmentation fault in large codes. Basically all codes that we run here have this problem, with too many automatic arrays, structures, etc in functions and subroutines. But also a small memlock is trouble for OFED/Infinband, and the small (default) max number of open file handles may hit the limit easily if many programs (or poorly written programs) are running in the same node. The default Linux
Re: [OMPI users] memory per core/process
I should really have asked earlier. Thanks for all the helps. D. On 3/30/13 10:28 PM, Ralph Castain wrote: FWIW: there is an MCA param that helps with such problems: opal_set_max_sys_limits "Set to non-zero to automatically set any system-imposed limits to the maximum allowed", At the moment, it only sets the limits on number of files open, and max size of a file we can create. Easy enough to add the stack size, though as someone pointed out, it has some negatives as well. On Mar 30, 2013, at 7:35 AM, Gustavo Correawrote: On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote: On 3/30/13 8:20 PM, Reuti wrote: Am 30.03.2013 um 13:26 schrieb Tim Prince: On 03/30/2013 06:36 AM, Duke Nguyen wrote: On 3/30/13 5:22 PM, Duke Nguyen wrote: On 3/30/13 3:13 PM, Patrick Bégou wrote: I do not know about your code but: 1) did you check stack limitations ? Typically intel fortran codes needs large amount of stack when the problem size increase. Check ulimit -a First time I heard of stack limitations. Anyway, ulimit -a gives $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 127368 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited So stack size is 10MB??? Does this one create problem? How do I change this? I did $ ulimit -s unlimited to have stack size to be unlimited, and the job ran fine!!! So it looks like stack limit is the problem. Questions are: * how do I set this automatically (and permanently)? * should I set all other ulimits to be unlimited? 
In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables, which are more convenient to set there than in the mpirun), before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at default. Stack size unlimited in reality is not unlimited; it may be limited by a system limit or implementation. As we run up to 120 threads per rank and many applications have threadprivate data regions, the ability to run without considering the stack limit is the exception rather than the rule. Even if I were the only user on a cluster of machines, I would define this in any queuing system to set the limits for the job. Sorry if I don't get this correctly, but do you mean I should set this using Torque/Maui (our queuing manager) instead of the system itself (/etc/security/limits.conf and /etc/profile.d/)? Hi Duke We do both. Set memlock and stacksize to unlimited, and increase the maximum number of open files in the pbs_mom script in /etc/init.d, and do the same in /etc/security/limits.conf. This may be an overzealous "belt and suspenders" policy, but it works. As everybody else said, a small stacksize is a common cause of segmentation fault in large codes. Basically all codes that we run here have this problem, with too many automatic arrays, structures, etc. in functions and subroutines. But also a small memlock is trouble for OFED/InfiniBand, and the small (default) max number of open file handles may hit the limit easily if many programs (or poorly written programs) are running on the same node. The default Linux distribution limits don't seem to be tailored for HPC, I guess. I hope this helps, Gus Correa
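Gus's "belt and suspenders" setup - raising limits in the pbs_mom init script as well as in limits.conf - might look like the following excerpt near the top of /etc/init.d/pbs_mom, so that every job the daemon spawns inherits the limits (values illustrative):

```
# excerpt of /etc/init.d/pbs_mom -- raise limits before starting the daemon
ulimit -l unlimited   # memlock: a small value breaks OFED/InfiniBand
ulimit -s unlimited   # stack: the usual segfault culprit in large codes
ulimit -n 32768       # open file handles
```

Since pbs_mom is the parent of every job process on the node, setting the limits here catches jobs even when a user's shell startup files do not.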
Re: [OMPI users] memory per core/process
FWIW: there is an MCA param that helps with such problems: opal_set_max_sys_limits "Set to non-zero to automatically set any system-imposed limits to the maximum allowed", At the moment, it only sets the limits on number of files open, and max size of a file we can create. Easy enough to add the stack size, though as someone pointed out, it has some negatives as well. On Mar 30, 2013, at 7:35 AM, Gustavo Correawrote: > > On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote: > >> On 3/30/13 8:20 PM, Reuti wrote: >>> Am 30.03.2013 um 13:26 schrieb Tim Prince: >>> On 03/30/2013 06:36 AM, Duke Nguyen wrote: > On 3/30/13 5:22 PM, Duke Nguyen wrote: >> On 3/30/13 3:13 PM, Patrick Bégou wrote: >>> I do not know about your code but: >>> >>> 1) did you check stack limitations ? Typically intel fortran codes >>> needs large amount of stack when the problem size increase. >>> Check ulimit -a >> First time I heard of stack limitations. Anyway, ulimit -a gives >> >> $ ulimit -a >> core file size (blocks, -c) 0 >> data seg size (kbytes, -d) unlimited >> scheduling priority (-e) 0 >> file size (blocks, -f) unlimited >> pending signals (-i) 127368 >> max locked memory (kbytes, -l) unlimited >> max memory size (kbytes, -m) unlimited >> open files (-n) 1024 >> pipe size(512 bytes, -p) 8 >> POSIX message queues (bytes, -q) 819200 >> real-time priority (-r) 0 >> stack size (kbytes, -s) 10240 >> cpu time (seconds, -t) unlimited >> max user processes (-u) 1024 >> virtual memory (kbytes, -v) unlimited >> file locks (-x) unlimited >> >> So stack size is 10MB??? Does this one create problem? How do I change >> this? > I did $ ulimit -s unlimited to have stack size to be unlimited, and the > job ran fine!!! So it looks like stack limit is the problem. Questions > are: > > * how do I set this automatically (and permanently)? > * should I set all other ulimits to be unlimited? 
> In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables which are more convenient to set there than in the mpirun), before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at default. Stack size unlimited in reality is not unlimited; it may be limited by a system limit or implementation. As we run up to 120 threads per rank and many applications have threadprivate data regions, ability to run without considering stack limit is the exception rather than the rule. >>> Even if I would be the only user on a cluster of machines, I would define >>> this in any queuingsystem to set the limits for the job. >> >> Sorry if I dont get this correctly, but do you mean I should set this using >> Torque/Maui (our queuing manager) instead of the system itself >> (/etc/security/limits.conf and /etc/profile.d/)? > > Hi Duke > > We do both. > Set memlock and stacksize to unlimited, and increase the maximum number of > open files in the pbs_mom script in /etc/init.d, and do the same in > /etc/security/limits.conf. > This maybe an overzealous "belt and suspenders" policy, but it works. > As everybody else said, a small stacksize is a common cause of segmentation > fault in > large codes. > Basically all codes that we run here have this problem, with too many > automatic arrays, structures, etc in functions and subroutines. > But also a small memlock is trouble for OFED/Infinband, and the small > (default) > max number of open file handles may hit the limit easily if many programs > (or poorly written programs) are running in the same node. > The default Linux distribution limits don't seem to be tailored for HPC, I > guess. > > I hope this helps, > Gus Correa > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
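For reference, Ralph's MCA parameter is passed like any other on the mpirun command line (application name hypothetical; per this message the parameter at this point only raises the open-files and file-size limits, not the stack):

```
mpirun -mca opal_set_max_sys_limits 1 -np 4 ./my_app
```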
Re: [OMPI users] memory per core/process
On Mar 30, 2013, at 10:02 AM, Duke Nguyen wrote:

> On 3/30/13 8:20 PM, Reuti wrote:
>> Am 30.03.2013 um 13:26 schrieb Tim Prince:
>>> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>>>> On 3/30/13 5:22 PM, Duke Nguyen wrote:
>>>>> $ ulimit -a
>>>>> [...]
>>>>> stack size (kbytes, -s) 10240
>>>>> [...]
>>>>> So stack size is 10 MB??? Does this one create problems? How do I
>>>>> change this?
>>>> I did $ ulimit -s unlimited to make the stack size unlimited, and
>>>> the job ran fine!!! So it looks like the stack limit is the problem.
>>>> Questions are:
>>>> * how do I set this automatically (and permanently)?
>>>> * should I set all other ulimits to be unlimited?
>>> In our environment, the only solution we found is to have mpirun run
>>> a script on each node which sets ulimit before starting the
>>> executable. [...]
>> Even if I were the only user on a cluster of machines, I would define
>> this in any queuing system to set the limits for the job.
>
> Sorry if I don't get this correctly, but do you mean I should set this
> using Torque/Maui (our queuing manager) instead of the system itself
> (/etc/security/limits.conf and /etc/profile.d/)?

Hi Duke,

We do both. Set memlock and stacksize to unlimited, and increase the maximum number of open files, in the pbs_mom script in /etc/init.d, and do the same in /etc/security/limits.conf. This may be an overzealous "belt and suspenders" policy, but it works.

As everybody else said, a small stacksize is a common cause of segmentation faults in large codes. Basically all the codes we run here have this problem: too many automatic arrays, structures, etc. in functions and subroutines. But a small memlock is also trouble for OFED/InfiniBand, and the small (default) maximum number of open file handles can be hit easily if many programs (or poorly written programs) run on the same node. The default Linux distribution limits don't seem to be tailored for HPC, I guess.

I hope this helps,
Gus Correa
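Gus's "belt and suspenders" setup can be sketched as two config fragments. The values below are illustrative, not from the thread, and an unlimited stacksize carries the node-crash risk Patrick describes elsewhere in this discussion:

```
# /etc/security/limits.conf -- hypothetical entries in the spirit of
# Gus's description (memlock and stack raised, more open files)
*  soft  memlock  unlimited
*  hard  memlock  unlimited
*  soft  stack    unlimited
*  hard  stack    unlimited
*  soft  nofile   8192
*  hard  nofile   8192

# /etc/init.d/pbs_mom -- near the top, before pbs_mom starts, so every
# job it launches inherits the same limits:
#   ulimit -l unlimited
#   ulimit -s unlimited
#   ulimit -n 8192
```

The duplication is deliberate: limits.conf covers interactive/SSH logins, while the init-script ulimits cover processes started by the batch daemon, which does not go through PAM's limits module.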
Re: [OMPI users] memory per core/process
On 3/30/13 8:20 PM, Reuti wrote:

> Am 30.03.2013 um 13:26 schrieb Tim Prince:
>> [...]
>
> Even if I were the only user on a cluster of machines, I would define
> this in any queuing system to set the limits for the job.

Sorry if I don't get this correctly, but do you mean I should set this using Torque/Maui (our queuing manager) instead of the system itself (/etc/security/limits.conf and /etc/profile.d/)?
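What Reuti suggests, translated into a job script, might look like this. This is a hypothetical Torque/PBS sketch; the resource names and values are illustrative, not taken from the thread:

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=4     # let the scheduler place the job
#PBS -l pmem=2gb          # hypothetical per-process memory request

# Per-job stack limit, set where the job runs rather than system-wide;
# a large finite value (kB) rather than "unlimited".
ulimit -s 524288

mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files > output.log 2>&1
```

The advantage over editing /etc/security/limits.conf is that the limit travels with the job and can differ per queue or per job, instead of being one global policy for every login on every node.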
Re: [OMPI users] memory per core/process
Ok, so your problem is identified as a stack size problem. I ran into these limitations using Intel Fortran compilers on large data problems.

First, it seems you can increase your stack size, since "ulimit -s unlimited" works (the system hard limit is not enforced against you). The best way is to put this setting in your .bashrc file so it works on every node. But setting it to unlimited may not really be safe: if you get into a badly coded recursive function that calls itself without a stop condition, it can request all the system memory and crash the node. So set a large but limited value; it's safer.

I'm managing a cluster and I always set a maximum value for the stack size. I also limit the memory available to each core, for system stability. If a user requests only one of the 12 cores of a node, he can only access 1/12 of the node's memory. If he needs more memory he has to request 2 cores, even if he runs a sequential code. This avoids crashing other users' jobs on the same node through memory pressure. But this is not configured on your node.

Duke Nguyen a écrit :

> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>> 1) did you check stack limitations? [...]
>
> First time I heard of stack limitations. [...] So stack size is 10 MB???
> Does this one create problems? How do I change this?
> [...]
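Patrick's point that a large-but-finite limit is safer can be demonstrated in a shell: with a finite stack limit, a runaway recursion kills only the offending process instead of eating node memory. Bash recursion stands in here for a badly coded recursive Fortran routine:

```shell
# In a subshell, shrink the stack to 8 MB and run an unbounded
# recursion; the kernel kills just that one process when it hits
# the cap, and the rest of the node is untouched.
(
  ulimit -s 8192
  bash -c 'f() { f; }; f' 2>/dev/null
  echo "runaway recursion exited with status $?"   # non-zero: killed
)
```

With `ulimit -s unlimited` the same recursion would instead grow until the machine swaps or the OOM killer fires, which is exactly the node-crash scenario described above.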
Re: [OMPI users] memory per core/process
Am 30.03.2013 um 13:26 schrieb Tim Prince:

> On 03/30/2013 06:36 AM, Duke Nguyen wrote:
>> [...]
>
> In our environment, the only solution we found is to have mpirun run a
> script on each node which sets ulimit (as well as environment
> variables, which are more convenient to set there than on the mpirun
> command line) before starting the executable. We had expert
> recommendations against this but no other working solution. It seems
> unlikely that you would want to remove any limits which work at their
> defaults.
> Stack size "unlimited" in reality is not unlimited; it may be capped by
> a system limit or by the implementation. As we run up to 120 threads
> per rank, and many applications have threadprivate data regions, the
> ability to run without considering the stack limit is the exception
> rather than the rule.

Even if I were the only user on a cluster of machines, I would define this in any queuing system to set the limits for the job.

-- Reuti
Re: [OMPI users] memory per core/process
On 03/30/2013 06:36 AM, Duke Nguyen wrote:

> I did $ ulimit -s unlimited to make the stack size unlimited, and the
> job ran fine!!! So it looks like the stack limit is the problem.
> Questions are:
>
> * how do I set this automatically (and permanently)?
> * should I set all other ulimits to be unlimited?

In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables, which are more convenient to set there than on the mpirun command line) before starting the executable. We had expert recommendations against this but no other working solution. It seems unlikely that you would want to remove any limits which work at their defaults.

Stack size "unlimited" in reality is not unlimited; it may be capped by a system limit or by the implementation. As we run up to 120 threads per rank, and many applications have threadprivate data regions, the ability to run without considering the stack limit is the exception rather than the rule.

-- Tim Prince
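Tim's per-node wrapper approach might be sketched like this. The script name and values are hypothetical: mpirun starts the wrapper on every node, the wrapper adjusts limits and environment, then execs the real binary so no extra process remains:

```shell
#!/bin/sh
# ulimit_wrap.sh -- raise limits and set per-node environment, then
# hand off to the real executable given as arguments.
ulimit -s 524288 || true       # large finite stack (kB); illustrative value,
                               # and keep going if the hard limit forbids it
export OMP_STACKSIZE=64M       # per-thread stack for hybrid codes (illustrative)
exec "$@"                      # replace this shell with the application
```

Launched as `mpirun -np 4 ./ulimit_wrap.sh /opt/apps/abinit/bin/abinit < input.files`, every rank inherits the raised limits before the application's first instruction runs, which is what makes this work where setting ulimit only on the submitting node does not.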
Re: [OMPI users] memory per core/process
On 3/30/13 5:22 PM, Duke Nguyen wrote:

> On 3/30/13 3:13 PM, Patrick Bégou wrote:
>> 1) did you check stack limitations? [...]
>
> First time I heard of stack limitations. Anyway, ulimit -a gives: [...]
> So stack size is 10 MB??? Does this one create problems? How do I
> change this?

I did $ ulimit -s unlimited to make the stack size unlimited, and the job ran fine!!! So it looks like the stack limit is the problem. Questions are:

* how do I set this automatically (and permanently)?
* should I set all other ulimits to be unlimited?

Thanks,

D.

>> 2) did your node use cpusets and memory limitation, like fake NUMA, to
>> set the maximum amount of memory available for a job?
>
> Not really understood (also the first time I have heard of fake NUMA),
> but I am pretty sure we do not have such things. The server I tried was
> a dedicated server with 2 x5420 and 16 GB of physical memory.
> [...]
Re: [OMPI users] memory per core/process
Am 30.03.2013 um 05:21 schrieb Duke Nguyen:

> I am sorry if this question has been asked before [...] But I still get
> a Segmentation Fault error:
>
> mpirun noticed that process rank 0 with PID 16099 on node biobos exited
> on signal 11 (Segmentation fault).

It might also be a programming error in abinit. You compiled abinit with the compiler version they suggest, and Open MPI was compiled with the same version? It's running fine in serial mode? The `make check` of abinit succeeded?

-- Reuti
Re: [OMPI users] memory per core/process
On 3/30/13 3:13 PM, Patrick Bégou wrote:

> I do not know about your code, but:
>
> 1) did you check stack limitations? Typically Intel Fortran codes need
> a large amount of stack when the problem size increases.
> Check ulimit -a.

First time I heard of stack limitations. Anyway, ulimit -a gives:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127368
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

So stack size is 10 MB??? Does this one create problems? How do I change this?

> 2) did your node use cpusets and memory limitation, like fake NUMA, to
> set the maximum amount of memory available for a job?

Not really understood (also the first time I have heard of fake NUMA), but I am pretty sure we do not have such things. The server I tried was a dedicated server with 2 x5420 and 16 GB of physical memory.

> Patrick
>
> Duke Nguyen a écrit :
>> Hi folks, [...] If we adjust the parameters of input.files so that the
>> reported memory per core is less than 512 MB, then the job runs fine.
>> [...]
Re: [OMPI users] memory per core/process
I do not know about your code, but:

1) Did you check stack limitations? Typically Intel Fortran codes need a large amount of stack when the problem size increases. Check ulimit -a.

2) Does your node use cpusets and memory limitation, like fake NUMA, to set the maximum amount of memory available for a job?

Patrick

Duke Nguyen a écrit :

> Hi folks, [...] We try to use mpirun to run abinit (abinit.org) [...]
> But I still get a Segmentation Fault error [...] If we adjust the
> parameters of input.files so that the reported memory per core is less
> than 512 MB, then the job runs fine. [...]
[OMPI users] memory per core/process
Hi folks,

I am sorry if this question has been asked before, but after ten days of searching/working on the system, I surrender :(. We are trying to use mpirun to run abinit (abinit.org), which in turn reads an input file to run a simulation. The command is pretty simple:

$ mpirun -np 4 /opt/apps/abinit/bin/abinit < input.files >& output.log

We ran this command on a server with two quad-core X5420s and 16 GB of memory. I used only 4 cores, so I guess in theory each core should be able to take up to 2 GB.

In the output log there is something about memory:

P This job should need less than 717.175 Mbytes of memory.
Rough estimation (10% accuracy) of disk space for files :
WF disk file : 69.524 Mbytes ; DEN or POT disk file : 14.240 Mbytes.

So basically it reported that the above job should not take more than 718 MB per core. But I still get a Segmentation Fault error:

mpirun noticed that process rank 0 with PID 16099 on node biobos exited on signal 11 (Segmentation fault).

The system already has limits set to unlimited:

$ cat /etc/security/limits.conf | grep -v '#'
* soft memlock unlimited
* hard memlock unlimited

I also tried running

$ ulimit -l unlimited

before the mpirun command above, but it did not help at all.

If we adjust the parameters of input.files so that the reported memory per core is less than 512 MB, then the job runs fine.

Please help,

Thanks,

D.