On 3/31/13 7:52 AM, Paul Robert Marino wrote:
I might use a simple loop in a Perl script where I repeatedly grow variables until I hit a limit or run the system out of memory. And if I wanted to test shared memory, I would do it with the threads::shared module to explicitly put the variables in shared memory.
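For example, a minimal sketch of that idea in Perl (the 8-worker and 1 GiB-per-worker figures are illustrative, not from this thread; it needs a Perl built with ithreads, and it really will try to grab ~8 GiB, so run it on a test box):

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my $workers   = 8;      # illustrative: one worker per core
my $target_mb = 1024;   # illustrative: ~1 GiB per worker

# threads::shared places this counter in memory shared between threads,
# so every worker can update the running total.
my $total_mb : shared = 0;

sub eat_memory {
    my $buf = '';
    for (1 .. $target_mb) {
        $buf .= 'x' x (1024 * 1024);     # grow by 1 MiB per iteration
        { lock($total_mb); $total_mb++; }
    }
    sleep 30;    # hold the memory so you can watch it with free -m
}

my @threads = map { threads->create(\&eat_memory) } 1 .. $workers;
$_->join for @threads;
print "allocated about $total_mb MiB in total\n";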

Update: it turns out - with great help from the OpenMPI people - that Intel Fortran codes (such as abinit) usually need a large stack size (ulimit -s). I had to increase it for the job to run properly. Our system does in fact use all the available memory (not just half, as I had thought).
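For anyone hitting the same thing: the quick interactive fix is to run "ulimit -s unlimited" in the shell that launches mpirun. To make it persist across logins, a pam_limits entry along these lines should work - the wildcard domain and the "unlimited" value are illustrative, adjust to taste:

# /etc/security/limits.conf - illustrative: raise the stack limit
*    soft    stack    unlimited
*    hard    stack    unlimited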

Thanks for all the help,

D.

-- Sent from my HP Pre3

------------------------------------------------------------------------
On Mar 30, 2013 12:24 AM, Duke Nguyen <duke.li...@gmx.com> wrote:

On 3/29/13 10:12 PM, Paul Robert Marino wrote:
> Well, openmpi is the app that executes it, so that's probably where
> the limitation is coming from.
> With a little time on Google you will find plenty of posts on the
> subject of openmpi not being able to take advantage of all the
> resources available to it.
> The problem is I've never seen an answer as to why, not that I looked
> all that long. Most of the suggestions talk about the ulimit settings,
> which on the surface makes some sense, but those numbers aren't right
> for an issue caused by a ulimit. Also, most of the openmpi users who
> asked the question and were told it was ulimits said later that
> adjusting the ulimits didn't fix their issues. So again it sounds like
> a problem in the code of either openmpi or the code you are trying to
> execute with it.
> The only other possibility is that SELinux is preventing something and
> capping the memory somehow as a side effect, but I doubt it.
>

Very useful comments, Paul. I am taking this to the openmpi list to see
if they can help. In the meantime, is there a way to test the total
memory of the system? Any simple bash (or similar) program, not using
openmpi, that I can run on all the cores to confirm that the system
really can use the full 8 GB of RAM?

Thanks,

D.

>
>
> On Thu, Mar 28, 2013 at 12:39 PM, Duke Nguyen <duke.li...@gmx.com> wrote:
>> On 3/28/13 9:00 PM, Paul Robert Marino wrote:
>>> kernel.shmmax does nothing if you don't bump up kernel.shmall
>>> accordingly, but I can tell you the cap is something wrong with your
>>> application, not the OS.
>>> At one time I supported an application that in normal operation used
>>> 64 GB of resident memory per instance.
>>> And currently my PostgreSQL servers often spike to as much as 2 GB of
>>> RAM per connection, and would use more if I didn't cap it there in
>>> the configuration.
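A side note on the shmmax/shmall pairing above: kernel.shmmax is the
largest single segment in bytes, while kernel.shmall is the system-wide
total in pages, so the two use different units. A matched pair for an
8 GB box with 4 KiB pages would look roughly like this (sizes are
illustrative):

# /etc/sysctl.conf - illustrative pair for an 8 GB machine, 4 KiB pages
kernel.shmmax = 8589934592   # max single segment, in bytes (8 GiB)
kernel.shmall = 2097152      # system-wide total, in pages (8 GiB / 4096)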
>>
>> Interesting, I never knew of any server process that used that much
>> memory. Anyway, it is good to know :).
>>
>>
>>> I don't think the kernel settings are your problem. What language is
>>> the application written in?
>>> Is it executed by another process, like Apache or Tomcat for example?
>>
>> The app (a material simulation app) is just an input file that abinit
>> (http://www.abinit.org/) runs via openmpi, so it is executed by
>> abinit. While the app runs, we make sure that no other process
>> (apache, tomcat, etc.) is running, so basically the app should be able
>> to take all the available memory.
>>
>> Thanks,
>>
>> D.
>>
>>
>>>
>>> On Wed, Mar 27, 2013 at 11:09 PM, Duke Nguyen <duke.li...@gmx.com> wrote:
>>>> On 3/27/13 11:52 PM, Attilio De Falco wrote:
>>>>> Just a stab in the dark, but did you check the shared memory kernel
>>>>> parameter (shmmax)? Type "cat /proc/sys/kernel/shmmax". We have it
>>>>> set very high so that any process/thread can use as much memory as
>>>>> it needs. You can set the limit to 1 GB without rebooting by typing
>>>>> "echo 1073741824 > /proc/sys/kernel/shmmax", or modify
>>>>> /etc/sysctl.conf and add the line "kernel.shmmax = 1073741824" so it
>>>>> remains after a reboot. I'm not sure about abinit, but some fortran
>>>>> programs need the shmmax limit to be set high…
>>>>
>>>> Hi Attilio, we already had it at a very high value (not sure why; I
>>>> never changed/edited this value before):
>>>>
>>>> [root@biobos:~]# sysctl -p
>>>> net.ipv4.ip_forward = 1
>>>> net.ipv4.conf.default.rp_filter = 1
>>>> net.ipv4.conf.default.accept_source_route = 0
>>>> kernel.sysrq = 0
>>>> kernel.core_uses_pid = 1
>>>> net.ipv4.tcp_syncookies = 1
>>>> error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
>>>> error: "net.bridge.bridge-nf-call-iptables" is an unknown key
>>>> error: "net.bridge.bridge-nf-call-arptables" is an unknown key
>>>> kernel.msgmnb = 65536
>>>> kernel.msgmax = 65536
>>>> kernel.shmmax = 68719476736
>>>> kernel.shmall = 4294967296
>>>> [root@biobos:~]# cat /proc/sys/kernel/shmmax
>>>> 68719476736
>>>>
>>>> Any other suggestions?
>>>>
>>>>
>>>>> On Mar 26, 2013, at 9:59 PM, Duke Nguyen <duke.li...@gmx.com> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> We have SL6.3 64-bit installed on a box with two quad-core CPUs
>>>>>> and 8 GB of RAM. We installed openmpi, Intel Studio XE and abinit
>>>>>> to run some of our applications in parallel (8 cores/processes).
>>>>>> To our surprise, the system usually takes only about half of the
>>>>>> available memory (about 500 MB per core) and then the job/task is
>>>>>> killed with a low-resource error.
>>>>>>
>>>>>> We don't really understand why there is a cap of "512MB" (I guess
>>>>>> it would be 512 MB rather than 500 MB) per core, whereas in theory
>>>>>> each core should be able to use up to 1 GB. Any
>>>>>> suggestions/comments/experience with this issue?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> D.
>>>>>>

