"schedd_job_info" does not scale by its nature: messages are generated
for every job, and the amount of messages per job depends on the size
of the cluster. It is also questionable whether all scheduler decisions
for every job and resource (queue instance) need to be kept, even
temporarily. Hence the recommendation is always to turn it off (I think
we changed the default to off in one of the last Sun versions).
Alternatively, you can use "qalter -w p <jobid>" to figure out why a
job is not being scheduled; it produces similar messages, but only for
that one particular job.
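A minimal sketch of both approaches (the job id 12345 below is only a
placeholder for illustration):

  # disable the global collection of scheduling messages:
  # "qconf -msconf" opens the scheduler configuration in an editor;
  # there, set schedd_job_info to false
  qconf -msconf

  # ask the scheduler why one particular pending job is not dispatched
  qalter -w p 12345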

Daniel

> On 09.02.2015 at 09:43, Remy Dernat <[email protected]> wrote:
> 
> 
> On 09/02/2015 03:56, Christopher Samuel wrote:
>> On 07/02/15 14:57, Alan Louis Scheinine wrote:
>> 
>>> Only problem I've seen is that if a user allocates too much memory,
>>> OOM killer can kill maintenance processes such as a scheduler daemon.
>> This is why we disable overcommit. :-)
>> 
> Hi,
> 
> I have already seen that problem on our master node. The SGE scheduler ran
> out of memory and the OOM killer decided to kill it:
> 
> Dec  1 15:01:07 cluster1 kernel: Out of memory: Kill process 7963 
> (sge_qmaster) score 948 or sacrifice child
> 
> I resolved that issue by disabling "schedd_job_info" in SGE with "qconf 
> -msconf".
> 
> However, this setting provides significant information about our jobs.
> 
> How should I adjust the OOM killer? Should I set vm.overcommit_memory = 2?
> 
> Best regards,
> 
> Rémy
> 
> -- 
> Rémy Dernat
> MBB/ISE-M
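
Regarding the quoted vm.overcommit_memory question: a minimal sketch of
disabling overcommit on Linux, assuming root access; the ratio of 80 is
only an illustrative value, not a recommendation:

  # apply at runtime
  sysctl -w vm.overcommit_memory=2   # strict accounting: commit limit is
                                     # swap + overcommit_ratio% of RAM
  sysctl -w vm.overcommit_ratio=80   # illustrative value only

  # make the settings persistent across reboots
  echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
  echo "vm.overcommit_ratio = 80" >> /etc/sysctl.conf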

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
