Re: [gridengine users] slots equals cores

2020-01-31 Thread Reuti


> On 31.01.2020 at 18:23, Jerome IBt wrote:
> 
> On 31/01/2020 at 10:19, Reuti wrote:
>> Hi Jérôme,
>> 
>> Personally I would prefer to keep the output of `qquota` short and use it
>> only for users' limits, i.e. define the slot limit on an exechost basis
>> instead. This can be done in a loop around a command line like:
>> 
>> $ qconf -mattr exechost complex_values slots=16 node29
>> 
>> In my experience, RQS sometimes misbehave, especially when used in
>> combination with load values (although $num_proc is of course fixed in
>> your case).
>> 
>> -- Reuti
> 
> Dear Reuti,
> 
> If I understand correctly, you recommend that I disable the RQS for the
> core limit and add a complex_values entry for slots on all of the compute
> nodes?

Exactly. Doing it on the command line within a loop is not laborious, and
the core count is a fixed property of a node that will not change during its
lifetime.

-- Reuti
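
A minimal sketch of the loop mentioned above, under stated assumptions: the
function name `set_host_slots`, the `APPLY` switch and the inline host list are
hypothetical and only for illustration; node29 with 16 slots and the 64-core
compute-2-0 are taken from this thread. By default it only prints the `qconf`
commands so they can be reviewed first.

```sh
#!/bin/sh
# Sketch: give every execution host a fixed "slots" consumable equal to its
# core count. Prints the commands by default; set APPLY=1 to run them.

set_host_slots() {
    # $1 = execution host name, $2 = number of slots (cores) on that host
    if [ "${APPLY:-0}" = "1" ]; then
        qconf -mattr exechost complex_values "slots=$2" "$1"
    else
        echo "qconf -mattr exechost complex_values slots=$2 $1"
    fi
}

# Hypothetical inventory: one "hostname cores" pair per line.
while read -r host cores; do
    [ -n "$host" ] || continue
    set_host_slots "$host" "$cores"
done <<'EOF'
node29 16
compute-2-0 64
EOF
```

Running it with `APPLY=1` set in the environment would execute the commands
instead of printing them; the host list would have to be replaced by the real
nodes and their core counts.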


> Thanks
> 
> -- 
> -- Jérôme
> When a tree falls, you hear it; when the forest grows, not a sound.
>   (African proverb)


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] slots equals cores

2020-01-31 Thread Jerome IBt
On 31/01/2020 at 10:19, Reuti wrote:
> Hi Jérôme,
> 
> Personally I would prefer to keep the output of `qquota` short and use it
> only for users' limits, i.e. define the slot limit on an exechost basis
> instead. This can be done in a loop around a command line like:
> 
> $ qconf -mattr exechost complex_values slots=16 node29
> 
> In my experience, RQS sometimes misbehave, especially when used in
> combination with load values (although $num_proc is of course fixed in
> your case).
> 
> -- Reuti
Dear Reuti,

If I understand correctly, you recommend that I disable the RQS for the
core limit and add a complex_values entry for slots on all of the compute
nodes?

Thanks

-- 
-- Jérôme
When a tree falls, you hear it; when the forest grows, not a sound.
(African proverb)


Re: [gridengine users] slots equals cores

2020-01-31 Thread Reuti
Hi Jérôme,

Personally I would prefer to keep the output of `qquota` short and use it only
for users' limits, i.e. define the slot limit on an exechost basis instead.
This can be done in a loop around a command line like:

$ qconf -mattr exechost complex_values slots=16 node29

In my experience, RQS sometimes misbehave, especially when used in
combination with load values (although $num_proc is of course fixed in
your case).

-- Reuti


> On 31.01.2020 at 17:00, Jerome wrote:
> 
> Dear all
> 
> I'm facing a new problem on my cluster with SGE. I haven't seen this
> before, or maybe I never noticed it.
> I have some nodes with two queues: one (named "all.q") for jobs that run
> no longer than 24 h, and another (named "lenta.q") for jobs that need
> more than 24 h.
> I set up a resource quota, as I read some time ago on this mailing list,
> defined as follows:
> 
> {
>    name         slots_equals_cores
>    description  Prevent core over-subscription across queues
>    enabled      TRUE
>    limit        hosts {*} to slots=$num_proc
> }
> 
> 
> For now, I have a node with 64 cores: 40 for the normal queue and 24 for
> the long-job queue.
> 
> 
> all.q@compute-2-0.local    BP    0/16/40    15.93    lx-amd64
> lenta.q@compute-2-0.local  BP    0/0/24     15.93    lx-amd64
> 
> Some 2-core jobs are not scheduled on this node in the long-job queue,
> although there is no shortage of memory or cores. qstat reports this:
> 
> "compute-2-0/" in rule "slots_equals_cores/1"
> cannot run because it exceeds limit "compute-2-0/" in rule "slots_equals_cores/1"
> cannot run because it exceeds limit "compute-0-4/" in rule "slots_equals_cores/1"
> cannot run in PE "thread" because it only offers 0 slots
> 
> I really don't understand why the job is not running on this node; in
> my opinion it is free for it.
> 
> Can someone help me with this?
> 
> Regards.
> 
> -- 
> -- Jérôme
> A kiss is the surest way of keeping silent while saying everything.
>   (Guy de Maupassant)


[gridengine users] slots equals cores

2020-01-31 Thread Jerome
Dear all

I'm facing a new problem on my cluster with SGE. I haven't seen this
before, or maybe I never noticed it.
I have some nodes with two queues: one (named "all.q") for jobs that run no
longer than 24 h, and another (named "lenta.q") for jobs that need
more than 24 h.
I set up a resource quota, as I read some time ago on this mailing list,
defined as follows:

{
   name         slots_equals_cores
   description  Prevent core over-subscription across queues
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}


For now, I have a node with 64 cores: 40 for the normal queue and 24 for
the long-job queue.


all.q@compute-2-0.local    BP    0/16/40    15.93    lx-amd64
lenta.q@compute-2-0.local  BP    0/0/24     15.93    lx-amd64

Some 2-core jobs are not scheduled on this node in the long-job queue,
although there is no shortage of memory or cores. qstat reports this:

"compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit "compute-2-0/" in rule "slots_equals_cores/1"
cannot run because it exceeds limit "compute-0-4/" in rule "slots_equals_cores/1"
cannot run in PE "thread" because it only offers 0 slots

I really don't understand why the job is not running on this node; in
my opinion it is free for it.
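
The slot bookkeeping behind that expectation can be sketched as follows. This
is only arithmetic on the figures quoted in this message (64 cores, 16 slots
used in all.q, none in lenta.q, a 2-slot request); the variable names are
illustrative, and it does not reproduce how the scheduler actually evaluates
the rule.

```sh
#!/bin/sh
# Slot bookkeeping for compute-2-0, using the figures quoted in this message.
num_proc=64        # cores on the node; the RQS limit is slots=$num_proc
used_all_q=16      # "used" field of all.q   (0/16/40)
used_lenta_q=0     # "used" field of lenta.q (0/0/24)
request=2          # slots requested by the pending job

used=$((used_all_q + used_lenta_q))
after=$((used + request))

if [ "$after" -le "$num_proc" ]; then
    verdict="fits"
else
    verdict="exceeds"
fi
echo "used=$used after=$after limit=$num_proc -> $verdict"
```

With these numbers the request stays well below the per-host limit (18 of 64
slots), so the "exceeds limit" message is not explained by the slot counts
alone.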

Can someone help me with this?

Regards.

-- 
-- Jérôme
A kiss is the surest way of keeping silent while saying everything.
(Guy de Maupassant)