Re: [gridengine users] Start jobs on exec host in sequential order

2018-08-08 Thread Derrick Lin
Thanks guys, I will take a look at each option.

On Mon, Aug 6, 2018 at 9:52 PM, William Hay  wrote:

> On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote:
> >Hi Reuti,
> >The prolog script is indeed set to run as root. The xfs quota requires
> >root privilege.
> >I also tried the 2nd approach, but it seems that the addgrpid file has
> >not yet been created when the prolog script executes:
> >/opt/gridengine/default/common/prolog_exec.sh: line 21:
> >/opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid:
> You can also extract the group ID from the config file, which should be
> present on the master node when the prolog is run:
>
> XFS_PROJID="$(awk -F= '/^add_grp_id=/{print $2}' <${SGE_JOB_SPOOL_DIR}/config)"
>
> NB: If you want this on the slave nodes of a multi-node job and you allow
> multi-node jobs to share nodes (we don't), then you will need to extract
> a project ID on each slave node. Probably the best place to do this
> would be in a wrapper around rsh_daemon. However, you'll need some sort of
> locking in case a program launches multiple slave tasks (most codes
> just launch one slave task per node, which then forks) or launches
> a slave task on the master node.
>
> William
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-08-06 Thread William Hay
On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote:
>Hi Reuti,
>The prolog script is indeed set to run as root. The xfs quota requires
>root privilege.
>I also tried the 2nd approach, but it seems that the addgrpid file has
>not yet been created when the prolog script executes:
>/opt/gridengine/default/common/prolog_exec.sh: line 21:
>/opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid:
You can also extract the group ID from the config file, which should be
present on the master node when the prolog is run:

XFS_PROJID="$(awk -F= '/^add_grp_id=/{print $2}' <${SGE_JOB_SPOOL_DIR}/config)"

NB: If you want this on the slave nodes of a multi-node job and you allow
multi-node jobs to share nodes (we don't), then you will need to extract
a project ID on each slave node. Probably the best place to do this
would be in a wrapper around rsh_daemon. However, you'll need some sort of
locking in case a program launches multiple slave tasks (most codes
just launch one slave task per node, which then forks) or launches
a slave task on the master node.
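
For illustration, a minimal prolog fragment that feeds the extracted ID into
xfs_quota might look like this (a sketch only: the /scratch mount point, the
directory layout and the 100g limit are assumptions, not a tested setup):

#!/bin/sh
# Derive the per-job project ID from the additional group ID that SGE
# recorded in the job's spool config.
XFS_PROJID="$(awk -F= '/^add_grp_id=/{print $2}' <${SGE_JOB_SPOOL_DIR}/config)"

# Hypothetical per-job scratch directory. (SGE_TASK_ID is the literal
# string "undefined" for non-array jobs.)
JOB_SCRATCH="/scratch/${JOB_ID}.${SGE_TASK_ID}"
mkdir -p "$JOB_SCRATCH"

# Tag the directory with the project ID and set a hard block limit on it.
xfs_quota -x -c "project -s -p $JOB_SCRATCH $XFS_PROJID" /scratch
xfs_quota -x -c "limit -p bhard=100g $XFS_PROJID" /scratch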

William


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-08-01 Thread Reuti

> On 01.08.2018 at 03:06, Derrick Lin wrote:
> 
> Hi Reuti,
> 
> The prolog script is indeed set to run as root. The xfs quota requires root
> privilege.
> 
> I also tried the 2nd approach, but it seems that the addgrpid file has not
> yet been created when the prolog script executes:
> 
> /opt/gridengine/default/common/prolog_exec.sh: line 21: 
> /opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid: No 
> such file or directory

I must admit: I wasn't aware of this. It's essentially available only during
the execution of the job.

But this has the side effect that anything done in a prolog or epilog while it
runs (especially under the user's account) can't be traced or accounted for (or
a `qdel` might fail). This is somewhat surprising.

Do you set the quota with a shell script or a binary? Another idea could be to
use a starter_method in the queue configuration. Then the addgrpid file exists
(I checked it), and you could call a binary that is SUID root therein (the
starter_method will eventually call the user's script and will run under his
account only). SUID won't work for scripts, hence the final call to a binary
with the SUID bit set:

#!/bin/sh
export ADDGRPID=$(< "$SGE_JOB_SPOOL_DIR/addgrpid")

# call some script/binary here to set the quota

exec "${@}"
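
For reference, a starter like this is enabled per queue via the starter_method
attribute of queue_conf(5); the script path below is just a placeholder:

$ qconf -mq my.q
…
starter_method    /opt/sge/scripts/starter.sh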

-- Reuti


> Maybe some of my scheduler conf is not correct?
> 
> Regards,
> Derrick
> 
> On Mon, Jul 30, 2018 at 7:35 PM, Reuti  wrote:
> 
> > On 30.07.2018 at 02:31, Derrick Lin wrote:
> > 
> > Hi Reuti,
> > 
> > The approach sounds great.
> > 
> > But the prolog script seems to be run by root, so this is what I got:
> > 
> > XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)
> 
> This is quite unusual. Do you run the prolog as root by intention? I assume
> so, to set the limits:
> 
> $ qconf -sq my.q
> …
> prolog    /some/script
> 
> Do you have "root:" here to change the user under which it is run (in the
> global `qconf -sconf`)? Please note that this may open some root doors,
> depending on environment variable settings. I have "sgeadmin:" here for some
> special handling and use:
> 
> sgeadmin@/usr/sge/cluster/busybox env -u LD_LIBRARY_PATH -u LD_PRELOAD -u IFS 
> /usr/sge/cluster/context.sh
> 
> Nevertheless: the second approach to get the additional group ID from the 
> job's spool area should work.
> 
> -- Reuti
> 
> 
> > 
> > Maybe I am still missing something, or the prolog script is the wrong place
> > for getting the group ID generated by SGE?
> > 
> > Cheers,
> > D
> > 
> > On Sat, Jul 28, 2018 at 11:53 AM, Reuti  wrote:
> > 
> > > On 28.07.2018 at 03:00, Derrick Lin wrote:
> > > 
> > > Thanks Reuti,
> > > 
> > > I know little about the group ID created by SGE, and I'm also pretty much
> > > confused about Linux group IDs.
> > 
> > Yes, SGE assigns a conventional group ID to each job to track the CPU and 
> > memory consumption. This group ID is in the range you defined in:
> > 
> > $ qconf -sconf
> > …
> > gid_range    2-20100
> > 
> > and this will be unique per node. First approach could be either `sed`:
> > 
> > $ id
> > uid=25000(reuti) gid=25000(ourgroup) 
> > groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
> > $ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
> > 20052
> > 
> > or:
> > 
> > ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
> > echo $ADD_GRP_ID
> > 
> > -- Reuti
> > 
> > 
> > > I assume that "id" is called inside the prolog script; typically, what
> > > does the output look like?
> > > 
> > > Cheers,
> > > 
> > > On Fri, Jul 27, 2018 at 4:12 PM, Reuti  wrote:
> > > 
> > > On 27.07.2018 at 03:14, Derrick Lin wrote:
> > > 
> > > > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > > > introduces a problem for array jobs, whose tasks have the same $JOB_ID
> > > > (with different $TASK_ID).
> > > > 
> > > > Also, it is possible that tasks from two different array jobs running on
> > > > the same node have the same $TASK_ID, so the uniqueness of $TASK_ID
> > > > on the same host cannot be guaranteed.
> > > 
> > > So the number you are looking for needs to be unique per node only?
> > > 
> > > What about then using the additional group ID which SGE creates – this
> > > will be unique per node.
> > > 
> > > This can be found in the `id` command's output, or in the job's spool
> > > directory under the execd_spool_dir, in
> > > ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> > > 
> > > -- Reuti
> > > 
> > > 
> > > > That's why I am trying to make the xfs_projid independent of SGE.
> > > > 
> > > > 
> > > > 
> > > > On Thu, Jul 26, 2018 at 9:27 PM, Reuti  
> > > > wrote:
> > > > Hi,
> > > > 
> > > > > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > > > > 
> > > > > Hi all,
> > > > > 
> > > > > I am working on a prolog script which sets up an xfs quota on disk
> > > > > space on a per-job basis.
> > > > > 
> > > > > For setting up an xfs quota on a subdirectory, I need to provide a
> > > > > project ID.
> > > > 

Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-31 Thread Derrick Lin
Hi Reuti,

The prolog script is indeed set to run as root. The xfs quota requires root
privilege.

I also tried the 2nd approach, but it seems that the addgrpid file has not
yet been created when the prolog script executes:

/opt/gridengine/default/common/prolog_exec.sh: line 21:
/opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid: No
such file or directory

Maybe some of my scheduler conf is not correct?

Regards,
Derrick

On Mon, Jul 30, 2018 at 7:35 PM, Reuti  wrote:

>
> > On 30.07.2018 at 02:31, Derrick Lin wrote:
> >
> > Hi Reuti,
> >
> > The approach sounds great.
> >
> > But the prolog script seems to be run by root, so this is what I got:
> >
> > XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)
>
> This is quite unusual. Do you run the prolog as root by intention? I
> assume so, to set the limits:
>
> $ qconf -sq my.q
> …
> prolog    /some/script
>
> Do you have "root:" here to change the user under which it is run (in the
> global `qconf -sconf`)? Please note that this may open some root doors,
> depending on environment variable settings. I have "sgeadmin:" here for some
> special handling and use:
>
> sgeadmin@/usr/sge/cluster/busybox env -u LD_LIBRARY_PATH -u LD_PRELOAD -u
> IFS /usr/sge/cluster/context.sh
>
> Nevertheless: the second approach to get the additional group ID from the
> job's spool area should work.
>
> -- Reuti
>
>
> >
> > Maybe I am still missing something, or the prolog script is the wrong place
> > for getting the group ID generated by SGE?
> >
> > Cheers,
> > D
> >
> > On Sat, Jul 28, 2018 at 11:53 AM, Reuti 
> wrote:
> >
> > > On 28.07.2018 at 03:00, Derrick Lin wrote:
> > >
> > > Thanks Reuti,
> > >
> > > I know little about the group ID created by SGE, and I'm also pretty much
> > > confused about Linux group IDs.
> >
> > Yes, SGE assigns a conventional group ID to each job to track the CPU
> and memory consumption. This group ID is in the range you defined in:
> >
> > $ qconf -sconf
> > …
> > gid_range    2-20100
> >
> > and this will be unique per node. First approach could be either `sed`:
> >
> > $ id
> > uid=25000(reuti) gid=25000(ourgroup)
> > groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
> > $ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
> > 20052
> >
> > or:
> >
> > ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
> > echo $ADD_GRP_ID
> >
> > -- Reuti
> >
> >
> > > I assume that "id" is called inside the prolog script; typically, what
> > > does the output look like?
> > >
> > > Cheers,
> > >
> > > On Fri, Jul 27, 2018 at 4:12 PM, Reuti 
> wrote:
> > >
> > > On 27.07.2018 at 03:14, Derrick Lin wrote:
> > >
> > > > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > > > introduces a problem for array jobs, whose tasks have the same $JOB_ID
> > > > (with different $TASK_ID).
> > > >
> > > > Also, it is possible that tasks from two different array jobs running on
> > > > the same node have the same $TASK_ID, so the uniqueness of $TASK_ID on
> > > > the same host cannot be guaranteed.
> > >
> > > So the number you are looking for needs to be unique per node only?
> > >
> > > What about then using the additional group ID which SGE creates – this
> > > will be unique per node.
> > >
> > > This can be found in the `id` command's output, or in the job's spool
> > > directory under the execd_spool_dir, in
> > > ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> > >
> > > -- Reuti
> > >
> > >
> > > > That's why I am trying to make the xfs_projid independent of SGE.
> > > >
> > > >
> > > >
> > > > On Thu, Jul 26, 2018 at 9:27 PM, Reuti 
> wrote:
> > > > Hi,
> > > >
> > > > > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I am working on a prolog script which sets up an xfs quota on disk
> > > > > space on a per-job basis.
> > > > >
> > > > > For setting up an xfs quota on a subdirectory, I need to provide a
> > > > > project ID.
> > > > >
> > > > > Here is how I did for generating project ID:
> > > > >
> > > > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > > >
> > > > > echo $JOB_ID >> $XFS_PROJID_CF
> > > > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> > > >
> > > > The xfs_projid is then the number of lines in the file? Why not use
> > > > $JOB_ID directly? Is there a limit on the max. project ID that $JOB_ID
> > > > might exceed?
> > > >
> > > > -- Reuti
> > > >
> > > >
> > > > > My tests show that when multiple jobs start on the same exec host at
> > > > > the same time, the prolog scripts execute at almost the same time,
> > > > > resulting in multiple jobs sharing the same xfs_projid, which is no good.
> > > > >
> > > > > I am wondering if I can configure the scheduler to start the jobs in
> > > > > a sequential way (probably with an interval in between).
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Derrick
> > > > > ___
> > > > > users mailing list
> > > > > users@gridengine.org
> > > > > https://gridengine.org/mailman/listinfo/users
> > > >
> > > >
> > >
> > >
> >
> >
>
>

Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-30 Thread Reuti

> On 30.07.2018 at 02:31, Derrick Lin wrote:
> 
> Hi Reuti,
> 
> The approach sounds great.
> 
> But the prolog script seems to be run by root, so this is what I got:
> 
> XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)

This is quite unusual. Do you run the prolog as root by intention? I assume so,
to set the limits:

$ qconf -sq my.q
…
prolog    /some/script

Do you have "root:" here to change the user under which it is run (in the
global `qconf -sconf`)? Please note that this may open some root doors,
depending on environment variable settings. I have "sgeadmin:" here for some
special handling and use:

sgeadmin@/usr/sge/cluster/busybox env -u LD_LIBRARY_PATH -u LD_PRELOAD -u IFS 
/usr/sge/cluster/context.sh

Nevertheless: the second approach to get the additional group ID from the job's 
spool area should work.

-- Reuti


> 
> Maybe I am still missing something, or the prolog script is the wrong place
> for getting the group ID generated by SGE?
> 
> Cheers,
> D
> 
> On Sat, Jul 28, 2018 at 11:53 AM, Reuti  wrote:
> 
> > On 28.07.2018 at 03:00, Derrick Lin wrote:
> > 
> > Thanks Reuti,
> > 
> > I know little about the group ID created by SGE, and I'm also pretty much
> > confused about Linux group IDs.
> 
> Yes, SGE assigns a conventional group ID to each job to track the CPU and 
> memory consumption. This group ID is in the range you defined in:
> 
> $ qconf -sconf
> …
> gid_range    2-20100
> 
> and this will be unique per node. First approach could be either `sed`:
> 
> $ id
> uid=25000(reuti) gid=25000(ourgroup) 
> groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
> $ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
> 20052
> 
> or:
> 
> ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
> echo $ADD_GRP_ID
> 
> -- Reuti
> 
> 
> > I assume that "id" is called inside the prolog script; typically, what does
> > the output look like?
> > 
> > Cheers,
> > 
> > On Fri, Jul 27, 2018 at 4:12 PM, Reuti  wrote:
> > 
> > On 27.07.2018 at 03:14, Derrick Lin wrote:
> > 
> > > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > > introduces a problem for array jobs, whose tasks have the same $JOB_ID
> > > (with different $TASK_ID).
> > > 
> > > Also, it is possible that tasks from two different array jobs running on
> > > the same node have the same $TASK_ID, so the uniqueness of $TASK_ID
> > > on the same host cannot be guaranteed.
> > 
> > So the number you are looking for needs to be unique per node only?
> > 
> > What about then using the additional group ID which SGE creates – this will
> > be unique per node.
> > 
> > This can be found in the `id` command's output, or in the job's spool
> > directory under the execd_spool_dir, in
> > ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> > 
> > -- Reuti
> > 
> > 
> > > That's why I am trying to make the xfs_projid independent of SGE.
> > > 
> > > 
> > > 
> > > On Thu, Jul 26, 2018 at 9:27 PM, Reuti  wrote:
> > > Hi,
> > > 
> > > > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > > > 
> > > > Hi all,
> > > > 
> > > > I am working on a prolog script which sets up an xfs quota on disk space
> > > > on a per-job basis.
> > > > 
> > > > For setting up an xfs quota on a subdirectory, I need to provide a
> > > > project ID.
> > > > 
> > > > Here is how I did for generating project ID:
> > > > 
> > > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > > 
> > > > echo $JOB_ID >> $XFS_PROJID_CF
> > > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> > > 
> > > The xfs_projid is then the number of lines in the file? Why not use
> > > $JOB_ID directly? Is there a limit on the max. project ID that $JOB_ID
> > > might exceed?
> > > 
> > > -- Reuti
> > > 
> > > 
> > > > My tests show that when multiple jobs start on the same exec host
> > > > at the same time, the prolog scripts execute at almost the same time,
> > > > resulting in multiple jobs sharing the same xfs_projid, which is no good.
> > > > 
> > > > I am wondering if I can configure the scheduler to start the jobs in a
> > > > sequential way (probably with an interval in between).
> > > > 
> > > > 
> > > > Cheers,
> > > > Derrick
> > > > ___
> > > > users mailing list
> > > > users@gridengine.org
> > > > https://gridengine.org/mailman/listinfo/users
> > > 
> > > 
> > 
> > 
> 
> 


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-29 Thread Derrick Lin
Hi Reuti,

The approach sounds great.

But the prolog script seems to be run by root, so this is what I got:

XFS_PROJID:uid=0(root) gid=0(root) groups=0(root),396(sfcb)

Maybe I am still missing something, or the prolog script is the wrong place for
getting the group ID generated by SGE?

Cheers,
D

On Sat, Jul 28, 2018 at 11:53 AM, Reuti  wrote:

>
> > On 28.07.2018 at 03:00, Derrick Lin wrote:
> >
> > Thanks Reuti,
> >
> > I know little about the group ID created by SGE, and I'm also pretty much
> > confused about Linux group IDs.
>
> Yes, SGE assigns a conventional group ID to each job to track the CPU and
> memory consumption. This group ID is in the range you defined in:
>
> $ qconf -sconf
> …
> gid_range    2-20100
>
> and this will be unique per node. First approach could be either `sed`:
>
> $ id
> uid=25000(reuti) gid=25000(ourgroup)
> groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
> $ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
> 20052
>
> or:
>
> ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
> echo $ADD_GRP_ID
>
> -- Reuti
>
>
> > I assume that "id" is called inside the prolog script; typically, what
> > does the output look like?
> >
> > Cheers,
> >
> > On Fri, Jul 27, 2018 at 4:12 PM, Reuti 
> wrote:
> >
> > On 27.07.2018 at 03:14, Derrick Lin wrote:
> >
> > > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > > introduces a problem for array jobs, whose tasks have the same $JOB_ID
> > > (with different $TASK_ID).
> > >
> > > Also, it is possible that tasks from two different array jobs running on
> > > the same node have the same $TASK_ID, so the uniqueness of $TASK_ID on
> > > the same host cannot be guaranteed.
> >
> > So the number you are looking for needs to be unique per node only?
> >
> > What about then using the additional group ID which SGE creates – this
> > will be unique per node.
> >
> > This can be found in the `id` command's output, or in the job's spool
> > directory under the execd_spool_dir, in
> > ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> >
> > -- Reuti
> >
> >
> > > That's why I am trying to make the xfs_projid independent of SGE.
> > >
> > >
> > >
> > > On Thu, Jul 26, 2018 at 9:27 PM, Reuti 
> wrote:
> > > Hi,
> > >
> > > > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I am working on a prolog script which sets up an xfs quota on disk
> > > > space on a per-job basis.
> > > >
> > > > For setting up an xfs quota on a subdirectory, I need to provide a
> > > > project ID.
> > > >
> > > > Here is how I did for generating project ID:
> > > >
> > > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > >
> > > > echo $JOB_ID >> $XFS_PROJID_CF
> > > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> > >
> > > The xfs_projid is then the number of lines in the file? Why not use
> > > $JOB_ID directly? Is there a limit on the max. project ID that $JOB_ID
> > > might exceed?
> > >
> > > -- Reuti
> > >
> > >
> > > > My tests show that when multiple jobs start on the same exec host at
> > > > the same time, the prolog scripts execute at almost the same time,
> > > > resulting in multiple jobs sharing the same xfs_projid, which is no good.
> > > >
> > > > I am wondering if I can configure the scheduler to start the jobs in
> > > > a sequential way (probably with an interval in between).
> > > >
> > > >
> > > > Cheers,
> > > > Derrick
> > > > ___
> > > > users mailing list
> > > > users@gridengine.org
> > > > https://gridengine.org/mailman/listinfo/users
> > >
> > >
> >
> >
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-27 Thread Reuti

> On 28.07.2018 at 03:00, Derrick Lin wrote:
> 
> Thanks Reuti,
> 
> I know little about the group ID created by SGE, and I'm also pretty much
> confused about Linux group IDs.

Yes, SGE assigns a conventional group ID to each job to track the CPU and 
memory consumption. This group ID is in the range you defined in:

$ qconf -sconf
…
gid_range    2-20100

and this will be unique per node. First approach could be either `sed`:

$ id
uid=25000(reuti) gid=25000(ourgroup) 
groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
$ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
20052

or:

ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
echo $ADD_GRP_ID
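
A sketch of an alternative to the `sed` one-liner (assuming the SGE-assigned
ID is the only entry in the `id` output without a group name attached):

# Split the group list on commas and keep the purely numeric entry.
ADD_GRP_ID=$(id | tr ',' '\n' | grep -E '^[0-9]+$')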

-- Reuti


> I assume that "id" is called inside the prolog script; typically, what does
> the output look like?
> 
> Cheers,
> 
> On Fri, Jul 27, 2018 at 4:12 PM, Reuti  wrote:
> 
> On 27.07.2018 at 03:14, Derrick Lin wrote:
> 
> > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > introduces a problem for array jobs, whose tasks have the same $JOB_ID
> > (with different $TASK_ID).
> > 
> > Also, it is possible that tasks from two different array jobs running on
> > the same node have the same $TASK_ID, so the uniqueness of $TASK_ID on
> > the same host cannot be guaranteed.
> 
> So the number you are looking for needs to be unique per node only?
> 
> What about then using the additional group ID which SGE creates – this will
> be unique per node.
> 
> This can be found in the `id` command's output, or in the job's spool
> directory under the execd_spool_dir, in
> ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
> 
> -- Reuti
> 
> 
> > That's why I am trying to make the xfs_projid independent of SGE.
> > 
> > 
> > 
> > On Thu, Jul 26, 2018 at 9:27 PM, Reuti  wrote:
> > Hi,
> > 
> > > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > > 
> > > Hi all,
> > > 
> > > I am working on a prolog script which sets up an xfs quota on disk space
> > > on a per-job basis.
> > > 
> > > For setting up an xfs quota on a subdirectory, I need to provide a
> > > project ID.
> > > 
> > > Here is how I did for generating project ID:
> > > 
> > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > > 
> > > echo $JOB_ID >> $XFS_PROJID_CF
> > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> > 
> > The xfs_projid is then the number of lines in the file? Why not use
> > $JOB_ID directly? Is there a limit on the max. project ID that $JOB_ID
> > might exceed?
> > 
> > -- Reuti
> > 
> > 
> > > My tests show that when multiple jobs start on the same exec host
> > > at the same time, the prolog scripts execute at almost the same time,
> > > resulting in multiple jobs sharing the same xfs_projid, which is no good.
> > > 
> > > I am wondering if I can configure the scheduler to start the jobs in a
> > > sequential way (probably with an interval in between).
> > > 
> > > 
> > > Cheers,
> > > Derrick
> > > ___
> > > users mailing list
> > > users@gridengine.org
> > > https://gridengine.org/mailman/listinfo/users
> > 
> > 
> 
> 


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-27 Thread Derrick Lin
Thanks Reuti,

I know little about the group ID created by SGE, and I'm also pretty much
confused about Linux group IDs.

I assume that "id" is called inside the prolog script; typically, what does the
output look like?

Cheers,

On Fri, Jul 27, 2018 at 4:12 PM, Reuti  wrote:

>
> On 27.07.2018 at 03:14, Derrick Lin wrote:
>
> > We are using $JOB_ID as xfs_projid at the moment, but this approach
> > introduces a problem for array jobs, whose tasks have the same $JOB_ID
> > (with different $TASK_ID).
> >
> > Also, it is possible that tasks from two different array jobs running on
> > the same node have the same $TASK_ID, so the uniqueness of $TASK_ID on
> > the same host cannot be guaranteed.
>
> So the number you are looking for needs to be unique per node only?
>
> What about then using the additional group ID which SGE creates – this
> will be unique per node.
>
> This can be found in the `id` command's output, or in the job's spool
> directory under the execd_spool_dir, in
> ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid
>
> -- Reuti
>
>
> > That's why I am trying to make the xfs_projid independent of SGE.
> >
> >
> >
> > On Thu, Jul 26, 2018 at 9:27 PM, Reuti 
> wrote:
> > Hi,
> >
> > > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > >
> > > Hi all,
> > >
> > > I am working on a prolog script which sets up an xfs quota on disk
> > > space on a per-job basis.
> > >
> > > For setting up an xfs quota on a subdirectory, I need to provide a
> > > project ID.
> > >
> > > Here is how I did for generating project ID:
> > >
> > > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > >
> > > echo $JOB_ID >> $XFS_PROJID_CF
> > > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> >
> > The xfs_projid is then the number of lines in the file? Why not use
> > $JOB_ID directly? Is there a limit on the max. project ID that $JOB_ID
> > might exceed?
> >
> > -- Reuti
> >
> >
> > > My tests show that when multiple jobs start on the same exec host at
> > > the same time, the prolog scripts execute at almost the same time,
> > > resulting in multiple jobs sharing the same xfs_projid, which is no good.
> > >
> > > I am wondering if I can configure the scheduler to start the jobs in a
> > > sequential way (probably with an interval in between).
> > >
> > >
> > > Cheers,
> > > Derrick
> > > ___
> > > users mailing list
> > > users@gridengine.org
> > > https://gridengine.org/mailman/listinfo/users
> >
> >
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-27 Thread Reuti


On 27.07.2018 at 03:14, Derrick Lin wrote:

> We are using $JOB_ID as xfs_projid at the moment, but this approach
> introduces a problem for array jobs, whose tasks have the same $JOB_ID
> (with different $TASK_ID).
> 
> Also, it is possible that tasks from two different array jobs running on the
> same node have the same $TASK_ID, so the uniqueness of $TASK_ID on the
> same host cannot be guaranteed.

So the number you are looking for needs to be unique per node only?

What about then using the additional group ID which SGE creates – this will be
unique per node.

This can be found in the `id` command's output, or in the job's spool directory
under the execd_spool_dir, in
${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid

-- Reuti


> That's why I am trying to make the xfs_projid independent of SGE.
> 
> 
> 
> On Thu, Jul 26, 2018 at 9:27 PM, Reuti  wrote:
> Hi,
> 
> > On 26.07.2018 at 06:01, Derrick Lin wrote:
> > 
> > Hi all,
> > 
> > I am working on a prolog script which sets up an xfs quota on disk space on
> > a per-job basis.
> > 
> > For setting up an xfs quota on a subdirectory, I need to provide a project ID.
> > 
> > Here is how I did for generating project ID:
> > 
> > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> > 
> > echo $JOB_ID >> $XFS_PROJID_CF
> > xfs_projid=$(wc -l < $XFS_PROJID_CF)
> 
The xfs_projid is then the number of lines in the file? Why not use $JOB_ID
directly? Is there a limit on the max. project ID that $JOB_ID might exceed?
> 
> -- Reuti
> 
> 
> > My tests show that when multiple jobs start on the same exec host at
> > the same time, the prolog scripts execute at almost the same time, resulting
> > in multiple jobs sharing the same xfs_projid, which is no good.
> > 
> > I am wondering if I can configure the scheduler to start the jobs in a
> > sequential way (probably with an interval in between).
> > 
> > 
> > Cheers,
> > Derrick
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> 
> 


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-26 Thread Derrick Lin
We are using $JOB_ID as xfs_projid at the moment, but this approach
introduces a problem for array jobs, whose tasks have the same $JOB_ID (with
different $TASK_ID).

Also, it is possible that tasks from two different array jobs running on the
same node have the same $TASK_ID, so the uniqueness of $TASK_ID on the
same host cannot be guaranteed.

That's why I am trying to make the xfs_projid independent of SGE.



On Thu, Jul 26, 2018 at 9:27 PM, Reuti  wrote:

> Hi,
>
> > On 26.07.2018 at 06:01, Derrick Lin wrote:
> >
> > Hi all,
> >
> > I am working on a prolog script which sets up an xfs quota on disk space
> > on a per-job basis.
> >
> > For setting up an xfs quota on a subdirectory, I need to provide a project ID.
> >
> > Here is how I did for generating project ID:
> >
> > XFS_PROJID_CF="/tmp/xfs_projid_counter"
> >
> > echo $JOB_ID >> $XFS_PROJID_CF
> > xfs_projid=$(wc -l < $XFS_PROJID_CF)
>
> The xfs_projid is then the number of lines in the file? Why not use
> $JOB_ID directly? Is there a limit on the max. project ID that $JOB_ID might
> exceed?
>
> -- Reuti
>
>
> > My tests show that when multiple jobs start on the same exec host
> > at the same time, the prolog scripts execute at almost the same time,
> > resulting in multiple jobs sharing the same xfs_projid, which is no good.
> >
> > I am wondering if I can configure the scheduler to start the jobs in
> > a sequential way (probably with an interval in between).
> >
> >
> > Cheers,
> > Derrick
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-26 Thread Reuti
Hi,

> On 26.07.2018 at 06:01, Derrick Lin wrote:
> 
> Hi all,
> 
> I am working on a prolog script which sets up an xfs quota on disk space on a
> per-job basis.
> 
> For setting up an xfs quota on a subdirectory, I need to provide a project ID.
> 
> Here is how I did for generating project ID:
> 
> XFS_PROJID_CF="/tmp/xfs_projid_counter"
> 
> echo $JOB_ID >> $XFS_PROJID_CF
> xfs_projid=$(wc -l < $XFS_PROJID_CF)

The xfs_projid is then the number of lines in the file? Why not use $JOB_ID
directly? Is there a limit on the max. project ID that $JOB_ID might exceed?

-- Reuti


> My tests show that when multiple jobs start on the same exec host at
> the same time, the prolog scripts execute at almost the same time, resulting
> in multiple jobs sharing the same xfs_projid, which is no good.
> 
> I am wondering if I can configure the scheduler to start the jobs in a
> sequential way (probably with an interval in between).
> 
> 
> Cheers,
> Derrick
> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Start jobs on exec host in sequential order

2018-07-26 Thread Christopher Heiny
On Thu, 2018-07-26 at 14:01 +1000, Derrick Lin wrote:
> Hi all,
> 
> I am working on a prolog script which sets up an xfs quota on disk space
> on a per-job basis.
> 
> For setting up an xfs quota on a subdirectory, I need to provide a
> project ID.
> 
> Here is how I did for generating project ID:
> 
> XFS_PROJID_CF="/tmp/xfs_projid_counter"
> 
> echo $JOB_ID >> $XFS_PROJID_CF
> xfs_projid=$(wc -l < $XFS_PROJID_CF)
> 
> My tests show that when multiple jobs start on the same exec
> host at the same time, the prolog scripts execute at almost the same
> time, resulting in multiple jobs sharing the same xfs_projid, which is
> no good.
> 
> I am wondering if I can configure the scheduler to start the jobs in
> a sequential way (probably with an interval in between).

Hi Derrick,

Assuming you're using Bash (based on your snippets above), it's
probably better to use flock to lock/unlock your project ID counter
file and ensure exclusive access. There are some basic examples in the
man page, and a Gist with very useful functions here:

    https://gist.github.com/przemoc/571091
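
For instance, a minimal sketch of your counter update serialized with flock
(the fd number is arbitrary; only the counter path is taken from your snippet):

#!/bin/bash
XFS_PROJID_CF="/tmp/xfs_projid_counter"

exec 9>>"$XFS_PROJID_CF"                 # open the counter file on fd 9
flock -x 9                               # block until we hold an exclusive lock
echo "$JOB_ID" >&9                       # append this job's line under the lock
xfs_projid=$(wc -l < "$XFS_PROJID_CF")   # the line count is now race-free
flock -u 9                               # release the lock
exec 9>&-                                # close fd 9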

Cheers,
Chris

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users