Re: [gridengine users] Start jobs on exec host in sequential order
Thanks guys, I will take a look at each option.

On Mon, Aug 6, 2018 at 9:52 PM, William Hay wrote:
> [...]
Re: [gridengine users] Start jobs on exec host in sequential order
On Wed, Aug 01, 2018 at 11:06:19AM +1000, Derrick Lin wrote:
> Hi Reuti,
>
> The prolog script is set to run by root indeed. The xfs quota requires
> root privilege.
>
> I also tried the 2nd approach, but it seems that the addgrpid file has
> not yet been created when the prolog script is executed:
>
> /opt/gridengine/default/common/prolog_exec.sh: line 21:
> /opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid:

You can also extract the group ID from the config file, which should be
present on the master node when the prolog is run:

XFS_PROJID="$(awk -F= '/^add_grp_id=/{print $2}' <${SGE_JOB_SPOOL_DIR}/config)"

NB: If you want this on the slave nodes of a multi-node job and you allow
multi-node jobs to share nodes (we don't), then you will need to extract a
project ID on each slave node. Probably the best place to do this would be
in a wrapper around rsh_daemon. However, you'll need some sort of locking in
case a program launches multiple slave tasks (most codes just launch one
slave task per node, which then forks) or launches a slave task on the
master node.

William
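Putting William's suggestion together with the xfs side, a prolog fragment
along these lines could apply the extracted ID as the project quota. This is
only a sketch: the /scratch mount point, the per-job directory layout and
the 50g limit are invented placeholders, and it assumes the prolog runs as
root on a filesystem mounted with the prjquota option.

#!/bin/sh
# Sketch: reuse the job's add_grp_id (read from the spooled job config) as
# the XFS project ID. Paths and the limit below are placeholders.
XFS_PROJID="$(awk -F= '/^add_grp_id=/{print $2}' "${SGE_JOB_SPOOL_DIR}/config")"

JOB_DIR="/scratch/${JOB_ID}.${SGE_TASK_ID}"   # hypothetical per-job directory
mkdir -p "$JOB_DIR"

# Bind the directory to the project, then cap its block usage.
xfs_quota -x -c "project -s -p $JOB_DIR $XFS_PROJID" /scratch
xfs_quota -x -c "limit -p bhard=50g $XFS_PROJID" /scratch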
Re: [gridengine users] Start jobs on exec host in sequential order
> On 01.08.2018 at 03:06, Derrick Lin wrote:
>
> Hi Reuti,
>
> The prolog script is set to run by root indeed. The xfs quota requires
> root privilege.
>
> I also tried the 2nd approach, but it seems that the addgrpid file has
> not yet been created when the prolog script is executed:
>
> /opt/gridengine/default/common/prolog_exec.sh: line 21:
> /opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid: No
> such file or directory

I must admit I wasn't aware of this: the file is essentially only available
during the execution of the job itself. This has the side effect that
anything done in a prolog or epilog while it runs (especially under the
user's account) can't be traced or accounted for (or a `qdel` might fail),
which is somewhat surprising. Do you set the quota with a shell script or a
binary?

Another idea could be to use a starter_method in the queue configuration. At
that point the addgrpid file exists (I checked it), and you could call a
binary with SUID root from there (the starter_method will eventually call
the user's script, which runs under the user's account only). SUID won't
work for scripts, hence the final call to a binary with the SUID bit set:

#!/bin/sh
export ADDGRPID="$(< $SGE_JOB_SPOOL_DIR/addgrpid)"
# call some script/binary to set the quota
exec "${@}"

-- Reuti

> Maybe some of my scheduler conf is not correct?
>
> Regards,
> Derrick
>
> On Mon, Jul 30, 2018 at 7:35 PM, Reuti wrote:
> > [...]
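To make the starter_method idea concrete, here is a sketch of such a
wrapper. The helper /usr/local/sbin/set_job_quota is an invented name: the
real thing would be a small SUID-root binary that takes the group ID and
performs the quota call, since the wrapper itself runs under the user's
account.

#!/bin/sh
# starter_method sketch; runs as the job owner, so the quota call is
# delegated to a hypothetical SUID-root helper binary.
ADDGRPID="$(cat "$SGE_JOB_SPOOL_DIR/addgrpid")"
export ADDGRPID

/usr/local/sbin/set_job_quota "$ADDGRPID" || exit 1   # invented helper name

exec "${@}"   # hand control to the actual job script/command

The wrapper would then be enabled per queue by pointing the queue's
starter_method attribute (e.g. via `qconf -mq my.q`) at its path.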
Re: [gridengine users] Start jobs on exec host in sequential order
Hi Reuti,

The prolog script is set to run by root indeed. The xfs quota requires root
privilege.

I also tried the 2nd approach, but it seems that the addgrpid file has not
yet been created when the prolog script is executed:

/opt/gridengine/default/common/prolog_exec.sh: line 21:
/opt/gridengine/default/spool/omega-1-27/active_jobs/1187086.1/addgrpid: No
such file or directory

Maybe some of my scheduler conf is not correct?

Regards,
Derrick

On Mon, Jul 30, 2018 at 7:35 PM, Reuti wrote:
> [...]
Re: [gridengine users] Start jobs on exec host in sequential order
> On 30.07.2018 at 02:31, Derrick Lin wrote:
>
> Hi Reuti,
>
> The approach sounds great.
>
> But the prolog script seems to be run by root, so this is what I got:
>
> XFS_PROJID: uid=0(root) gid=0(root) groups=0(root),396(sfcb)

This is quite unusual. Do you run the prolog as root by intention? I assume
so, to set the limits:

$ qconf -sq my.q
…
prolog    /some/script

Do you have "root:" here to change the user under which it is run (in the
global `qconf -sconf`)? Please note that this may open some root doors,
depending on the environment variable settings. I have "sgeadmin:" here for
some special handling and use:

sgeadmin@/usr/sge/cluster/busybox env -u LD_LIBRARY_PATH -u LD_PRELOAD -u IFS /usr/sge/cluster/context.sh

Nevertheless: the second approach, getting the additional group ID from the
job's spool area, should work.

-- Reuti

> Maybe I am still missing something, or is the prolog script the wrong
> place for getting the group ID generated by SGE?
>
> Cheers,
> D
>
> On Sat, Jul 28, 2018 at 11:53 AM, Reuti wrote:
> > [...]
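For reference, the queue-level prolog entry accepts an optional user@ prefix
that controls the account under which the prolog runs. A hypothetical
configuration (the script path is invented) might look like:

$ qconf -mq my.q
…
prolog    root@/opt/gridengine/default/common/prolog_exec.sh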
Re: [gridengine users] Start jobs on exec host in sequential order
Hi Reuti,

The approach sounds great.

But the prolog script seems to be run by root, so this is what I got:

XFS_PROJID: uid=0(root) gid=0(root) groups=0(root),396(sfcb)

Maybe I am still missing something, or is the prolog script the wrong place
for getting the group ID generated by SGE?

Cheers,
D

On Sat, Jul 28, 2018 at 11:53 AM, Reuti wrote:
> [...]
Re: [gridengine users] Start jobs on exec host in sequential order
> On 28.07.2018 at 03:00, Derrick Lin wrote:
>
> Thanks Reuti,
>
> I know little about the group ID created by SGE, and I'm also pretty much
> confused about the Linux group ID.

Yes, SGE assigns a conventional group ID to each job to track its CPU and
memory consumption. This group ID is in the range you defined in:

$ qconf -sconf
…
gid_range    2-20100

and it will be unique per node. A first approach could be either `sed`:

$ id
uid=25000(reuti) gid=25000(ourgroup) groups=25000(ourgroup),10(wheel),1000(operator),20052,24000(common),26000(anothergroup)
$ id | sed -e "s/.*),\([0-9]*\),.*/\1/"
20052

or:

ADD_GRP_ID=$(< $SGE_JOB_SPOOL_DIR/addgrpid)
echo $ADD_GRP_ID

-- Reuti

> I assume that "id" is called inside the prolog script; typically, what
> does the output look like?
>
> Cheers,
>
> On Fri, Jul 27, 2018 at 4:12 PM, Reuti wrote:
> > [...]
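The sed one-liner relies on the job's extra group being the only bare number
(with no group name attached) in the `id` output. A slightly more defensive
variant, not from the thread, is to select the supplementary group ID that
falls inside the configured gid_range; the bounds below are placeholders to
be replaced with the values from your own `qconf -sconf`:

#!/bin/sh
# Sketch: pick the supplementary group ID inside the configured gid_range.
GID_MIN=20000   # placeholder lower bound
GID_MAX=20100   # placeholder upper bound
ADD_GRP_ID=$(id -G | tr ' ' '\n' | awk -v lo="$GID_MIN" -v hi="$GID_MAX" '$1 >= lo && $1 <= hi')
echo "$ADD_GRP_ID"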
Re: [gridengine users] Start jobs on exec host in sequential order
Thanks Reuti,

I know little about the group ID created by SGE, and I'm also pretty much
confused about the Linux group ID.

I assume that "id" is called inside the prolog script; typically, what does
the output look like?

Cheers,

On Fri, Jul 27, 2018 at 4:12 PM, Reuti wrote:
> [...]
Re: [gridengine users] Start jobs on exec host in sequential order
On 27.07.2018 at 03:14, Derrick Lin wrote:

> We are using $JOB_ID as xfs_projid at the moment, but this approach
> introduces a problem for array jobs, whose tasks share the same $JOB_ID
> (with different $TASK_ID values).
>
> Also, it is possible that tasks from two different array jobs running on
> the same node have the same $TASK_ID, so the uniqueness of $TASK_ID on the
> same host cannot be maintained.

So the number you are looking for needs to be unique per node only?

What about using the additional group ID which SGE creates? This will be
unique per node.

It can be found in the `id` command's output, or in the spool directory
(execd_spool_dir) at ${HOSTNAME}/active_jobs/${JOB_ID}.${TASK_ID}/addgrpid.

-- Reuti

> That's why I am trying to implement the xfs_projid to be independent from
> SGE.
>
> On Thu, Jul 26, 2018 at 9:27 PM, Reuti wrote:
> > [...]
Re: [gridengine users] Start jobs on exec host in sequential order
We are using $JOB_ID as xfs_projid at the moment, but this approach
introduces a problem for array jobs, whose tasks share the same $JOB_ID
(with different $TASK_ID values).

Also, it is possible that tasks from two different array jobs running on the
same node have the same $TASK_ID, so the uniqueness of $TASK_ID on the same
host cannot be maintained.

That's why I am trying to implement the xfs_projid to be independent from
SGE.

On Thu, Jul 26, 2018 at 9:27 PM, Reuti wrote:
> [...]
Re: [gridengine users] Start jobs on exec host in sequential order
Hi,

> On 26.07.2018 at 06:01, Derrick Lin wrote:
>
> Hi all,
>
> I am working on a prolog script which sets up an xfs quota on disk space
> on a per-job basis.
>
> For setting up an xfs quota on a subdirectory, I need to provide a
> project ID.
>
> Here is how I generate the project ID:
>
> XFS_PROJID_CF="/tmp/xfs_projid_counter"
>
> echo $JOB_ID >> $XFS_PROJID_CF
> xfs_projid=$(wc -l < $XFS_PROJID_CF)

The xfs_projid is then the number of lines in the file? Why not use $JOB_ID
directly? Is there a limit on the maximum project ID that $JOB_ID might
exceed?

-- Reuti

> My test shows that when multiple jobs start on the same exec host at the
> same time, the prolog script is executed at almost the same time, with
> the result that multiple jobs share the same xfs_projid, which is no
> good.
>
> I am wondering if I can configure the scheduler to start the jobs in a
> sequential way (probably with an interval in between).
>
> Cheers,
> Derrick
Re: [gridengine users] Start jobs on exec host in sequential order
On Thu, 2018-07-26 at 14:01 +1000, Derrick Lin wrote:
> Hi all,
>
> I am working on a prolog script which sets up an xfs quota on disk space
> on a per-job basis.
>
> For setting up an xfs quota on a subdirectory, I need to provide a
> project ID.
>
> Here is how I generate the project ID:
>
> XFS_PROJID_CF="/tmp/xfs_projid_counter"
>
> echo $JOB_ID >> $XFS_PROJID_CF
> xfs_projid=$(wc -l < $XFS_PROJID_CF)
>
> My test shows that when multiple jobs start on the same exec host at the
> same time, the prolog script is executed at almost the same time, with
> the result that multiple jobs share the same xfs_projid, which is no
> good.
>
> I am wondering if I can configure the scheduler to start the jobs in a
> sequential way (probably with an interval in between).

Hi Derrick,

Assuming you're using Bash (based on your snippets above), it's probably
better to use flock to lock/unlock your project ID counter file and ensure
exclusive access. There are some basic examples in the man page, and a Gist
with very useful functions here:

https://gist.github.com/przemoc/571091

Cheers,
Chris
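A minimal sketch of that suggestion, using flock(1) as documented in its man
page to serialize the read-modify-write on the counter file (only the
counter path comes from Derrick's snippet; the lock-file name and the file
descriptor number are illustrative):

#!/bin/bash
# Serialize counter updates so that concurrent prologs on the same host
# never read the same line count.
XFS_PROJID_CF="/tmp/xfs_projid_counter"

xfs_projid=$(
    (
        flock -x 200                 # block until the exclusive lock is ours
        echo "$JOB_ID" >> "$XFS_PROJID_CF"
        wc -l < "$XFS_PROJID_CF"     # line count, now unique per job
    ) 200>"${XFS_PROJID_CF}.lock"
)

echo "assigned xfs_projid=$xfs_projid to job $JOB_ID"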