Re: [gridengine users] Functional shares autonomously reset to 0 for recently added user

2020-08-25 Thread Mun Johl
Hi William,

... Text Deleted ...

> > Anyone else experience this phenomenon?
> Never seen this and we use a largely functional policy on our clusters.  Could
> the user be reaching delete_time and then being recreated with the functional
> shares from sge_conf auto_user_fshare?

[Mun] That's an interesting point; I'm not very familiar with the delete_time 
parameter, and I don't typically specify anything for it when creating a new 
user.  I just checked, and currently all users have a delete_time of 0.  But I 
suppose it's possible that a user briefly has a non-zero delete_time right 
after I first create them.  That certainly gives me something to watch for.
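
If auto-created users are being timed out and recreated, the relevant settings
should show up in the global configuration, so I'll keep an eye on something
like (assuming a stock setup):

  qconf -sconf | grep -E 'enforce_user|auto_user'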

Thank you,

-- 
Mun



[gridengine users] Functional shares autonomously reset to 0 for recently added user

2020-08-24 Thread Mun Johl
Hi all,

We are running SGE v8.1.9 on systems running Red Hat Enterprise Linux v6.8.

This anomaly isn't a showstopper by any means, but it has happened enough that 
I decided to reach out and ask if anyone else has experienced this phenomenon 
and if a fix/workaround is available.

Here's the anomaly I've experienced many times:

We deploy the Functional Policy, and oftentimes, after a new user has been 
added and I have set the user's Functional Shares via QMON, that user's 
Functional Shares will reset to 0 after a while.  I typically don't notice 
until the user reports that his/her jobs are queued longer than others, at 
which time I will update the user's functional shares again.  After one or two 
times through this loop the value sticks and I don't have any further problems 
... until I have to add another new user.
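
For what it's worth, the value can also be checked and adjusted outside of QMON
("someuser" below is just a placeholder):

  qconf -suser someuser     # shows the user's current fshare (and delete_time)
  qconf -muser someuser     # opens the user entry in an editor to change it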

Anyone else experience this phenomenon?

Thank you and best regards,

--
Mun


Re: [gridengine users] How to export an X11 back to the client?

2020-05-14 Thread Mun Johl
Hi,

I just thought I'd report that I was finally able to get X11 forwarding to 
work.  The final step was for us to disable SELinux.  Once I did that (and 
turned off the firewall) X11 forwarding worked great.  So now I'll work with IT 
on a solution that they are happy with.
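
For the archive, the command-line equivalent of what we did on the execution
hosts is roughly the standard RHEL6 sequence below (a sketch of the experiment,
not a recommendation to leave either service off permanently):

  getenforce                 # show the current SELinux mode
  setenforce 0               # switch to permissive for the test
  service iptables stop      # temporarily stop the host firewall

A persistent SELinux change would mean setting SELINUX=permissive (or disabled)
in /etc/selinux/config and rebooting.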

Thank you very much for all the great advice and support!

Best regards,

-- 
Mun


> Hi Reuti,
> 
> Thank you kindly for your response.
> I have provided comments below.
> 
> > -Original Message-
> > Hi,
> >
> > Am 12.05.2020 um 23:27 schrieb Mun Johl:
> >
> > > Hi,
> > >
> > > Just some additional testing results ...
> > >
> > > Our IT guy turned off the firewall on a Submit Host and Execution Host 
> > > for experimental purposes.  That got me further but not all
> > the way.  Here is the verbose log from qrsh:
> > >
> > > waiting for interactive job to be scheduled ...
> > > Your interactive job 460937 has been successfully scheduled.
> > > Establishing /usr/bin/ssh -X session to host sim.domain.com ...
> > > ssh_exchange_identification: Connection closed by remote host
> > > /usr/bin/ssh -X exited with exit code 255
> > > reading exit code from shepherd ... 129
> > >
> > > We aren't yet able to get around the ssh -X error.  Any ideas?
> >
> > But a plain `ssh` to the nodes works?
> 
> [Mun] Yes, I can ssh into the nodes.  I can also 'ssh -X' into the nodes from 
> a terminal and open X11 apps.
> 
> > In case a different hostname must be used, there is an option 
> > "HostbasedUsesNameFromPacketOnly" in "sshd_config".
> 
> [Mun] I don't _think_ that is/should be required.
> 
> > > But even if we could, we still need to figure out which ports of the 
> > > firewall need to be opened up.  Every time we ran an
> experiment,
> > the port number that was used for SSH was different.  I hope we don't have 
> > to open up too big a range of ports.
> >
> > Unfortunately the port is randomly chosen with any new connection.
> 
> [Mun] Yes, unfortunate; I thought I read that somewhere.
> 
> > But wouldn't it be possible to adjust the firewall to allow all ports only 
> > when connecting from the nodes in the cluster (are the
> nodes
> > in a VLAN behind a head node or all submit machines and nodes also 
> > connected to the Internet?)
> 
> [Mun] The nodes are on their own subnet, so what you suggest might be 
> possible.  I'll check with our IT guy about that since I'm not
> very well versed with firewall configuration.
> 
> > Also in SSH itself it is possible with the "match" option in "sshd_config" 
> > to allow only certain users from certain nodes.
> 
> [Mun] Good to know; thank you.
> 
> > Nevertheless: maybe adding "-v" to the `ssh` command will output additional 
> > info, also the messages of `sshd` might be in some log
> > file.
> 
> [Mun] We had tried that but unfortunately it was not much help to me.  In 
> case it is useful to anyone on this reflector, here is the log:
> 
> waiting for interactive job to be scheduled ...
> Your interactive job 460968 has been successfully scheduled.
> Establishing /usr/bin/ssh -X -vv session to host sim.domain.com ...
> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
> debug1: Reading configuration data /etc/ssh/ssh_config
> debug1: Applying options for *
> debug2: ssh_connect: needpriv 0
> debug1: Connecting to sim.domain.com [10.203.224.81] port 43929.
> debug1: Connection established.
> debug1: identity file /home/mun/.ssh/identity type -1
> debug1: identity file /home/mun/.ssh/identity-cert type -1
> debug2: key_type_from_name: unknown key type '-BEGIN'
> debug2: key_type_from_name: unknown key type '-END'
> debug1: identity file /home/mun/.ssh/id_rsa type 1
> debug1: identity file /home/mun/.ssh/id_rsa-cert type -1
> debug1: identity file /home/mun/.ssh/id_dsa type -1
> debug1: identity file /home/mun/.ssh/id_dsa-cert type -1
> debug1: identity file /home/mun/.ssh/id_ecdsa type -1
> debug1: identity file /home/mun/.ssh/id_ecdsa-cert type -1
> ssh_exchange_identification: Connection closed by remote host
> /usr/bin/ssh -X -vv -o UserKnownHostsFile=/dev/null -o 
> StrictHostKeyChecking=no exited with exit code 255
> reading exit code from shepherd ... 129
> 
> Best regards,
> 
> --
> Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread Mun Johl
Hi Reuti,

Thank you kindly for your response.
I have provided comments below.

> -Original Message-
> Hi,
> 
> Am 12.05.2020 um 23:27 schrieb Mun Johl:
> 
> > Hi,
> >
> > Just some additional testing results ...
> >
> > Our IT guy turned off the firewall on a Submit Host and Execution Host for 
> > experimental purposes.  That got me further but not all
> the way.  Here is the verbose log from qrsh:
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 460937 has been successfully scheduled.
> > Establishing /usr/bin/ssh -X session to host sim.domain.com ...
> > ssh_exchange_identification: Connection closed by remote host
> > /usr/bin/ssh -X exited with exit code 255
> > reading exit code from shepherd ... 129
> >
> > We aren't yet able to get around the ssh -X error.  Any ideas?
> 
> But a plain `ssh` to the nodes works?

[Mun] Yes, I can ssh into the nodes.  I can also 'ssh -X' into the nodes from a 
terminal and open X11 apps.

> In case a different hostname must be used, there is an option 
> "HostbasedUsesNameFromPacketOnly" in "sshd_config".

[Mun] I don't _think_ that is/should be required.

> > But even if we could, we still need to figure out which ports of the 
> > firewall need to be opened up.  Every time we ran an experiment,
> the port number that was used for SSH was different.  I hope we don't have to 
> open up too big a range of ports.
> 
> Unfortunately the port is randomly chosen with any new connection.

[Mun] Yes, unfortunate; I thought I read that somewhere.

> But wouldn't it be possible to adjust the firewall to allow all ports only 
> when connecting from the nodes in the cluster (are the nodes
> in a VLAN behind a head node or all submit machines and nodes also connected 
> to the Internet?)

[Mun] The nodes are on their own subnet, so what you suggest might be possible. 
 I'll check with our IT guy about that since I'm not very well versed with 
firewall configuration.
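
For reference, a rough iptables sketch of what Reuti describes (10.203.224.0/24
is only a guess at our cluster subnet based on the qrsh log below; IT would use
the real one):

  iptables -I INPUT -s 10.203.224.0/24 -p tcp -j ACCEPT
  service iptables save      # persist the rule across reboots on RHEL6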

> Also in SSH itself it is possible with the "match" option in "sshd_config" to 
> allow only certain users from certain nodes.

[Mun] Good to know; thank you.

> Nevertheless: maybe adding "-v" to the `ssh` command will output additional 
> info, also the messages of `sshd` might be in some log
> file.

[Mun] We had tried that but unfortunately it was not much help to me.  In case 
it is useful to anyone on this reflector, here is the log:

waiting for interactive job to be scheduled ...
Your interactive job 460968 has been successfully scheduled.
Establishing /usr/bin/ssh -X -vv session to host sim.domain.com ...
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to sim.domain.com [10.203.224.81] port 43929.
debug1: Connection established.
debug1: identity file /home/mun/.ssh/identity type -1
debug1: identity file /home/mun/.ssh/identity-cert type -1
debug2: key_type_from_name: unknown key type '-BEGIN'
debug2: key_type_from_name: unknown key type '-END'
debug1: identity file /home/mun/.ssh/id_rsa type 1
debug1: identity file /home/mun/.ssh/id_rsa-cert type -1
debug1: identity file /home/mun/.ssh/id_dsa type -1
debug1: identity file /home/mun/.ssh/id_dsa-cert type -1
debug1: identity file /home/mun/.ssh/id_ecdsa type -1
debug1: identity file /home/mun/.ssh/id_ecdsa-cert type -1
ssh_exchange_identification: Connection closed by remote host
/usr/bin/ssh -X -vv -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no 
exited with exit code 255
reading exit code from shepherd ... 129
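
For anyone hitting the same ssh_exchange_identification error, the usual
suspects on the execution host side (standard RHEL6 locations, nothing
SGE-specific) are roughly:

  tail /var/log/secure                    # sshd's own log messages
  grep -i sshd /etc/hosts.deny            # TCP wrappers refusing the connection?
  grep MaxStartups /etc/ssh/sshd_config   # too many unauthenticated connections
                                          # can also make sshd drop sessions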

Best regards,

-- 
Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread Mun Johl
Hi,

Just some additional testing results ...

Our IT guy turned off the firewall on a Submit Host and Execution Host for 
experimental purposes.  That got me further but not all the way.  Here is the 
verbose log from qrsh:

waiting for interactive job to be scheduled ...
Your interactive job 460937 has been successfully scheduled.
Establishing /usr/bin/ssh -X session to host sim.domain.com ...
ssh_exchange_identification: Connection closed by remote host
/usr/bin/ssh -X exited with exit code 255
reading exit code from shepherd ... 129

We aren't yet able to get around the ssh -X error.  Any ideas?

But even if we could, we still need to figure out which ports of the firewall 
need to be opened up.  Every time we ran an experiment, the port number that 
was used for SSH was different.  I hope we don't have to open up too big a 
range of ports.

Feedback would be welcomed.

Best regards,

-- 
Mun



> -Original Message-
> Hi William, et al.,
> 
> > On Mon, May 11, 2020 at 09:30:14PM +, Mun Johl wrote:
> > > Hi William, et al.,
> > > [Mun] Thanks for the tip; I'm still trying to get back to where I can 
> > > launch qrsh again.  Even after I put the requisite
> /etc/pam.d/sshd
> > line at the head of the file I'm still getting the "Your "qrsh" request 
> > could not be scheduled, try again later." message for some
> reason.
> > But I will continue to debug that issue.
> >
> > The pam_sge-qrsh-setup.so shouldn't have anything to do with this since
> > the message occurs before any attempt to launch the job.  You could try
> > running qrsh -w p and/or qrsh -w v to get a report on why the qrsh
> > isn't being scheduled.  They aren't always easy to read and -w v doesn't
> > reliably ignore exclusive vars in use but can nevertheless be helpful.
> 
> [Mun] With 'qrsh -w p' and 'qrsh -w v' I got the following output:
> verification: found suitable queue(s)
> 
> I then replaced the -w option with -verbose which produced the following 
> output:
> 
> waiting for interactive job to be scheduled ...timeout (54 s) expired while 
> waiting on socket fd 4
> Your "qrsh" request could not be scheduled, try again later.
> 
> I have no idea what is meant by "socket fd 4"; but that leads me to believe 
> we have some sort of blocked port or something.
> 
> Are there any additional ports that need to be opened up in order to use 
> 'qrsh & ssh -X' ?
> 
> One last noteworthy item that recently occurred to me is that when SGE was 
> initially installed on our servers, we had a different
> domain name.  Late last year we were acquired and our domain changed.  
> However, our /etc/hosts still has the old domain simply
> because SGE couldn't deal with the change in the domain--or rather, it was 
> the easiest course of action for me to take and keep SGE
> working.  I wonder if that is in some way interfering with 'qrsh & ssh -X'?
> 
> I am going to try and do some additional debug today and will report any 
> progress.
> 
> Thank you and regards,
> 
> --
> Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread Mun Johl
Hi William, et al.,

> On Mon, May 11, 2020 at 09:30:14PM +0000, Mun Johl wrote:
> > Hi William, et al.,
> > [Mun] Thanks for the tip; I'm still trying to get back to where I can 
> > launch qsrh again.  Even after I put the requisite /etc/pam.d/sshd
> line at the head of the file I'm still getting the "Your "qrsh" request could 
> not be scheduled, try again later." message for some reason.
> But I will continue to debug that issue.
> 
> The pam_sge-qrsh-setup.so shouldn't have anything to do with this since
> the message occurs before any attempt to launch the job.  You could try
> running qrsh -w p and/or qrsh -w v to get a report on why the qrsh
> isn't being scheduled.  They aren't always easy to read and -w v doesn't
> reliably ignore exclusive vars in use but can nevertheless be helpful.

[Mun] With 'qrsh -w p' and 'qrsh -w v' I got the following output:
verification: found suitable queue(s)

I then replaced the -w option with -verbose which produced the following output:

waiting for interactive job to be scheduled ...timeout (54 s) expired while 
waiting on socket fd 4
Your "qrsh" request could not be scheduled, try again later.

I have no idea what is meant by "socket fd 4"; but that leads me to believe we 
have some sort of blocked port or something.

Are there any additional ports that need to be opened up in order to use 'qrsh 
& ssh -X' ?
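
For reference, the only fixed SGE ports I'm aware of are the two daemons
themselves (defaults below, assuming they weren't changed at install time); the
qrsh/ssh session on top of that uses a dynamically chosen port:

  nc -zv qmaster-host 6444    # sge_qmaster (placeholder hostname)
  nc -zv exec-host 6445       # sge_execd   (placeholder hostname)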

One last noteworthy item that recently occurred to me is that when SGE was 
initially installed on our servers, we had a different domain name.  Late last 
year we were acquired and our domain changed.  However, our /etc/hosts still 
has the old domain simply because SGE couldn't deal with the change in the 
domain--or rather, it was the easiest course of action for me to take and keep 
SGE working.  I wonder if that is in some way interfering with 'qrsh & ssh -X'?

I am going to try and do some additional debug today and will report any 
progress.

Thank you and regards,

-- 
Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-12 Thread Mun Johl
Hi William,

> On Mon, May 11, 2020 at 09:39:26PM +0000, Mun Johl wrote:
> > Hi William,
> >
> > Thank you for your reply.
> > See my comments below.
> >
> > > -Original Message-
> > > On Thu, May 07, 2020 at 06:29:05PM +, Mun Johl wrote:
> > > > Hi William, et al.,
> > > > I am not explicitly setting the DISPLAY--as that is how I normally use 
> > > > 'ssh -X'.  Nor have I done anything to open any additional
> ports.
> > > Again, since 'ssh -X' is working for us.  As a reminder, there is no way 
> > > for me to know what to set DISPLAY to even if I wanted to set
> it.
> > > >
> > > Do you invoke qrsh with -V by any chance?  I think that might cause the
> > > DISPLAY from the login node to override the one set by ssh -X.  If you
> > > do could you switch to using -v to transfer individual environment
> > > variables instead?
> >
> > [Mun] Yes, I do use -V normally.  When I once again get to a point where 
> > qrsh is able to launch I will certainly try your suggestion.  But
> I may tweak it by simply "unset'ing" DISPLAY from the wrapper script rather 
> than using -v because we have many env vars that are
> required in order to correctly run a job.
> The problem with unsetting DISPLAY is that if you do it then ssh won't
> be able to forward it.

[Mun] Good point; not sure what I was thinking.

> Possibly env -u DISPLAY XDISPLAY="${DISPLAY}" qrsh -now n -V 
> Then a wrapper around ssh as your qrsh_command
> #!/bin/sh
> env DISPLAY="${XDISPLAY}" /usr/bin/ssh -X "$@"

[Mun] Thanks; if I can get qrsh to launch again, I'll keep this tip in mind :)
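
Fleshing that suggestion out a little (the wrapper path and the XDISPLAY name
are arbitrary; this is only a sketch, not something I've run yet), the wrapper
configured as the rsh_command might be:

  #!/bin/sh
  # Restore the submit-side DISPLAY that was stashed in XDISPLAY so that ssh -X
  # forwards the right X server, then hand qrsh's arguments straight to ssh.
  exec env DISPLAY="${XDISPLAY}" /usr/bin/ssh -X "$@"

and the job would then be submitted with DISPLAY unset but preserved:

  env -u DISPLAY XDISPLAY="${DISPLAY}" qrsh -now no -V tclsh wrapper.tcl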

Regards,

-- 
Mun




Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread Mun Johl
Hi William,

Thank you for your reply.
See my comments below.

> -Original Message-
> On Thu, May 07, 2020 at 06:29:05PM +0000, Mun Johl wrote:
> > Hi William, et al.,
> > I am not explicitly setting the DISPLAY--as that is how I normally use 'ssh 
> > -X'.  Nor have I done anything to open any additional ports.
> Again, since 'ssh -X' is working for us.  As a reminder, there is no way for 
> me to know what to set DISPLAY to even if I wanted to set it.
> >
> Do you invoke qrsh with -V by any chance?  I think that might cause the
> DISPLAY from the login node to override the one set by ssh -X.  If you
> do could you switch to using -v to transfer individual environment
> variables instead?

[Mun] Yes, I do use -V normally.  When I once again get to a point where qrsh 
is able to launch I will certainly try your suggestion.  But I may tweak it by 
simply "unset'ing" DISPLAY from the wrapper script rather than using -v because 
we have many env vars that are required in order to correctly run a job.

Best regards,

-- 
Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-11 Thread Mun Johl
Hi William, et al.,

Thank you for your reply.
Please see my comments below.

> On Thu, May 07, 2020 at 06:29:05PM +0000, Mun Johl wrote:
> > Hi William, et al.,
> >
> > Thank you kindly for your response and insight.
> > Please see my comments below.
> > Now for the issues:

[Mun] ... Stuff deleted ...

> > I first added the pam_sge-qrsh-setup.so at the top of the /etc/pam.d/sshd 
> > file.  When I did that the qrsh job was launched but
> quickly terminated with the following error from the tool I was attempting to 
> launch:
> >
> > ncsim/STRPIN =
> > The connection to SimVision could not be established due to an error
> > in SimVision. Check your DISPLAY environment variable,
> > which may be one of the reasons for this error.
> >
> > I am not explicitly setting the DISPLAY--as that is how I normally use 'ssh 
> > -X'.  Nor have I done anything to open any additional ports.
> Again, since 'ssh -X' is working for us.  As a reminder, there is no way for 
> me to know what to set DISPLAY to even if I wanted to set it.
> 
> If you can get it back to the actually launching mode then trying to run
> qrsh -now n /bin/env to list out the environment you are getting might
> help debug.

[Mun] Thanks for the tip; I'm still trying to get back to where I can launch 
qrsh again.  Even after I put the requisite /etc/pam.d/sshd line at the head of 
the file I'm still getting the "Your "qrsh" request could not be scheduled, try 
again later." message for some reason.  But I will continue to debug that issue.

> > Now, the /etc/pam.d/sshd update caused an ssh issue: users could no longer
> > ssh into our servers :(  I didn't realize the order of the lines in
> > /etc/pam.d/sshd is significant.
> >
> > Therefore, I moved the pam_sge-qrsh-setup.so entry below the other "auth" 
> > lines.  Although, that resulted in the following error
> when I tried the qrsh command again:
> >
> >  Your "qrsh" request could not be scheduled, try again later.
> Did you remember the -now no option?  That looks like the sort of
> message one might get if you forgot it.

[Mun] Yes, I did include the "-now no" option in the qrsh call.

> > One final note is that we have "selinux" enabled on our servers.  I don't 
> > know if that makes any difference, but I thought I'd throw it
> out there.
> Depends how it is configured I guess.  Which linux distro are you using?

[Mun] Red Hat Enterprise Linux v6.8

Regards,

-- 
Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-07 Thread Mun Johl
Hi William, et al.,

Thank you kindly for your response and insight.
Please see my comments below.

> On Wed, May 06, 2020 at 11:10:40PM +0000, Mun Johl wrote:
> > [Mun] In order to use ssh -X for our jobs that require an X11 window to be 
> > pushed to a user's VNC display, I am planning on the
> following changes.  But please let me know if I have missed something (or 
> everything).
> >
> > 1. Update the global configuration with the following parameters:
> >
> >  rsh_command /usr/bin/ssh -X
> >  rsh_daemon  /usr/sbin/sshd -i
> 
> As you are using the pam_sge-qrsh-setup.so you will need to set
> rsh_daemon to point to a rshd-wrapper which you should find in
> $SGE_ROOT/util/resources/wrappers eg if $SGE_ROOT is /opt/sge
> 
> rsh_command /usr/bin/ssh -X
> rsh_daemon /opt/sge/util/resources/wrappers/rshd-wrapper

[Mun] Thanks for pointing out my mistake!
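
So, for the archive, the two global configuration entries (edited via
qconf -mconf, with SGE_ROOT assumed to be /opt/sge as above) end up as:

  rsh_command /usr/bin/ssh -X
  rsh_daemon  /opt/sge/util/resources/wrappers/rshd-wrapper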

> > 2. Use a PAM module to attach an additional group ID to sshd.  The 
> > following line will be added to /etc/pam.d/sshd on all SGE
> hosts:
> >
> >   auth required /opt/sge/lib/lx-amd64/pam_sge-qrsh-setup.so
> >
> > 3. Do I need to restart all of the SGE daemons at this point?
> 
> No it should be fine without a restart
> 
> >
> > 4. In order to run our GUI app, launch it thusly:
> >
> >   $ qrsh -now no wrapper.tcl
> 
> That looks fine, assuming sensible default resource requests, although
> obviously I don't know the details of the wrapper or application.

[Mun] After making the above changes, I'm still experiencing problems.  First, 
let me point out that I should have more accurately represented how qrsh will 
be used:

$ qrsh -now no  tclsh wrapper.tcl 

Now for the issues:

I first added the pam_sge-qrsh-setup.so at the top of the /etc/pam.d/sshd file. 
 When I did that the qrsh job was launched but quickly terminated with the 
following error from the tool I was attempting to launch:

ncsim/STRPIN =
The connection to SimVision could not be established due to an error
in SimVision. Check your DISPLAY environment variable,
which may be one of the reasons for this error.

I am not explicitly setting the DISPLAY--as that is how I normally use 'ssh 
-X'.  Nor have I done anything to open any additional ports.  Again, since 'ssh 
-X' is working for us.  As a reminder, there is no way for me to know what to 
set DISPLAY to even if I wanted to set it.

Now, the /etc/pam.d/sshd update caused an ssh issue: users could no longer ssh 
into our servers :(  I didn't realize the order of the lines in /etc/pam.d/sshd 
is significant.

Therefore, I moved the pam_sge-qrsh-setup.so entry below the other "auth" 
lines.  Although, that resulted in the following error when I tried the qrsh 
command again:

 Your "qrsh" request could not be scheduled, try again later.

One final note is that we have "selinux" enabled on our servers.  I don't know 
if that makes any difference, but I thought I'd throw it out there.

Feedback would be most welcome.

Best regards,

-- 
Mun





Re: [gridengine users] How to export an X11 back to the client?

2020-05-06 Thread Mun Johl
Hi,

As I stated in an earlier reply I will attempt to follow Reuti and William's 
advice regarding ssh inside of SGE.  Along those lines, please see my comments 
below to review what I believe are my next steps.

> Hi,
> 
> Am 01.05.2020 um 20:44 schrieb Mun Johl:
> 
> > Hi,
> >
> > I am using SGE on RHEL6.  I am trying to launch a qsub job (a TCL script) 
> > via grid that will result in a GUI application being opened on
> the caller's display (which is a VNC session).
> >
> > What I'm seeing is that if I set DISPLAY to the actual VNC display (e.g. 
> > host1:4) in the wrapper script that invokes qsub, the GUI
> application complains that it cannot make a connection.  On a side note, I 
> noticed that when I use ssh -X to login to one of our grid
> servers, my DISPLAY is set to something like localhost:10 .  Now, if I use 
> localhost:10 (for example) in my grid wrapper script, the GUI
> application _will_ open on my VNC display.
> 
> Yes, here X11 forwarding is provided by SSH. The forwarding of X11 is not 
> built into SGE.
> 
> 
> > Of course, with multiple users and multiple grid servers, I have no idea 
> > what a particular qsub command's DISPLAY should be set to.
> I must be missing something because I'm sure others have already solved this 
> issue.
> 
> Inside the wrapper it should always be something like localhost:10 with a 
> varying number. This is set by the login via SSH. Hence I'm
> not sure what you are looking for to be set.
> 
> Maybe you want to define in SGE to always use SSH -X?
> 
> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html

[Mun] In order to use ssh -X for our jobs that require an X11 window to be 
pushed to a user's VNC display, I am planning on the following changes.  But 
please let me know if I have missed something (or everything).

1. Update the global configuration with the following parameters:

 rsh_command /usr/bin/ssh -X
 rsh_daemon  /usr/sbin/sshd -i

2. Use a PAM module to attach an additional group ID to sshd.  The following 
line will be added to /etc/pam.d/sshd on all SGE hosts:

  auth required /opt/sge/lib/lx-amd64/pam_sge-qrsh-setup.so

3. Do I need to restart all of the SGE daemons at this point?

4. In order to run our GUI app, launch it thusly:

  $ qrsh -now no wrapper.tcl

Note that currently  we launch wrapper.tcl via qsub.

I would appreciate any and all feedback.

Best regards,

-- 
Mun



Re: [gridengine users] How to export an X11 back to the client?

2020-05-04 Thread Mun Johl
Hi William,

Thanks for replying.
Please see my comments below.

> On Fri, May 01, 2020 at 06:44:08PM +0000, Mun Johl wrote:
> >Hi,
> >
> >
> >
> >I am using SGE on RHEL6.  I am trying to launch a qsub job (a TCL script)
> >via grid that will result in a GUI application being opened on the
> >caller’s display (which is a VNC session).
> Using qsub for this makes things more difficult than they need to be since
> qsub jobs run largely disconnected from the submit host.  I wouldn't
> have thought you would want a delay with something interactive like
> this.  As Reuti suggested you could set up ssh tight integration (with X
> forwarding enabled in ssh) and then use qrsh -now n  to launch
> your app.

[Mun] I will pursue the method you and Reuti have suggested.   Any tips or 
documentation to help get me there would be appreciated.  Although, I will do 
some Google searches soon.

> >What I’m seeing is that if I set DISPLAY to the actual VNC display (e.g.
> >host1:4) in the wrapper script that invokes qsub, the GUI application
> Have you checked what host1 resolves to on the machine where you submit
> the job and the machine where it runs?  If you are getting a failure to
> connect it might be because you need to use the FQDN.

[Mun] I have tried the FQDN and IP address; I get the same error (failure to 
connect).  Only if I use the localhost: can I get the X11 window to 
pop up on the respective display.

> >complains that it cannot make a connection.  On a side note, I noticed
> In general X11 servers don't allow random clients to talk to
> the display.  The app may be (mis-)reporting a failure to authorise as a
> failure to connect.  This has little to do with grid engine per se just a
> side effect of running the app on a different machine from the X-Server.
> You may need to perform some manipulations with xauth to enable it.
> 
> Do the grid engine servers share home directories with the machine where
> you are running qsub (eg via NFS or a cluster file system)?

[Mun] Yes.  The same directory structure is mounted on the login servers as 
well as the grid servers.
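
For completeness, the xauth manipulation mentioned above usually amounts to
copying the magic cookie for the display to wherever the client will run,
e.g. (hypothetical hostname; with our shared NFS homes the cookie may already
be in place, leaving only DISPLAY and access control to sort out):

  xauth extract - "$DISPLAY" | ssh grid-server xauth merge -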

Best regards,

-- 
Mun



Re: [gridengine users] users Digest, Vol 113, Issue 2

2020-05-04 Thread Mun Johl
Hi Sylvain, William,

First of all, thank you both for your valuable feedback!
Please see my comments below.

> On Mon, May 04, 2020 at 09:06:46AM -0400, Korzennik, Sylvain wrote:
> >We have no problem having jobs w/ X11 enabled, BUT users must use qlogin,
> >not qsub or qrsh (the way we have configured it).
> >We have switched from SGE to UGE, but I'm sure the 'issue'  is the same,
> >you need to have the '[qlogin|rlogin|rsh]_command" and "_deamon" set
> >accordingly, and if you use ssh, you need to have X-tunnelling enabled
> >(lssh -X) - it's not just simply a matter of setting up DISPLAY or
> >adjusting ~/.Xauthority. Here is our config:
> The problem with using qlogin is that it doesn't provide an easy way to pass a
> command to it as you always get an interactive shell.  The original
> poster wanted to run a command.  You could I suppose feed the command you want
> run into qlogin's stdin but that feels a little more fragile.

[Mun] Yes, I do want to kick off a graphical tool; therefore, as William & 
Reuti suggest, using qrsh and ssh -X/sshd -i may be the route to go.  
Unfortunately, I'm not too familiar with setting up ssh for SGE and thus have 
some research to do.

> Having the rsh_command and rsh_daemon set up using ssh -X/sshd -i (as in
> ssh tight integration) lets
> you pass commands in a simpler way.

[Mun] I'm all for "simpler" ;)

> If you don't want to or can't fiddle with the various _command and _daemon
> settings then your only real options are fiddling with DISPLAY and
> ~/.Xauthority

[Mun] I already tried the "fiddling" method with little success.  Therefore, I 
will pursue the ssh -X technique next.

Best regards,

-- 
Mun





Re: [gridengine users] How to export an X11 back to the client?

2020-05-01 Thread Mun Johl
Hi Reuti,

Thank you for your reply.
Please see my comments below.

> Hi,
> 
> Am 01.05.2020 um 20:44 schrieb Mun Johl:
> 
> > Hi,
> >
> > I am using SGE on RHEL6.  I am trying to launch a qsub job (a TCL script) 
> > via grid that will result in a GUI application being opened on
> the caller's display (which is a VNC session).
> >
> > What I'm seeing is that if I set DISPLAY to the actual VNC display (e.g. 
> > host1:4) in the wrapper script that invokes qsub, the GUI
> application complains that it cannot make a connection.  On a side note, I 
> noticed that when I use ssh -X to login to one of our grid
> servers, my DISPLAY is set to something like localhost:10 .  Now, if I use 
> localhost:10 (for example) in my grid wrapper script, the GUI
> application _will_ open on my VNC display.
> 
> Yes, here X11 forwarding is provided by SSH. The forwarding of X11 is not 
> built into SGE.
> 
> 
> > Of course, with multiple users and multiple grid servers, I have no idea 
> > what a particular qsub command's DISPLAY should be set to.
> I must be missing something because I'm sure others have already solved this 
> issue.
> 
> Inside the wrapper it should always be something like localhost:10 with a 
> varying number. This is set by the login via SSH. Hence I'm
> not sure what you are looking for to be set.

[Mun] The bottom line is that I need the GUI application to open in the user's 
VNC session.  It seems that unless I set DISPLAY to the "appropriate" 
localhost:# from within the wrapper script that makes the qsub call, I cannot 
accomplish that.  Therefore, I need some way of setting the DISPLAY env var 
correctly.  Or is there some other way for me to accomplish my goal?
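
(If we do end up routing qrsh through ssh -X as you suggest, a quick sanity
check that the forwarded DISPLAY actually reaches the job would be something
like:

  qrsh -now no /bin/env | grep '^DISPLAY'

which should print a localhost:<n> style value rather than the VNC display
name.)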

Regards,

-- 
Mun


> Maybe you want to define in SGE to always use SSH -X?
> 
> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
> 
> -- Reuti
> 
> 
> >
> > Please advise.
> >
> > Thank you and regards,
> >
> > --
> > Mun
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users




[gridengine users] How to export an X11 back to the client?

2020-05-01 Thread Mun Johl
Hi,

I am using SGE on RHEL6.  I am trying to launch a qsub job (a TCL script) via 
grid that will result in a GUI application being opened on the caller's display 
(which is a VNC session).

What I'm seeing is that if I set DISPLAY to the actual VNC display (e.g. 
host1:4) in the wrapper script that invokes qsub, the GUI application complains 
that it cannot make a connection.  On a side note, I noticed that when I use 
ssh -X to login to one of our grid servers, my DISPLAY is set to something like 
localhost:10 .  Now, if I use localhost:10 (for example) in my grid wrapper 
script, the GUI application _will_ open on my VNC display.

Of course, with multiple users and multiple grid servers, I have no idea what a 
particular qsub command's DISPLAY should be set to.  I must be missing 
something because I'm sure others have already solved this issue.

Please advise.

Thank you and regards,

--
Mun


Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-29 Thread Mun Johl
Hi Hugh,

Thank you for your reply.
See my comments below.

What’s the output of ‘qconf -sq long.q’? Are you sure it doesn’t still 
reference the old hostname, maybe within a hostgroup?

[Mun] I did check the queue after updating the queues and host groups and 
‘qconf -sq long.q’ looks good.  It has the new domain name listed.  I’ve 
checked the host groups via ‘qconf -shgrp @name’ and each of them also lists 
the new domain name.

A new finding: if I try to *add* a host with the new FQDN, SGE says it already 
exists, even though ‘qconf -sh’ doesn’t show the host under the new domain 
name--it still shows the old domain name for that host.

Regards,

--
Mun


-Hugh

From: users-boun...@gridengine.org On Behalf Of Mun Johl
Sent: Tuesday, October 29, 2019 1:38 PM
To: dpo...@gmail.com
Cc: users@gridengine.org
Subject: Re: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi folks,

SGE is broken for me.  IT went through and updated the domain name for our 
hosts; but now I can’t seem to get SGE to update anything.  If I use “qconf -me 
” and update the domain name, SGE puts it right back to the old 
domain name.

I was able to get the queues and Host Groups updated with the new domain name, 
but that’s it.  If I try to delete a host, I get the following error:

Host object "knsim8" is still referenced in cluster queue "long.q".

However, like I said, I _was_ able to get the queues updated to the new domain 
name, so I don’t know why I would be getting the aforementioned error; and I 
did stop/start the master daemon after updating the queues just in case it was 
necessary.

I’m at a loss as to what to try next in order to salvage our SGE installation.  
Any suggestions would be welcomed.

Regards,

--
Mun


From: Mun Johl <mun.j...@wdc.com>
Sent: Monday, October 28, 2019 5:17 PM
To: dpo...@gmail.com
Cc: Skylar Thompson <skyl...@uw.edu>; users@gridengine.org
Subject: RE: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi Daniel,

Thank you for your feedback.  I am kind of thinking of staying with the FQDN at 
this point since that technique has been working well for us.

Regards,

--
Mun


From: Daniel Povey <dpo...@gmail.com>
Sent: Monday, October 28, 2019 3:24 PM
To: Mun Johl <mun.j...@wdc.com>
Cc: Skylar Thompson <skyl...@uw.edu>; users@gridengine.org
Subject: Re: [gridengine users] What is the easiest/best way to update our 
servers' domain name?


I always use the FQDN.  I recall running into problems with SunRPC if not... 
there may be ways to get around that, e.g. have each host announce its raw 
hostname as its FQDN, but it might not be compatible with the hosts having 
normal network access.
I forget what specific mechanism SunRPC uses to find the hostname.

On Mon, Oct 28, 2019 at 2:18 PM Mun Johl <mun.j...@wdc.com> wrote:
Hi all,

I do have a follow-up question: When I am specifying hostnames for the 
execution hosts, admin hosts, etc.; do I need to use the FQDN?  Or can I simply 
use the hostname in order for grid to operate correctly?  That is, do I have to 
use hostname.domain.com (as I am currently doing).  
Or is it sufficient to simply use “hostname”?

Regards,

--
Mun


From: Mun Johl <mun.j...@wdc.com>
Sent: Friday, October 25, 2019 5:42 PM
To: dpo...@gmail.com
Cc: Skylar Thompson <skyl...@uw.edu>; users@gridengine.org
Subject: RE: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi Daniel,

Thank you for your reply.

From: Daniel Povey <dpo...@gmail.com>
You may have to write a script to do that, but it could be something like

for exechost in $(qconf -sel); do
   qconf -se $exechost  | sed s/old_domain_name/new_domain_name/ > tmp
   qconf -de $exechost
   qconf -Ae tmp
done

but you might need to tweak that to get it to work, e.g. get rid of load_values 
from the tmp file.

[Mun] Understood.  Since we have a fairly small set of servers currently, I may 
just update them by hand via “qconf -me ”; and then address the 
queues via “qconf -mq ”.  Oh, and I just noticed I can modify hostgroups 
via “qconf -mhgrp @name”.

After that I can re-start the daemons and I “should” be good to go, right?

Thanks again Daniel.

Best regards,

--
Mun


On Fri, Oct 25, 2019 at 5:24 PM Mun Johl 
mailto:mun.j...@w

Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-29 Thread Mun Johl
Hi folks,

SGE is broken for me.  IT went through and updated the domain name for our 
hosts; but now I can’t seem to get SGE to update anything.  If I use “qconf -me 
” and update the domain name, SGE puts it right back to the old 
domain name.

I was able to get the queues and Host Groups updated with the new domain name, 
but that’s it.  If I try to delete a host, I get the following error:

Host object "knsim8" is still referenced in cluster queue "long.q".

However, like I said, I _was_ able to get the queues updated to the new domain 
name, so I don’t know why I would be getting the aforementioned error; and I 
did stop/start the master daemon after updating the queues just in case it was 
necessary.

I’m at a loss as to what to try next in order to salvage our SGE installation.  
Any suggestions would be welcomed.
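
For reference, a loop like the following should flush out any remaining
references to the old short name ("knsim8" from the error above is used as the
example):

  for q in $(qconf -sql); do
      qconf -sq "$q" | grep -q knsim8 && echo "still referenced in queue $q"
  done
  for hg in $(qconf -shgrpl); do
      qconf -shgrp "$hg" | grep -q knsim8 && echo "still referenced in hostgroup $hg"
  done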

Regards,

--
Mun


From: Mun Johl 
Sent: Monday, October 28, 2019 5:17 PM
To: dpo...@gmail.com
Cc: Skylar Thompson ; users@gridengine.org
Subject: RE: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi Daniel,

Thank you for your feedback.  I am kind of thinking of staying with the FQDN at 
this point since that technique has been working well for us.

Regards,

--
Mun


From: Daniel Povey <dpo...@gmail.com>
Sent: Monday, October 28, 2019 3:24 PM
To: Mun Johl <mun.j...@wdc.com>
Cc: Skylar Thompson <skyl...@uw.edu>; users@gridengine.org
Subject: Re: [gridengine users] What is the easiest/best way to update our 
servers' domain name?


I always use the FQDN.  I recall running into problems with SunRPC if not... 
there may be ways to get around that, e.g. have each host announce its raw 
hostname as its FQDN, but it might not be compatible with the hosts having 
normal network access.
I forget what specific mechanism SunRPC uses to find the hostname.

On Mon, Oct 28, 2019 at 2:18 PM Mun Johl <mun.j...@wdc.com> wrote:
Hi all,

I do have a follow-up question: When I am specifying hostnames for the 
execution hosts, admin hosts, etc.; do I need to use the FQDN?  Or can I simply 
use the hostname in order for grid to operate correctly?  That is, do I have to 
use hostname.domain.com (as I am currently doing).  
Or is it sufficient to simply use “hostname”?

Regards,

--
Mun


From: Mun Johl <mun.j...@wdc.com>
Sent: Friday, October 25, 2019 5:42 PM
To: dpo...@gmail.com
Cc: Skylar Thompson <skyl...@uw.edu>; users@gridengine.org
Subject: RE: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi Daniel,

Thank you for your reply.

From: Daniel Povey <dpo...@gmail.com>
You may have to write a script to do that, but it could be something like

for exechost in $(qconf -sel); do
   qconf -se $exechost  | sed s/old_domain_name/new_domain_name/ > tmp
   qconf -de $exechost
   qconf -Ae tmp
done

but you might need to tweak that to get it to work, e.g. get rid of load_values 
from the tmp file.

[Mun] Understood.  Since we have a fairly small set of servers currently, I may 
just update them by hand via “qconf -me ”; and then address the 
queues via “qconf -mq ”.  Oh, and I just noticed I can modify hostgroups 
via “qconf -mhgrp @name”.

After that I can re-start the daemons and I “should” be good to go, right?

Thanks again Daniel.

Best regards,

--
Mun


On Fri, Oct 25, 2019 at 5:24 PM Mun Johl <mun.j...@wdc.com> wrote:
Hi Daniel and Skylar,

Thank you for your replies.

> -Original Message-
> I think it might depend on the setting of ignore_fqdn in the bootstrap file
> (can't remember if this just tunes load reporting or also things like which
> qmaster the execd's talk to). I wouldn't count on it working, though, and
> agree with Daniel that you probably want to plan on an outage.

[Mun] An outage is acceptable; but I'm not sure what is the best/easiest 
approach to take in order to change the domain names within SGE for all of the 
servers as well as update the hostgroups and queues.  I mean, I know I can 
delete the hosts and add them back in; and the same for the queue 
specifications, etc.  However, I'm not sure if that is an adequate solution or 
one that will cause problems for me.  I'm also not sure if that is the best 
approach to take for this task.

Thanks,

--
Mun


>
> On Fri, Oct 25, 2019 at 04:12:11PM -0700, Daniel Povey wrote:
> > IIRC, GridEngine is very picky about machines having a consistent
> > hostname, e.g. that what hostname they think they have matches with
> > how they were addressed.  I think this is because of SunRP

Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-28 Thread Mun Johl
Hi Daniel,

Thank you for your feedback.  I am kind of thinking of staying with the FQDN at 
this point since that technique has been working well for us.

Regards,

--
Mun


From: Daniel Povey 
Sent: Monday, October 28, 2019 3:24 PM
To: Mun Johl 
Cc: Skylar Thompson ; users@gridengine.org
Subject: Re: [gridengine users] What is the easiest/best way to update our 
servers' domain name?


I always use the FQDN.  I recall running into problems with SunRPC if not... 
there may be ways to get around that, e.g. have each host announce its raw 
hostname as its FQDN, but it might not be compatible with the hosts having 
normal network access.
I forget what specific mechanism SunRPC uses to find the hostname.

On Mon, Oct 28, 2019 at 2:18 PM Mun Johl <mun.j...@wdc.com> wrote:
Hi all,

I do have a follow-up question: When I am specifying hostnames for the 
execution hosts, admin hosts, etc.; do I need to use the FQDN?  Or can I simply 
use the hostname in order for grid to operate correctly?  That is, do I have to 
use hostname.domain.com (as I am currently doing).  
Or is it sufficient to simply use “hostname”?

Regards,

--
Mun


From: Mun Johl <mun.j...@wdc.com>
Sent: Friday, October 25, 2019 5:42 PM
To: dpo...@gmail.com
Cc: Skylar Thompson <skyl...@uw.edu>; users@gridengine.org
Subject: RE: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi Daniel,

Thank you for your reply.

From: Daniel Povey <dpo...@gmail.com>
You may have to write a script to do that, but it could be something like

for exechost in $(qconf -sel); do
   qconf -se $exechost  | sed s/old_domain_name/new_domain_name/ > tmp
   qconf -de $exechost
   qconf -Ae tmp
done

but you might need to tweak that to get it to work, e.g. get rid of load_values 
from the tmp file.

[Mun] Understood.  Since we have a fairly small set of servers currently, I may 
just update them by hand via “qconf -me ”; and then address the 
queues via “qconf -mq ”.  Oh, and I just noticed I can modify hostgroups 
via “qconf -mhgrp @name”.

After that I can re-start the daemons and I “should” be good to go, right?

Thanks again Daniel.

Best regards,

--
Mun


On Fri, Oct 25, 2019 at 5:24 PM Mun Johl <mun.j...@wdc.com> wrote:
Hi Daniel and Skylar,

Thank you for your replies.

> -Original Message-
> I think it might depend on the setting of ignore_fqdn in the bootstrap file
> (can't remember if this just tunes load reporting or also things like which
> qmaster the execd's talk to). I wouldn't count on it working, though, and
> agree with Daniel that you probably want to plan on an outage.

[Mun] An outage is acceptable; but I'm not sure what is the best/easiest 
approach to take in order to change the domain names within SGE for all of the 
servers as well as update the hostgroups and queues.  I mean, I know I can 
delete the hosts and add them back in; and the same for the queue 
specifications, etc.  However, I'm not sure if that is an adequate solution or 
one that will cause problems for me.  I'm also not sure if that is the best 
approach to take for this task.

Thanks,

--
Mun


>
> On Fri, Oct 25, 2019 at 04:12:11PM -0700, Daniel Povey wrote:
> > IIRC, GridEngine is very picky about machines having a consistent
> > hostname, e.g. that what hostname they think they have matches with
> > how they were addressed.  I think this is because of SunRPC.  I think
> > it may be hard to do what you want without an interruption  of some kind.
> But I may be wrong.
> >
> > On Fri, Oct 25, 2019 at 3:37 PM Mun Johl <mun.j...@wdc.com> wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I need to update the domain names of our SGE servers.  What is the
> > > easiest way to do that?  Can I simply update the domain name somehow
> > > and have that propagate to hostgroups, queue specifications, etc.?
> > >
> > >
> > >
> > > Or do I have to delete the current hosts and add the new ones?
> > > Which I think also implies setting up the hostgroups and queues
> > > again as well for our implementation.
> > >
> > >
> > >
> > > Best regards,
> > >
> > >
> > >
> > > --
> > >
> > > Mun
> > > ___
> > > users mailing list
> > > users@gridengine.org<mailto:users@gridengine.org>
> > > https://gridengine.org/mailman/listinfo/users
> >

Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-28 Thread Mun Johl
Hi Reuti,

Thank you for  your quick response!

> -Original Message-
> Am 28.10.2019 um 22:18 schrieb Mun Johl:
> 
> > Hi all,
> >
> > I do have a follow-up question: When I am specifying hostnames for the
> execution hosts, admin hosts, etc.; do I need to use the FQDN?  Or can I
> simply use the hostname in order for grid to operate correctly?  That is, do I
> have to use hostname.domain.com (as I am currently doing).  Or is it
> sufficient to simply use "hostname"?
> 
> It's sufficient to use hostnames. The queue names then get shorter in `qstat`
> too:
> 
> queue
> -
> common@node25
> common@node29
> ramdisk@node19
> common@node27
> common@node23
> common@node23
> common@node28

[Mun] That would be a nice bonus, actually.  However, I tried modifying one of 
our execution hosts via "qconf -me ".  And although SGE reported that the 
hostname was modified in the exechost list, if I run the qconf command again it 
still shows the FQDN for the host.  It's as if the hostname gets reverted after 
my change.

Does that seem normal?  Should I be doing something else to remove the domain 
names from the hosts?
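
One thing I plan to check (per Skylar's earlier comment about ignore_fqdn) is
the bootstrap file, since the qmaster normalizes hostnames according to it;
assuming a default cell under $SGE_ROOT, something like:

  grep -E 'ignore_fqdn|default_domain' "$SGE_ROOT/default/common/bootstrap"

If ignore_fqdn is false there, SGE resolving hosts back to their FQDN would at
least be consistent with what I'm seeing.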

Thank you and regards,

-- 
Mun


> 
> -- Reuti
> 
> 
> >
> > Regards,
> >
> > --
> > Mun
> >
> >
> > From: Mun Johl 
> > Sent: Friday, October 25, 2019 5:42 PM
> > To: dpo...@gmail.com
> > Cc: Skylar Thompson ; users@gridengine.org
> > Subject: RE: [gridengine users] What is the easiest/best way to update our
> servers' domain name?
> >
> > Hi Daniel,
> >
> > Thank you for your reply.
> >
> > From: Daniel Povey 
> >
> > You may have to write a script to do that, but it could be something
> > like
> >
> > for exechost in $(qconf -sel); do
> >qconf -se $exechost  | sed s/old_domain_name/new_domain_name/ >
> tmp
> >qconf -de $exechost
> >qconf -Ae tmp
> > done
> >
> > but you might need to tweak that to get it to work, e.g. get rid of
> load_values from the tmp file.
> >
> > [Mun] Understood.  Since we have a fairly small set of servers currently, I
> may just update them by hand via "qconf -me "; and then
> address the queues via "qconf -mq ".  Oh, and I just noticed I can
> modify hostgroups via "qconf -mhgrp @name".
> >
> > After that I can re-start the daemons and I "should" be good to go, right?
> >
> > Thanks again Daniel.
> >
> > Best regards,
> >
> > --
> > Mun
> >
> >
> > On Fri, Oct 25, 2019 at 5:24 PM Mun Johl  wrote:
> > Hi Daniel and Skylar,
> >
> > Thank you for your replies.
> >
> > > -Original Message-
> > > I think it might depend on the setting of ignore_fqdn in the
> > > bootstrap file (can't remember if this just tunes load reporting or
> > > also things like which qmaster the execd's talk to). I wouldn't
> > > count on it working, though, and agree with Daniel that you probably
> want to plan on an outage.
> >
> > [Mun] An outage is acceptable; but I'm not sure what is the best/easiest
> approach to take in order to change the domain names within SGE for all of
> the servers as well as update the hostgroups and queues.  I mean, I know I
> can delete the hosts and add them back in; and the same for the queue
> specifications, etc.  However, I'm not sure if that is an adequate solution or
> one that will cause problems for me.  I'm also not sure if that is the best
> approach to take for this task.
> >
> > Thanks,
> >
> > --
> > Mun
> >
> >
> > >
> > > On Fri, Oct 25, 2019 at 04:12:11PM -0700, Daniel Povey wrote:
> > > > IIRC, GridEngine is very picky about machines having a consistent
> > > > hostname, e.g. that what hostname they think they have matches
> > > > with how they were addressed.  I think this is because of SunRPC.
> > > > I think it may be hard to do what you want without an interruption  of
> some kind.
> > > But I may be wrong.
> > > >
> > > > On Fri, Oct 25, 2019 at 3:37 PM Mun Johl  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >
> > > > >
> > > > > I need to update the domain names of our SGE servers.  What is
> > > > > the easiest way to do that?  Can I simply update the domain name
> > > > > somehow and have that propagate to hostgroups, queue
> specifications, etc.?
> > > > >
> > > > >

Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-28 Thread Mun Johl
Hi all,

I do have a follow-up question: When I am specifying hostnames for the 
execution hosts, admin hosts, etc.; do I need to use the FQDN?  Or can I simply 
use the hostname in order for grid to operate correctly?  That is, do I have to 
use hostname.domain.com (as I am currently doing).  Or is it sufficient to 
simply use “hostname”?

Regards,

--
Mun


From: Mun Johl 
Sent: Friday, October 25, 2019 5:42 PM
To: dpo...@gmail.com
Cc: Skylar Thompson ; users@gridengine.org
Subject: RE: [gridengine users] What is the easiest/best way to update our 
servers' domain name?

Hi Daniel,

Thank you for your reply.

From: Daniel Povey <dpo...@gmail.com>
You may have to write a script to do that, but it could be something like

for exechost in $(qconf -sel); do
   qconf -se $exechost  | sed s/old_domain_name/new_domain_name/ > tmp
   qconf -de $exechost
   qconf -Ae tmp
done

but you might need to tweak that to get it to work, e.g. get rid of load_values 
from the tmp file.

[Mun] Understood.  Since we have a fairly small set of servers currently, I may 
just update them by hand via “qconf -me ”; and then address the 
queues via “qconf -mq ”.  Oh, and I just noticed I can modify hostgroups 
via “qconf -mhgrp @name”.

After that I can re-start the daemons and I “should” be good to go, right?

Thanks again Daniel.

Best regards,

--
Mun


On Fri, Oct 25, 2019 at 5:24 PM Mun Johl <mun.j...@wdc.com> wrote:
Hi Daniel and Skylar,

Thank you for your replies.

> -Original Message-
> I think it might depend on the setting of ignore_fqdn in the bootstrap file
> (can't remember if this just tunes load reporting or also things like which
> qmaster the execd's talk to). I wouldn't count on it working, though, and
> agree with Daniel that you probably want to plan on an outage.

[Mun] An outage is acceptable; but I'm not sure what is the best/easiest 
approach to take in order to change the domain names within SGE for all of the 
servers as well as update the hostgroups and queues.  I mean, I know I can 
delete the hosts and add them back in; and the same for the queue 
specifications, etc.  However, I'm not sure if that is an adequate solution or 
one that will cause problems for me.  I'm also not sure if that is the best 
approach to take for this task.

Thanks,

--
Mun


>
> On Fri, Oct 25, 2019 at 04:12:11PM -0700, Daniel Povey wrote:
> > IIRC, GridEngine is very picky about machines having a consistent
> > hostname, e.g. that what hostname they think they have matches with
> > how they were addressed.  I think this is because of SunRPC.  I think
> > it may be hard to do what you want without an interruption  of some kind.
> But I may be wrong.
> >
> > On Fri, Oct 25, 2019 at 3:37 PM Mun Johl <mun.j...@wdc.com> wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I need to update the domain names of our SGE servers.  What is the
> > > easiest way to do that?  Can I simply update the domain name somehow
> > > and have that propagate to hostgroups, queue specifications, etc.?
> > >
> > >
> > >
> > > Or do I have to delete the current hosts and add the new ones?
> > > Which I think also implies setting up the hostgroups and queues
> > > again as well for our implementation.
> > >
> > >
> > >
> > > Best regards,
> > >
> > >
> > >
> > > --
> > >
> > > Mun
> > > ___
> > > users mailing list
> > > users@gridengine.org<mailto:users@gridengine.org>
> > > https://gridengine.org/mailman/listinfo/users
> > >
>
> > ___
> > users mailing list
> > users@gridengine.org<mailto:users@gridengine.org>
> > https://gridengine.org/mailman/listinfo/users
>
>
> --
> -- Skylar Thompson (skyl...@u.washington.edu<mailto:skyl...@u.washington.edu>)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
> ___
> users mailing list
> users@gridengine.org<mailto:users@gridengine.org>
> https://gridengine.org/mailman/listinfo/users

___
users mailing list
users@gridengine.org<mailto:users@gridengine.org>
https://gridengine.org/mailman/listinfo/users
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-25 Thread Mun Johl
Hi Daniel,

Thank you for your reply.

From: Daniel Povey 

You may have to write a script to do that, but it could be something like

for exechost in $(qconf -sel); do
   qconf -se $exechost  | sed s/old_domain_name/new_domain_name/ > tmp
   qconf -de $exechost
   qconf -Ae tmp
done

but you might need to tweak that to get it to work, e.g. get rid of load_values 
from the tmp file.

[Mun] Understood.  Since we have a fairly small set of servers currently, I may 
just update them by hand via “qconf -me <exec_host>”; and then address the 
queues via “qconf -mq <queue>”.  Oh, and I just noticed I can modify hostgroups 
via “qconf -mhgrp @name”.

After that I can re-start the daemons and I “should” be good to go, right?
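For my own notes, the by-hand pass I have in mind is roughly the following 
(the host, queue and hostgroup names are placeholders for our real ones):

    # exec hosts: edit the hostname entry one at a time
    qconf -me sim1.old-domain.com      # change hostname to sim1.new-domain.com

    # cluster queues: fix hostlist and any [host=...] overrides
    qconf -mq short.q

    # hostgroups: fix the member hostnames
    qconf -mhgrp @allhosts

    # finally restart sge_qmaster and sge_execd on every host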

Thanks again Daniel.

Best regards,

--
Mun


On Fri, Oct 25, 2019 at 5:24 PM Mun Johl 
mailto:mun.j...@wdc.com>> wrote:
Hi Daniel and Skylar,

Thank you for your replies.

> -Original Message-
> I think it might depend on the setting of ignore_fqdn in the bootstrap file
> (can't remember if this just tunes load reporting or also things like which
> qmaster the execd's talk to). I wouldn't count on it working, though, and
> agree with Daniel that you probably want to plan on an outage.

[Mun] An outage is acceptable; but I'm not sure what is the best/easiest 
approach to take in order to change the domain names within SGE for all of the 
servers as well as update the hostgroups and queues.  I mean, I know I can 
delete the hosts and add them back in; and the same for the queue 
specifications, etc.  However, I'm not sure if that is an adequate solution or 
one that will cause problems for me.  I'm also not sure if that is the best 
approach to take for this task.

Thanks,

--
Mun


>
> On Fri, Oct 25, 2019 at 04:12:11PM -0700, Daniel Povey wrote:
> > IIRC, GridEngine is very picky about machines having a consistent
> > hostname, e.g. that what hostname they think they have matches with
> > how they were addressed.  I think this is because of SunRPC.  I think
> > it may be hard to do what you want without an interruption  of some kind.
> But I may be wrong.
> >
> > On Fri, Oct 25, 2019 at 3:37 PM Mun Johl 
> > mailto:mun.j...@wdc.com>> wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I need to update the domain names of our SGE servers.  What is the
> > > easiest way to do that?  Can I simply update the domain name somehow
> > > and have that propagate to hostgroupgs, queue specifications, etc.?
> > >
> > >
> > >
> > > Or do I have to delete the current hosts and add the new ones?
> > > Which I think also implies setting up the hostgroups and queues
> > > again as well for our implementation.
> > >
> > >
> > >
> > > Best regards,
> > >
> > >
> > >
> > > --
> > >
> > > Mun
> > > ___
> > > users mailing list
> > > users@gridengine.org<mailto:users@gridengine.org>
> > > https://gridengine.org/mailman/listinfo/users
> > >
>
> > ___
> > users mailing list
> > users@gridengine.org<mailto:users@gridengine.org>
> > https://gridengine.org/mailman/listinfo/users
>
>
> --
> -- Skylar Thompson (skyl...@u.washington.edu<mailto:skyl...@u.washington.edu>)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
> ___
> users mailing list
> users@gridengine.org<mailto:users@gridengine.org>
> https://gridengine.org/mailman/listinfo/users

___
users mailing list
users@gridengine.org<mailto:users@gridengine.org>
https://gridengine.org/mailman/listinfo/users
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-25 Thread Mun Johl
Hi Daniel and Skylar,

Thank you for your replies.

> -Original Message-
> I think it might depend on the setting of ignore_fqdn in the bootstrap file
> (can't remember if this just tunes load reporting or also things like which
> qmaster the execd's talk to). I wouldn't count on it working, though, and
> agree with Daniel that you probably want to plan on an outage.

[Mun] An outage is acceptable; but I'm not sure what is the best/easiest 
approach to take in order to change the domain names within SGE for all of the 
servers as well as update the hostgroups and queues.  I mean, I know I can 
delete the hosts and add them back in; and the same for the queue 
specifications, etc.  However, I'm not sure if that is an adequate solution or 
one that will cause problems for me.  I'm also not sure if that is the best 
approach to take for this task.

Thanks,

-- 
Mun


> 
> On Fri, Oct 25, 2019 at 04:12:11PM -0700, Daniel Povey wrote:
> > IIRC, GridEngine is very picky about machines having a consistent
> > hostname, e.g. that what hostname they think they have matches with
> > how they were addressed.  I think this is because of SunRPC.  I think
> > it may be hard to do what you want without an interruption  of some kind.
> But I may be wrong.
> >
> > On Fri, Oct 25, 2019 at 3:37 PM Mun Johl  wrote:
> >
> > > Hi,
> > >
> > >
> > >
> > > I need to update the domain names of our SGE servers.  What is the
> > > easiest way to do that?  Can I simply update the domain name somehow
> > > and have that propagate to hostgroupgs, queue specifications, etc.?
> > >
> > >
> > >
> > > Or do I have to delete the current hosts and add the new ones?
> > > Which I think also implies setting up the hostgroups and queues
> > > again as well for our implementation.
> > >
> > >
> > >
> > > Best regards,
> > >
> > >
> > >
> > > --
> > >
> > > Mun
> > > ___
> > > users mailing list
> > > users@gridengine.org
> > > https://gridengine.org/mailman/listinfo/users
> > >
> 
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> 
> 
> --
> -- Skylar Thompson (skyl...@u.washington.edu)
> -- Genome Sciences Department, System Administrator
> -- Foege Building S046, (206)-685-7354
> -- University of Washington School of Medicine
> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-25 Thread Mun Johl
Hi Reuti,

Thank you for your reply.
Please see my inline comments below.

> -Original Message-
> Hi,
> 
> Am 26.10.2019 um 00:37 schrieb Mun Johl:
> 
> > I need to update the domain names of our SGE servers.  What is the easiest
> way to do that?  Can I simply update the domain name somehow and have
> that propagate to hostgroupgs, queue specifications, etc.?
> >
> > Or do I have to delete the current hosts and add the new ones?  Which I
> think also implies setting up the hostgroups and queues again as well for our
> implementation.
> >
> 
> Are all machines on a single network and use the FQDN? 

[Mun] Yes.

> And/or have the
> qmaster machines two network interfaces and only the external name
> changes, while the internal ones stay the same?

[Mun] No.  It's the former: single network and use the FQDN.

Best regards,

-- 
Mun

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] What is the easiest/best way to update our servers' domain name?

2019-10-25 Thread Mun Johl
Hi,

I need to update the domain names of our SGE servers.  What is the easiest way 
to do that?  Can I simply update the domain name somehow and have that 
propagate to hostgroupgs, queue specifications, etc.?

Or do I have to delete the current hosts and add the new ones?  Which I think 
also implies setting up the hostgroups and queues again as well for our 
implementation.

Best regards,

--
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] I need a decoder ring for the qacct output

2019-04-25 Thread Mun Johl
Hi Reuti and Skylar,

Sorry for misspelling your name last time, Skylar.

On Thu, Apr 25, 2019 at 08:56 AM PDT, Reuti wrote:
> > Am 25.04.2019 um 17:41 schrieb Mun Johl :
> >
> > Hi Skyler, Reuti,
> >
> > Thank you for your reply.
> > Please see my comments below.
> >
> > On Thu, Apr 25, 2019 at 08:03 AM PDT, Reuti wrote:
> >> Hi,
> >>
> >>> Am 25.04.2019 um 16:53 schrieb Mun Johl :
> >>>
> >>> Hi,
> >>>
> >>> I'm using 'qacct -P' in the hope of tracking metrics on a per project
> >>> basis.  I am getting data out of qacct, however I don't fully comprehend
> >>> what the data is trying to tell me.
> >>>
> >>> I've searched the man pages and web for definitions of the output of
> >>> qacct, but I have not been able to find a complete reference (just bits
> >>> and pieces here and there).
> >>>
> >>> Can anyone point me to a complete reference so that I can better
> >>> understand the output of qacct?
> >>
> >> There is a man page about it:
> >>
> >> man accounting
> >
> > Well, I _did_ look at that prior to posting but I guess I just didn't
> > see the keywords I was looking for.  So maybe I'll just ask the specific
> > questions regarding my confusion.
> >
> > WALLCLOCK is pretty well defined by ru_wallclock.  So that's basically
> > the total wall clock time the job was on the execution host.
> >
> > UTIME is user time used.
> > STIME is system time used.
> >
> > Should (UTIME + STIME) >= WALLCLOCK?  It isn't in my case and is mainly
> > why I am confused.  Or perhaps process wait time is not included?
> 
> You mean in case of a parallel application? You set "accounting_summary" to 
> "true" and get only a single record back?
> 
> This depends how the used CPU time is acquired by the OS (and whether all 
> created processes are taken into account, even if they jump out of the 
> process tree [like with `setsid`]). More reliable is the CPU time collected 
> by SGE by the additional group ID.

Actually, we aren't running a parallel application yet.

I think the answers you two have provided resolve my confusion.  I
mainly just need to know the wallclock time spent per project.
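For that, my plan is simply to pull the per-project rollup, e.g. (project
name is a placeholder):

    # summary (including WALLCLOCK) for one project
    qacct -P chip_proj

    # or the rollup for every project
    qacct -P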

Thank you again for your informative and quick replies.

Regards,

-- 
Mun

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] I need a decoder ring for the qacct output

2019-04-25 Thread Mun Johl
Hi,

I'm using 'qacct -P' in the hope of tracking metrics on a per project
basis.  I am getting data out of qacct, however I don't fully comprehend
what the data is trying to tell me.

I've searched the man pages and web for definitions of the output of
qacct, but I have not been able to find a complete reference (just bits
and pieces here and there).

Can anyone point me to a complete reference so that I can better
understand the output of qacct?

Thank you,

-- 
Mun

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Best way to restrict a user to a specific exec host?

2019-04-09 Thread Mun Johl
Hi Reuti,

One clarification question below ...

On Tue, Apr 09, 2019 at 09:05 AM PDT, Reuti wrote:
> > Am 09.04.2019 um 17:43 schrieb Mun Johl :
> >
> > Hi Reuti,
> >
> > Thank you for your reply!
> > Please see my comments below.
> >
> > On Mon, Apr 08, 2019 at 10:27 PM PDT, Reuti wrote:
> >> Hi,
> >>
> >>> Am 09.04.2019 um 05:37 schrieb Mun Johl :
> >>>
> >>> Hi all,
> >>>
> >>> My company is hiring a contractor for some development work.  As such, I
> >>> need to modify our grid configuration so that he only has access to a
> >>> single execution host.  That particular host (let's call it serverA)
> >>> will not have all of our data disks mounted.
> >>>
> >>> NOTE: We are running SGE v8.1.9 on systems running Red Hat Enterprise 
> >>> Linux v6.8 .
> >>>
> >>> I'm not really sure how to proceed.  I'm thinking of perhaps creating a
> >>> new queue which only resides on serverA.
> >>
> >> There is no need for an additional queue. You can add him to the 
> >> xuser_lists of all oher queues. But a special queue with a limited number 
> >> of slots might give the contractor more priority to check his develoment 
> >> faster. Depends on personal taste whether this one is preferred. This 
> >> queue could have a forced complex with a high urgency, which he always 
> >> have to request (or you use JSV to add this to his job submissions).
> >
> > How would I proceed if I did not create an additional queue?  You have
> > me intrigued.  That is, if I add him to the xuser_lists of all queues,
> > he wouldn't be able to submit a job, would he?  Perhaps I'm confused.
> 
> All entries in the (cluster) queue definition allow a list of different 
> characteristics (similar to David's setup in the recent post):
> 
> $ qconf -sq all.q
> ...
> user_lists   NONE,[development_machine=banned_users]
> xuser_lists   NONE,[@ordinary_hosts=banned_users]

I created a host group of servers only accessible by employees (not the
contractor).  And then I created an ACL named "contractors" which
contains the contractor's username.

So if I want to forbid the "contractors" from accessing the @EmpOnly
servers on a given queue, would I simply modify the following
xuser_lists line in the queue file as shown below?

xuser_lists   NONE,[@EmpOnly=contractors]
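In case it helps anyone reading along, the full sequence I'm planning to try
is roughly this (usernames and the queue name are placeholders):

    # hostgroup of employee-only exec hosts
    qconf -ahgrp @EmpOnly              # add the member hosts in the editor

    # ACL holding the contractor account(s)
    qconf -au contractor1 contractors

    # ban that ACL from the @EmpOnly hosts in an existing queue
    qconf -mattr queue xuser_lists 'NONE,[@EmpOnly=contractors]' all.q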

Best regards,

-- 
Mun

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Best way to restrict a user to a specific exec host?

2019-04-09 Thread Mun Johl
Hi Reuti,

On Tue, Apr 09, 2019 at 09:05 AM PDT, Reuti wrote:
> > Am 09.04.2019 um 17:43 schrieb Mun Johl :
> >
> > Hi Reuti,
> >
> > Thank you for your reply!
> > Please see my comments below.
> >
> > On Mon, Apr 08, 2019 at 10:27 PM PDT, Reuti wrote:
> >> Hi,
> >>
> >>> Am 09.04.2019 um 05:37 schrieb Mun Johl :
> >>>
> >>> Hi all,
> >>>
> >>> My company is hiring a contractor for some development work.  As such, I
> >>> need to modify our grid configuration so that he only has access to a
> >>> single execution host.  That particular host (let's call it serverA)
> >>> will not have all of our data disks mounted.
> >>>
> >>> NOTE: We are running SGE v8.1.9 on systems running Red Hat Enterprise 
> >>> Linux v6.8 .
> >>>
> >>> I'm not really sure how to proceed.  I'm thinking of perhaps creating a
> >>> new queue which only resides on serverA.
> >>
> >> There is no need for an additional queue. You can add him to the 
> >> xuser_lists of all oher queues. But a special queue with a limited number 
> >> of slots might give the contractor more priority to check his develoment 
> >> faster. Depends on personal taste whether this one is preferred. This 
> >> queue could have a forced complex with a high urgency, which he always 
> >> have to request (or you use JSV to add this to his job submissions).
> >
> > How would I proceed if I did not create an additional queue?  You have
> > me intrigued.  That is, if I add him to the xuser_lists of all queues,
> > he wouldn't be able to submit a job, would he?  Perhaps I'm confused.
> 
> All entries in the (cluster) queue definition allow a list of different 
> characteristics (similar to David's setup in the recent post):
> 
> $ qconf -sq all.q
> ...
> user_lists   NONE,[development_machine=banned_users]
> xuser_lists   NONE,[@ordinary_hosts=banned_users]
> 
> to keep him away from certain machines only. You don't need both entries, it 
> depends whether there are machines for development use only, for ordinary 
> users only, and a pool of machines for mixed use. Sure, one would it rename 
> to "contractor_team" and not "banned_users", if it's used in "user_lists" too.

Oh, I think I understand that now.  You are putting a finer level of
control on each queue and configuring said queue for which user(s) can
access which host(s).  Clever.

> >>> We would ask the contractor to
> >>> specify this new queue for his jobs.  Furthermore, I would add the
> >>> contractor to the xuser_lists of all other queues.
> >>>
> >>> Does that sound reasonable
> >>
> >> Yes.
> >>
> >>
> >>> or is there an easier method for
> >>> accomplishing this task within SGE?
> >>>
> >>> IF it makes sense to proceed in this manner, what is the easiest way to
> >>> add the username of the contractor to the xuser_lists parameter?  Can I
> >>> simply add his username?  Or do I need to create a new access list for 
> >>> him?
> >>
> >> Yes.
> >>
> >> $ qconf -au john_doe banned_users
> >
> > Okay, so to confirm: I create the banned_users ACL and add that ACL to
> > all queues for which john_doe is banned.  Correct?
> >
> > Thanks again for your time and knowledge!
> 
> Either this or create a hostlist to shorten the number of machines for the 
> above setup.

Understood.

> ===
> 
> Even a forced complex could be bound this way to a hostgroup only:
> 
> $ qconf -sq all.q
> ...
> complex_values   NONE,[@ordinary_hosts=contractor=TRUE]
> 
> and the BOOL complex "contractor" with a high urgency.

This is starting to make my head hurt ;)

But I believe you have armed me with enough information for me to move
forward with the requisite configuration changes.
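If I do end up trying the forced-complex variant later, my understanding is
that it would look roughly like this (complex name and urgency are my guesses):

    # define a forced BOOL complex with a high urgency
    qconf -mc    # add a line:  contractor  ctr  BOOL  ==  FORCED  NO  FALSE  1000

    # attach it to the contractor hosts only in the queue
    qconf -mattr queue complex_values 'NONE,[@ContractorHosts=contractor=TRUE]' all.q

    # the contractor then has to request it explicitly
    qsub -l contractor=TRUE run_sim.sh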

Thank you and best regards,

-- 
Mun


> -- Reuti
> 
> 
> > Best regards,
> >
> > --
> > Mun
> >
> >
> >>> Any and all examples of how to implement this type of configuration
> >>> would be greatly appreciated since I am not an SGE expert by any stretch
> >>> of the imagination.
> >>>
> >>> By the way, would the contractor only need an account on serverA in
> >>> order to utilize SGE?  Or would he need an account on the grid master as
> >>> well?
> >>
> >> Are you not using a central user administration by NIS or LDAP?
> >>
> >> AFAICS he needs an entry only on the execution host (and on the submission 
> >> host of course).
> >>
> >> -- Reuti

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Best way to restrict a user to a specific exec host?

2019-04-09 Thread Mun Johl
Hi Reuti,

Thank you for your reply!
Please see my comments below.

On Mon, Apr 08, 2019 at 10:27 PM PDT, Reuti wrote:
> Hi,
> 
> > Am 09.04.2019 um 05:37 schrieb Mun Johl :
> >
> > Hi all,
> >
> > My company is hiring a contractor for some development work.  As such, I
> > need to modify our grid configuration so that he only has access to a
> > single execution host.  That particular host (let's call it serverA)
> > will not have all of our data disks mounted.
> >
> > NOTE: We are running SGE v8.1.9 on systems running Red Hat Enterprise Linux 
> > v6.8 .
> >
> > I'm not really sure how to proceed.  I'm thinking of perhaps creating a
> > new queue which only resides on serverA.
> 
> There is no need for an additional queue. You can add him to the xuser_lists 
> of all oher queues. But a special queue with a limited number of slots might 
> give the contractor more priority to check his develoment faster. Depends on 
> personal taste whether this one is preferred. This queue could have a forced 
> complex with a high urgency, which he always have to request (or you use JSV 
> to add this to his job submissions).

How would I proceed if I did not create an additional queue?  You have
me intrigued.  That is, if I add him to the xuser_lists of all queues,
he wouldn't be able to submit a job, would he?  Perhaps I'm confused.

> >  We would ask the contractor to
> > specify this new queue for his jobs.  Furthermore, I would add the
> > contractor to the xuser_lists of all other queues.
> >
> > Does that sound reasonable
> 
> Yes.
> 
> 
> > or is there an easier method for
> > accomplishing this task within SGE?
> >
> > IF it makes sense to proceed in this manner, what is the easiest way to
> > add the username of the contractor to the xuser_lists parameter?  Can I
> > simply add his username?  Or do I need to create a new access list for him?
> 
> Yes.
> 
> $ qconf -au john_doe banned_users

Okay, so to confirm: I create the banned_users ACL and add that ACL to
all queues for which john_doe is banned.  Correct?

Thanks again for your time and knowledge!

Best regards,

-- 
Mun


> > Any and all examples of how to implement this type of configuration
> > would be greatly appreciated since I am not an SGE expert by any stretch
> > of the imagination.
> >
> > By the way, would the contractor only need an account on serverA in
> > order to utilize SGE?  Or would he need an account on the grid master as
> > well?
> 
> Are you not using a central user administration by NIS or LDAP?
> 
> AFAICS he needs an entry only on the execution host (and on the submission 
> host of course).
> 
> -- Reuti

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Best way to restrict a user to a specific exec host?

2019-04-08 Thread Mun Johl
Hi all,

My company is hiring a contractor for some development work.  As such, I
need to modify our grid configuration so that he only has access to a
single execution host.  That particular host (let's call it serverA)
will not have all of our data disks mounted.

NOTE: We are running SGE v8.1.9 on systems running Red Hat Enterprise Linux 
v6.8 .

I'm not really sure how to proceed.  I'm thinking of perhaps creating a
new queue which only resides on serverA.  We would ask the contractor to
specify this new queue for his jobs.  Furthermore, I would add the
contractor to the xuser_lists of all other queues.

Does that sound reasonable or is there an easier method for
accomplishing this task within SGE?

IF it makes sense to proceed in this manner, what is the easiest way to
add the username of the contractor to the xuser_lists parameter?  Can I
simply add his username?  Or do I need to create a new access list for him?

Any and all examples of how to implement this type of configuration
would be greatly appreciated since I am not an SGE expert by any stretch
of the imagination.

By the way, would the contractor only need an account on serverA in
order to utilize SGE?  Or would he need an account on the grid master as
well?

Thank you very much in advance.

Kind regards,

-- 
Mun

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Debugging a commlib error following reboot of exec host

2018-07-03 Thread Mun Johl
Hi,

We're using SGE 8.1.9 on CentOS 6.9

"All of the sudden" we've noticed that when we reboot an execution host,
any jobs sent to it within the first 10-15 min following boot-up will
get stuck in the 't' state until deleted (sometimes that has to be done
forcibly).  However, after 10-ish minutes, the execution host will start
accepting jobs.

In the qmaster's messages file, I see the following entries:

06/25/2018 10:28:15|listen|sim1|E|commlib error: endpoint is not unique error 
(endpoint "sim4.work.com/execd/1" is already connected)
06/25/2018 10:38:36| timer|sim1|W|failed to deliver job 54312.1 to queue 
"shor...@sim4.work.com"
06/25/2018 10:38:36| timer|sim1|E|got max. unheard timeout for target "execd" 
on host "sim4.work.com", can't deliver job "54312"

Our IT person says he can connect to the SGE ports on both the qmaster
and exec hosts without issue.

I need some help trying to figure out exactly why the SGE qmaster is not
happy so that we can deploy a fix.  I am _assuming_ some kind of
DNS/Network issue on our end.  This phenomenon is repeatable on all of
our execution hosts (although, our server count is small at this point).
I am told by IT that nothing has changed regarding DNS from when SGE
execution hosts worked "correctly" following a reboot to now.

Regards,

--
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Weird interaction between TCL and SGE

2018-03-29 Thread Mun Johl
Hi Reuti,

> Hi,
> 
> >> What does $hostname stands for? Do you want the job to start on a
> >> particular machine?
> >
> > Precisely.  That is the end goal.
> 
> The syntax in SGE to address a particular host is: -l h=$hostname
> 
> It's not like ssh where you give just the target. But unless there is a
> bold reason, SGE will usually select an appropriate exechost for you.
> That's the purpose of a queuing system.

Actually, my script will use the "-l hostname=$hostname" expansion when someone 
elects to target a specific execution host; but I did not explain that detail.  
My apologies.


> >>> qsub: invalid option argument "-l vl"
> >>
> >> Looks like TCL will give the argument as one, but SGE expects two.
> >> Separating them as "-l" and "vl" might work, on the command line the
> >> splitting is done by the shell where $foo will be split but "$foo"
> >> won't and raise an error too. I have no clue whether TCL has an
> >> option `eval` the expression to split the options.
> >
> > I'm not sure I understand: It does appear as if the "-l" and "vl" are
> separated by a space character.  Is that not enough?
> 
> Not when they are inside a variable: "-l" is the option and "vl" the
> argument to that option. Having "-l vl" is something SGE doesn't understand
> as it should have been broken down. But there is nothing in the TCL
> performing that.
> 
> $ foo="-l h=node29"
> $ qsub "$foo" -b y /bin/sleep 5
> qsub: ERROR! invalid option argument "-l h=node29"
> $ qsub $foo -b y /bin/sleep 5
> Your job 277648 ("sleep") has been submitted
> 
> And it looks like TCL is forwarding the argument like the content of a
> variable is just one option and not option plus argument.

Ah yes, I understand.  And I tried an experiment that processed the args before 
passing them to qsub and that did in fact work.  I'm not sure how I missed 
that; but thanks very much for pointing it out to me!

Best regards,

-- 
Mun


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Weird interaction between TCL and SGE

2018-03-29 Thread Mun Johl
Hi Reuti,

> -Original Message-
> From: Reuti <re...@staff.uni-marburg.de>
> Sent: Wednesday, March 28, 2018 11:52 PM
> To: Mun Johl <mun.j...@kazan-networks.com>
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Weird interaction between TCL and SGE
> 
> [EXTERNAL EMAIL]
> This email was received from outside the organization.
> 
> 
> Hi,
> 
> Am 29.03.2018 um 06:40 schrieb Mun Johl:
> 
> > HI,
> >
> > I'm updating some of our TCL scripts to submit jobs to grid and I've run 
> > into
> an issue I can't explain.  In the TCL script I initialize a variable thusly:
> >
> > set hostname ""
> >
> > and that gets passed into the qsub command as follows:
> >
> > set status [catch {exec qsub -b yes -cwd -sync n -V -q short.q -l vl
> > $hostname tclsh $sim} result]
> 
> What does $hostname stands for? Do you want the job to start on a
> particular machine?

Precisely.  That is the end goal.

> > SGE v8.1.9 is not happy with that for some reason, and I get nasty emails
> such as the following:
> >
> >
> 
> ==
> > = failed before prolog: shepherd exited with exit status 7: before
> > prolog Shepherd trace:
> > 03/28/2018 16:25:20 [495:19306]: shepherd called with uid = 0, euid =
> > 495
> > 03/28/2018 16:25:20 [495:19306]: starting up 8.1.9
> > 03/28/2018 16:25:20 [495:19306]: setpgid(19306, 19306) returned 0
> > 03/28/2018 16:25:20 [495:19306]: do_core_binding: "binding" parameter
> > not found in config file
> > 03/28/2018 16:25:20 [495:19306]: no prolog script to start
> >
> > Shepherd pe_hostfile:
> > sim.company.com 1 shor...@sim.company.com UNDEFINED
> >
> > Furthermore, the spool/sim/messages has the following messages:
> >
> > 03/28/2018 21:15:36|  main|sim|E|shepherd of job 134.1 died through
> > signal = 11
> > 03/28/2018 21:15:36|  main|sim|E|abnormal termination of shepherd for
> > job 134.1: no "exit_status" file
> > 03/28/2018 21:15:36|  main|sim|E|can't open file
> > active_jobs/134.1/error: No such file or directory
> >
> > If I initialize hostname to a space (" "), the job will run just fine.
> >
> > Moreover, if I assign a var to something like "-l vl" to reserve a 
> > consumable
> resource, SGE complains thusly:
> >
> > qsub: invalid option argument "-l vl"
> 
> Looks like TCL will give the argument as one, but SGE expects two. Separating
> them as "-l" and "vl" might work, on the command line the splitting is done
> by the shell where $foo will be split but "$foo" won't and raise an error 
> too. I
> have no clue whether TCL has an option `eval` the expression to split the
> options.

I'm not sure I understand: It does appear as if the "-l" and "vl" are separated 
by a space character.  Is that not enough?

Thanks,

-- 
Mun

> 
> -- Reuti
> 
> 
> > But if I "hard code" that option in the qsub command, the job will run
> correctly.
> >
> > I've tried to scrub 'hostname' of non-printable characters, and to strip and
> (unseen) white space, but nothing seems to work.  I don't know what gets
> embedded into the TCL strings that SGE seems to dislike.  It doesn't help that
> I'm relatively new to TCL.
> >
> > Any suggestions would be appreciated.
> >
> > Regards,
> >
> > --
> > Mun
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Weird interaction between TCL and SGE

2018-03-28 Thread Mun Johl
HI,

I'm updating some of our TCL scripts to submit jobs to grid and I've run into 
an issue I can't explain.  In the TCL script I initialize a variable thusly:

set hostname ""

and that gets passed into the qsub command as follows:

set status [catch {exec qsub -b yes -cwd -sync n -V -q short.q -l vl $hostname 
tclsh $sim} result]

SGE v8.1.9 is not happy with that for some reason, and I get nasty emails such 
as the following:

===

failed before prolog: shepherd exited with exit status 7: before prolog 
Shepherd trace:

03/28/2018 16:25:20 [495:19306]: shepherd called with uid = 0, euid = 495

03/28/2018 16:25:20 [495:19306]: starting up 8.1.9

03/28/2018 16:25:20 [495:19306]: setpgid(19306, 19306) returned 0

03/28/2018 16:25:20 [495:19306]: do_core_binding: "binding" parameter not found 
in config file

03/28/2018 16:25:20 [495:19306]: no prolog script to start



Shepherd pe_hostfile:

sim.company.com 1 shor...@sim.company.com 
UNDEFINED

Furthermore, the spool/sim/messages has the following messages:

03/28/2018 21:15:36|  main|sim|E|shepherd of job 134.1 died through signal = 11
03/28/2018 21:15:36|  main|sim|E|abnormal termination of shepherd for job 
134.1: no "exit_status" file
03/28/2018 21:15:36|  main|sim|E|can't open file active_jobs/134.1/error: No 
such file or directory

If I initialize hostname to a space (" "), the job will run just fine.

Moreover, if I assign a var to something like "-l vl" to reserve a consumable 
resource, SGE complains thusly:

qsub: invalid option argument "-l vl"

But if I "hard code" that option in the qsub command, the job will run 
correctly.

I've tried to scrub 'hostname' of non-printable characters, and to strip and 
(unseen) white space, but nothing seems to work.  I don't know what gets 
embedded into the TCL strings that SGE seems to dislike.  It doesn't help that 
I'm relatively new to TCL.

Any suggestions would be appreciated.

Regards,

--
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] How to specify relative priority of execution hosts?

2018-03-27 Thread Mun Johl
Hi Reuti,

Thanks for the quick reply.

> -Original Message-
> Hi,
> 
> Am 27.03.2018 um 22:37 schrieb Mun Johl:
> 
> > Hi,
> >
> > A couple of our execution hosts also serve as login servers.  Therefore, we
> only want them to take a job if all other execution hosts' slots are full.  
> How
> would I go about configuring SGE v8.1.9 in that manner?
> 
> You can give these login machines a higher sequence number in the queue
> configuration like:
> 
> $ qconf -sq foo
> qname foo
> hostlist  @allhosts
> seq_no            0,[@loginmachines=100],[special=105]
> 
> $ qconf -ssconf
> ...
> queue_sort_method seqno

I have implemented this configuration and it *appears* to be working as 
advertised, although I can't actually fill the slots of the non-login 
machines just yet.
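For the archives, what I ended up running was essentially this (our login
hostgroup name replaced with a placeholder):

    # give the login machines a high sequence number in the queue
    qconf -mattr queue seq_no '0,[@loginhosts=100]' all.q

    # and tell the scheduler to sort queues by sequence number
    qconf -msconf    # set:  queue_sort_method  seqno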

Thanks very much!

Best regards,

-- 
Mun

> 
> -- Reuti

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] How to specify relative priority of execution hosts?

2018-03-27 Thread Mun Johl
Hi,

A couple of our execution hosts also serve as login servers.  Therefore, we 
only want them to take a job if all other execution hosts' slots are full.  How 
would I go about configuring SGE v8.1.9 in that manner?

Thanks,

--
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Is it possible to nohup a command within a script dispatched via qsub?

2018-03-23 Thread Mun Johl
Hi William, Reuti,

> > Am 23.03.2018 um 09:33 schrieb William Hay <w@ucl.ac.uk>:
> >
> > On Fri, Mar 23, 2018 at 12:27:48AM +0100, Reuti wrote:
> >> Hi,
> >>
> >> Am 22.03.2018 um 20:51 schrieb Mun Johl:
> >>
> >>> Hi,
> >>>
> >>> I'm using SGE v8.1.9 on RHEL6.8 .  In my script that I submit via qsub
> (let's call it scriptA), I have a gxmessage (gxmessage is similar to xmessage,
> postnote, etc) statement which pops up a small status window notifying the
> user of the results of the qsub job.
> >>
> >> Is SGE and your job running local on your workstation only? I wonder how
> the gxmessage could display something on the terminal of the user when the
> job runs on an exechost in the cluster and was submitted at some time in the
> past.
> >>
> >>
> >>> However, I don't want the gxmessage to exit when scriptA terminates.
> So far, I have not figured out a way to do what I want.  That is, when
> scriptA terminates, so does gxmessage.  nohup does not help because
> gxmessage gets a SIGKILL.
> >>
> >> SGE kills the complete process group when the jobs ends (or is canceled),
> not just a single process. One might circumvent this with a `setsid foobar &`
> command. The `nohup` isn't necessary here.
> >>
> >> As a second measure to kill orphaned processes one can use the
> additional group id, which is attached to all SGE processes. Although it would
> be counterproductive in your case as it would kill the leftover process 
> despite
> the newly created process group. This would need to set:
> >>
> >> $ qconf -sconf
> >> #global:
> >> ...
> >> execd_params ENABLE_ADDGRP_KILL=TRUE
> >>
> > According to https://arc.liv.ac.uk/repos/darcs/sge/NEWS
> > ENABLE_ADDGRP_KILL defaults to on after SoGE 8.1.7 so it probably needs
> to be explicitly set false.
> >
> > As this is about notifying the user of a completed job I'm wondering
> > if an alternative might be to write a mail compatible wrapper for
> gxmessage and specify  that as the mailer in sge_conf.
> > The wrapper might need to be somewhat smart to distinguish different
> uses of mailer by SGE though.
> 
> Indeed, this might be an option. I use a prolog and epilog to write certain
> settings to persistent file to be send by the mail wrapper then. It's 
> important
> to realize, that the email from the exechost is send *after* the job left the
> exechost already and usually there are no traces about any settings of it.

I have a workaround solution in place, but an epilog sort of thing is a good 
idea.  I'll keep that in mind in case I have problems with my workaround.

Thanks and regards,

-- 
Mun

> 
> Latest additions to my wrapper even send the last 1MB of the output text file
> as attachment in the email, while the file to look for is specified in a 
> context
> variable which will be saved in the persistent file.
> 
> -- Reuti

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Is it possible to nohup a command within a script dispatched via qsub?

2018-03-22 Thread Mun Johl
Hi Reuti,

Thanks for your reply.  See my comments below.

> Hi,
> 
> Am 22.03.2018 um 20:51 schrieb Mun Johl:
> 
> > Hi,
> >
> > I'm using SGE v8.1.9 on RHEL6.8 .  In my script that I submit via qsub 
> > (let's
> call it scriptA), I have a gxmessage (gxmessage is similar to xmessage,
> postnote, etc) statement which pops up a small status window notifying the
> user of the results of the qsub job.
> 
> Is SGE and your job running local on your workstation only? I wonder how
> the gxmessage could display something on the terminal of the user when the
> job runs on an exechost in the cluster and was submitted at some time in the
> past.

The job is dispatched to a remote server.  We essentially set the DISPLAY of 
the running env on the qhost back to our VNC session's DISPLAY.

> >  However, I don't want the gxmessage to exit when scriptA terminates.  So
> far, I have not figured out a way to do what I want.  That is, when scriptA
> terminates, so does gxmessage.  nohup  does not help because gxmessage
> gets a SIGKILL.
> 
> SGE kills the complete process group when the jobs ends (or is canceled), not
> just a single process. One might circumvent this with a `setsid foobar &`
> command. The `nohup` isn't necessary here.
> 
> As a second measure to kill orphaned processes one can use the additional
> group id, which is attached to all SGE processes. Although it would be
> counterproductive in your case as it would kill the leftover process despite
> the newly created process group. This would need to set:
> 
> $ qconf -sconf
> #global:
> ...
> execd_params ENABLE_ADDGRP_KILL=TRUE
> 
> You can use:
> 
> $ ps -e f -o pid,ppid,pgrp,session,command
> 
> on the exechost to investigate this (f w/o dash, it's not a typo)
> 
> 
> > Is it a "feature" of SGE to ensure all child processes are dead when the
> qsub job terminates?
> 
> Yes, this is a feature of SGE and usually highly welcome.

Thanks for the clarification and explanation.  Now I know the reason for the 
behavior, and that helps very much.
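The workaround I'm experimenting with is based on your setsid suggestion: a
rough sketch (message text and timeout are arbitrary), which of course only
survives as long as ENABLE_ADDGRP_KILL isn't cleaning it up afterwards:

    # at the end of scriptA: start gxmessage in its own session so it
    # outlives the job's process group
    setsid gxmessage -timeout 300 "Job $JOB_ID ($JOB_NAME) finished" &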

Regards,

-- 
Mun

> 
> -- Reuti
> 
> 
> >  Is there a way I can allow scriptA to terminate but leave behind a child
> process on a qhost?
> >
> > I have a workaround for this issue, but at this point I really want to
> understand what SGE is doing and if there are better solutions than what I
> have in mind for my workaround.
> >
> > Thanks,
> >
> > --
> > Mun
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Is it possible to nohup a command within a script dispatched via qsub?

2018-03-22 Thread Mun Johl
Hi,

I'm using SGE v8.1.9 on RHEL6.8 .  In my script that I submit via qsub (let's 
call it scriptA), I have a gxmessage (gxmessage is similar to xmessage, 
postnote, etc) statement which pops up a small status window notifying the user 
of the results of the qsub job.  However, I don't want the gxmessage to exit 
when scriptA terminates.  So far, I have not figured out a way to do what I 
want.  That is, when scriptA terminates, so does gxmessage.  nohup does not 
help because gxmessage gets a SIGKILL.

Is it a "feature" of SGE to ensure all child processes are dead when the qsub 
job terminates?  Is there a way I can allow scriptA to terminate but leave 
behind a child process on a qhost?

I have a workaround for this issue, but at this point I really want to 
understand what SGE is doing and if there are better solutions than what I have 
in mind for my workaround.

Thanks,

--
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] gridengine rpm complaining about perl(XML::Simple) even though it's installed

2018-03-20 Thread Mun Johl
Hi William,

Thanks for the reply.  Arnau had mentioned the same thing and installing via 
yum was in fact the solution.

Regards,

--
Mun


-Original Message-
From: William Hay <w@ucl.ac.uk> 
Sent: Monday, March 19, 2018 1:59 AM
To: Mun Johl <mun.j...@kazan-networks.com>
Cc: users@gridengine.org
Subject: Re: [gridengine users] gridengine rpm complaining about 
perl(XML::Simple) even though it's installed

[EXTERNAL EMAIL]
This email was received from outside the organization.


___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] gridengine rpm complaining about perl(XML::Simple) even though it's installed

2018-03-16 Thread Mun Johl
Hi all,

Please disregard my previous reply.  Apparently yum had a conflict when I tried 
to install XML::Simple and I hadn’t noticed that message.  I’ll try again after 
resolving the conflict.

Regards,

--
Mun


From: Mun
Sent: Friday, March 16, 2018 10:38 AM
To: users@gridengine.org
Subject: RE: [gridengine users] gridengine rpm complaining about 
perl(XML::Simple) even though it's installed

Hi Arnau,

Thanks for the good information; however, my problem persists.

Explanation: I couldn’t find the perl-XML-Simple package in our repos; but 
thanks to the link you provided I was able to download and install what I 
“think” was the correct package:

$ yum localinstall perl-XML-Simple-2.18-3.el6.rfx.noarch.rpm

However, when I attempted to install gridengine-8.1.9-1.el6.x86_64.rpm, I again 
got the same error as before:

Error: Package: gridengine-8.1.9-1.el6.x86_64 (/gridengine-8.1.9-1.el6.x86_64)
   Requires: perl(XML::Simple)

Any assistance would be appreciated.

Regards,

--
Mun



This email was received from outside the organization.

Yum know nothing about cpanm installations.
You need to install perl(XML::Simple) with yum not with cpanm.

yum install perl-XML-Simple.noarch

if it's not available in your repos you could try 
https://rpmfind.net/linux/rpm2html/search.php?query=perl-XML-Simple , but it 
should be part of your base repos.

HTH,
Arnau

2018-03-15 23:19 GMT+01:00 Mun Johl 
<mun.j...@kazan-networks.com<mailto:mun.j...@kazan-networks.com>>:
Hi,

I am trying to install gridengine-8.1.9-1.el6.x86_64.rpm on a RedHat EL6 
system.  The yum command exits with the following error:

Error: Package: gridengine-8.1.9-1.el6.x86_64 (/gridengine-8.1.9-1.el6.x86_64)
   Requires: perl(XML::Simple)

However, I have installed the XML::Simple package via the following command:

$ cpanm XML::Simple

I have verified via perldoc that XML::Simple is in fact installed; so I’m at a 
loss as to why yum still is unhappy.

Any suggestions would be greatly appreciated.

Regards,

--
Mun

___
users mailing list
users@gridengine.org<mailto:users@gridengine.org>
https://gridengine.org/mailman/listinfo/users

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] gridengine rpm complaining about perl(XML::Simple) even though it's installed

2018-03-16 Thread Mun Johl
Hi Arnau,

Thanks for the good information; however, my problem persists.

Explanation: I couldn’t find the perl-XML-Simple package in our repos; but 
thanks to the link you provided I was able to download and install what I 
“think” was the correct package:

$ yum localinstall perl-XML-Simple-2.18-3.el6.rfx.noarch.rpm

However, when I attempted to install gridengine-8.1.9-1.el6.x86_64.rpm, I again 
got the same error as before:

Error: Package: gridengine-8.1.9-1.el6.x86_64 (/gridengine-8.1.9-1.el6.x86_64)
   Requires: perl(XML::Simple)

Any assistance would be appreciated.

Regards,

--
Mun



This email was received from outside the organization.

Yum know nothing about cpanm installations.
You need to install perl(XML::Simple) with yum not with cpanm.

yum install perl-XML-Simple.noarch

if it's not available in your repos you could try 
https://rpmfind.net/linux/rpm2html/search.php?query=perl-XML-Simple , but it 
should be part of your base repos.

HTH,
Arnau

2018-03-15 23:19 GMT+01:00 Mun Johl 
<mun.j...@kazan-networks.com<mailto:mun.j...@kazan-networks.com>>:
Hi,

I am trying to install gridengine-8.1.9-1.el6.x86_64.rpm on a RedHat EL6 
system.  The yum command exits with the following error:

Error: Package: gridengine-8.1.9-1.el6.x86_64 (/gridengine-8.1.9-1.el6.x86_64)
   Requires: perl(XML::Simple)

However, I have installed the XML::Simple package via the following command:

$ cpanm XML::Simple

I have verified via perldoc that XML::Simple is in fact installed; so I’m at a 
loss as to why yum still is unhappy.

Any suggestions would be greatly appreciated.

Regards,

--
Mun

___
users mailing list
users@gridengine.org<mailto:users@gridengine.org>
https://gridengine.org/mailman/listinfo/users

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] gridengine rpm complaining about perl(XML::Simple) even though it's installed

2018-03-15 Thread Mun Johl
Hi,

I am trying to install gridengine-8.1.9-1.el6.x86_64.rpm on a RedHat EL6 
system.  The yum command exits with the following error:

Error: Package: gridengine-8.1.9-1.el6.x86_64 (/gridengine-8.1.9-1.el6.x86_64)
   Requires: perl(XML::Simple)

However, I have installed the XML::Simple package via the following command:

$ cpanm XML::Simple

I have verified via perldoc that XML::Simple is in fact installed; so I'm at a 
loss as to why yum still is unhappy.

Any suggestions would be greatly appreciated.

Regards,

--
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] I'm getting an "Unable to initialize env" error; but our simultaneous ECs should be small

2017-11-26 Thread Mun Johl
Hi William,

Thanks for your reply.  See my comments below.


On Thu, Nov 23, 2017 at 2:53 AM, William Hay <w@ucl.ac.uk> wrote:

> On Wed, Nov 22, 2017 at 09:53:17AM -0800, Mun Johl wrote:
> >Hi,
> >Periodically I am seeing the following error:
> >
> >  Unable to initialize environment because of error: cannot register
> event
> >  client. Only 100 event clients are allowed in the system
> >
> >The error first showed up a few days ago but stated "950 event
> clients are
> >allowed".  Because MAX_DYN_EC was not set in my config, I equated it
> to
> >100.
> I am not sure what you mean by "I equated it to 100"?  Did you set it to
> 100
> after getting  the error?  IIRC the default is 1000.
> ​​
>
>
​Yes, after getting the error I tried to check what MAX_DYN_EC parameter
was set to, but it was not set in our configuration.  I assumed it was
implicitly set to 950 based on the original error message.  However, that
value is *way* larger than I would ever expect for our configuration, and
thus I was perplexed as to how we could have that many event clients; I
wasn't sure if MAX_DYN_EC was actually set to 950 or if the error message
was incorrect.  So I set MAX_DYN_EC to 100 via 'qconf -mconf' as a test.
Note that 100 is roughly an order of magnitude larger than I would expect
we need.  Currently, we have very few qsub jobs running at any given time.

One other note is that grid has been working fine for months and this error
just showed up a couple of weeks ago.  We may, however, be seeing our
consumable resources exhausted more frequently as of late.  Not that that
should result in the error I'm seeing, but it's another piece of data.
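One thing I plan to do is watch the live event-client count on the qmaster
while this is happening; as far as I can tell something like the following
should show it (I haven't verified the exact header length yet):

    # list registered event clients and count them
    qconf -secl
    qconf -secl | tail -n +3 | wc -l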


> >However, our sim ring is fairly small at this point and we shouldn't
> be
> >getting anywhere near 100 outstanding qsub's (let alone 950).
> Therefore,
> >I'm wondering what other factors could result in this error?
> >For example, could a slow network or slow grid master result in this
> >error?
> >Any suggestions on how I can get to root cause would be most
> appreciated.
> >Thanks,
>
> Are you actually using qsub?  IIRC when using DRMAA it is possible to leak
> event clients
> (ie the event client is created when a job is qsub'd but isn't
> automatically freed when
> the job terminates only when the client program does) if you launch
> multiple jobs from
> the same process.
>
> If you are using qsub -sync y check that the qsub processes are actually
> being
> reaped (ie there aren't a bunch of zombie qsubs hanging around).
>

​We're using 'qsub -sync y' and I don't see any zombie qsubs on our grid
hosts.  But perhaps I should start a cron job to periodically check the
number of qsubs that are active.
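Something along these lines is what I had in mind for the cron check (the
output format is just a sketch):

    # count running and defunct qsub processes on a submit host
    ps -eo stat,comm | awk '$2=="qsub"{n++; if($1~/Z/)z++} END{printf "qsub: %d total, %d zombie\n", n+0, z+0}'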

>
> Also check that you aren't short of filehandles (ie ulimit) either where
> the submit
> program runs or where the qmaster lives.
>

​ulimit -n reports 1024 on our qmaster and the execution hosts.  However,
'sysctl fs.file-nr' outputs:

​fs.file-nr = 4736   0   13065172

So I'm a little confused as to why the number of file handles reported by
the sysctl command exceeds the ulimit value.
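To compare like with like, I'll probably also look at the qmaster process
itself rather than the system-wide counter, e.g. (run as root on the qmaster):

    # per-process open-descriptor count vs. that process's own limit
    ls /proc/$(pgrep -x sge_qmaster)/fd | wc -l
    grep 'open files' /proc/$(pgrep -x sge_qmaster)/limits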

Regards,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] I'm getting an "Unable to initialize env" error; but our simultaneous ECs should be small

2017-11-22 Thread Mun Johl
Hi,

Periodically I am seeing the following error:

Unable to initialize environment because of error: cannot register event
client. Only 100 event clients are allowed in the system


The error first showed up a few days ago but stated "950 event clients are
allowed".  Because MAX_DYN_EC was not set in my config, I equated it to 100.

However, our sim ring is fairly small at this point and we shouldn't be
getting anywhere near 100 outstanding qsub's (let alone 950).  Therefore,
I'm wondering what other factors could result in this error?

For example, could a slow network or slow grid master result in this error?

Any suggestions on how I can get to root cause would be most appreciated.

Thanks,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] qmon and big icons results in a missing qmon icon

2017-05-01 Thread Mun Johl
Hi,

I've enabled big icons in SGE 8.1.9 by uncommenting the following in the
Qmon resources file:

Qmon*pixmapFilePath:  %R/qmon/PIXMAPS/big/%N.xpm:%R/qmon/PIXMAPS/%N.xpm

However, when I launch qmon, qmon is missing the icon for "Resource Quota
Configuration", and the following messages are displayed in the terminal:

Warning: XmtParseXmtImage: image contains too many colors
Warning: Cannot convert string "toolbar_rqs" to type Pixmap

I'm running on CentOS 7.3.  Any ideas on how I can correct this issue?
Small icons were displayed correctly, but I prefer the big icons.

Thanks,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Are iptable changes required on a CentOS7 installation of Son of Grid?

2017-04-11 Thread Mun Johl
Hi Chris,

Thanks for the reply.
Please see my comments below.


On Tue, Apr 11, 2017 at 5:55 PM, Christopher Heiny <
christopherhe...@gmail.com> wrote:

> On Tue, 2017-04-11 at 17:34 -0700, Mun Johl wrote:
> > Just to show how much I know, apparently our machines are running
> > firewalld; not iptables.  But I still have the same queries, only now
> > they are related to firewalld :)
>
>
> Here's what I do:
>
> firewall-cmd --permanent --add-port=6444/tcp
> firewall-cmd --permanent --add-port=6445/tcp
> firewall-cmd --reload
>


Hmm, that didn't help; thus, I have something even worse to deal with.
BTW, I executed the firewall-cmd commands on both the qmaster and the
execution host.  FYI, here is the error I get when the execution host tries
to qping the qmaster:

​% qping SGEMASTER 6444 qmaster 1
endpoint SGEMASTER.company.com/qmaster/1 at port 6444: can't find connection
got select error: No route to host
got select error: closing "SGEMASTER.company.com/qmaster/1"

I believe IT has a bridge configured on the SGEMASTER; therefore, I need to
discuss that aspect of the network with our IT folks to see if that may be
impeding my success in some fashion.
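In the meantime, the sanity checks I'm repeating on both ends are just:

    firewall-cmd --list-ports        # expecting 6444/tcp 6445/tcp
    qping SGEMASTER 6444 qmaster 1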

Thanks, again.  At least now I know it's not due to the firewalld.

Regards,

-- 
Mun

​

>
> This could probably be consolidated, but I prefer one-at-a-time for
> easier debugging when things go wrong.
>
> Chris
> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Are iptable changes required on a CentOS7 installation of Son of Grid?

2017-04-11 Thread Mun Johl
Hi all,

Just to show how much I know, apparently our machines are running
firewalld, not iptables.  But I still have the same queries, only now they
are related to firewalld :)

Regards,

-- 
Mun


On Tue, Apr 11, 2017 at 4:13 PM, Mun Johl <m...@apeirondata.com> wrote:

> Hi,
>
> I'm installing Son of Grid Engine (SGE) v8.1.9-1.el6 on a handful of
> CentOS7 systems.  I got the qmaster installed, but when I attempted to
> install an execution host on a different machine I got the following error
> during the "Checking hostname resolving" step:
>
>error: commlib error: got select error (No route to host)
> unable to send message to qmaster using port 6444 on host "
> qmaster.company.com": got send error
>
> I've confirmed that the qmaster daemon is running.  Therefore, I'm
> wondering if I need to do any type of iptable updates?  I'm not too
> familiar with iptables; but I don't see anything in them related to ports
> 6444 or 6445 in our iptables.  Should I see references to the qmaster/execd
> ports in the iptables?  If so, how can I correctly populate the iptables on
> the qmaster, execution hosts, and submission hosts?
>
> Regards,
>
> --
> Mun
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Are iptable changes required on a CentOS7 installation of Son of Grid?

2017-04-11 Thread Mun Johl
Hi,

I'm installing Son of Grid Engine (SGE) v8.1.9-1.el6 on a handful of
CentOS7 systems.  I got the qmaster installed, but when I attempted to
install an execution host on a different machine I got the following error
during the "Checking hostname resolving" step:

   error: commlib error: got select error (No route to host)
unable to send message to qmaster using port 6444 on host "
qmaster.company.com": got send error

I've confirmed that the qmaster daemon is running.  Therefore, I'm
wondering whether I need to make any iptables updates.  I'm not too
familiar with iptables, but I don't see anything related to ports
6444 or 6445 in our current rules.  Should I see references to the
qmaster/execd ports in the iptables?  If so, how can I correctly populate
the iptables on the qmaster, execution hosts, and submission hosts?
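
For reference, these are the kind of rules I was expecting to need, shown
here only as a sketch; I have not confirmed they are correct for our setup:

    # Allow the SGE qmaster and execd ports through iptables
    iptables -A INPUT -p tcp --dport 6444 -j ACCEPT   # sge_qmaster
    iptables -A INPUT -p tcp --dport 6445 -j ACCEPT   # sge_execd
    # Persist the rules (assumes the iptables service is in use)
    service iptables save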

Regards,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Recommended installation instructions for CentOS 7?

2017-03-16 Thread Mun Johl
Hi Reuti,

Please see my inline comments below.

On Thu, Mar 16, 2017 at 3:16 AM, Reuti <re...@staff.uni-marburg.de> wrote:

>
> > Am 16.03.2017 um 00:50 schrieb Mun Johl <m...@apeirondata.com>:
> >
> > Hi Reuti,
> >
> > Thanks for your reply.
> >
> > I downloaded the sources--GE2011.11p1.tar.gz--from:
> > https://sourceforge.net/projects/gridscheduler/files
>
> These are quite old, please have a look here:
>
> https://arc.liv.ac.uk/trac/SGE


​Thanks for that link!  I downloaded gridengine-8.1.9-1.el6.x86_64.rpm
<http://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/gridengine-8.1.9-1.el6.x86_64.rpm>​
for our CentOS 7 systems and will give it a shot.
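
My rough plan for installing it is simply the following (a sketch; I am
assuming yum can resolve any dependencies of the el6 package on CentOS 7):

    yum localinstall gridengine-8.1.9-1.el6.x86_64.rpm
    # then run the usual installers from $SGE_ROOT,
    # e.g. ./install_qmaster or ./install_execd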

Best regards,

-- 
Mun



>
> -- Reuti
>
>
> > I also have a tarball named "ge2011.11.tar.gz" which I downloaded a few
> > months ago.  It's the binaries for Linux but unfortunately I don't recall
> > from where I downloaded it (sorry).  And so far I have not been able to
> > find said tarball on the web again.  I'll let you know if I discover the
> > source.
> >
> > Best regards,
> >
> > --
> > Mun
> >
> >
> > On Wed, Mar 15, 2017 at 4:02 PM, Reuti <re...@staff.uni-marburg.de>
> wrote:
> >
> >> -BEGIN PGP SIGNED MESSAGE-
> >> Hash: SHA1
> >>
> >> Hi,
> >>
> >> Am 15.03.2017 um 23:04 schrieb Mun Johl:
> >>
> >>> Sorry for this basic question: I need to do a fresh installation of OGS
> >> onto some CentOS 7.3 systems and I was wondering if there is a
> recommended
> >> installation guide available online?  I searched the web, but there were
> >> several hits for different grid engines, and different configurations,
> etc.
> >> which made it difficult for me to pick which guide to follow.
> >>>
> >>> Our installation will start out very simple: no clusters, no HA, just a
> >> few hosts, etc.
> >>
> >> - From where did you download the OGS or possibly SoGE?
> >>
> >> - -- Reuti
> >>
> >> -BEGIN PGP SIGNATURE-
> >> Comment: GPGTools - https://gpgtools.org
> >>
> >> iEYEARECAAYFAljJyCsACgkQo/GbGkBRnRoIMQCgycPDhMLGZwopePaLNfc21aLc
> >> n/oAoOAOY2pMvMKRaOODiZR1pbtHHjv7
> >> =4yuu
> >> -END PGP SIGNATURE-
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Recommended installation instructions for CentOS 7?

2017-03-15 Thread Mun Johl
Hi Reuti,

Thanks for your reply.

I downloaded the sources--GE2011.11p1.tar.gz--from:
https://sourceforge.net/projects/gridscheduler/files

I also have a tarball named "ge2011.11.tar.gz" which I downloaded a few
months ago.  It's the binaries for Linux but unfortunately I don't recall
from where I downloaded it (sorry).  And so far I have not been able to
find said tarball on the web again.  I'll let you know if I discover the
source.

Best regards,

-- 
Mun


On Wed, Mar 15, 2017 at 4:02 PM, Reuti <re...@staff.uni-marburg.de> wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hi,
>
> Am 15.03.2017 um 23:04 schrieb Mun Johl:
>
> > Sorry for this basic question: I need to do a fresh installation of OGS
> onto some CentOS 7.3 systems and I was wondering if there is a recommended
> installation guide available online?  I searched the web, but there were
> several hits for different grid engines, and different configurations, etc.
> which made it difficult for me to pick which guide to follow.
> >
> > Our installation will start out very simple: no clusters, no HA, just a
> few hosts, etc.
>
> - From where did you download the OGS or possibly SoGE?
>
> - -- Reuti
>
> -BEGIN PGP SIGNATURE-
> Comment: GPGTools - https://gpgtools.org
>
> iEYEARECAAYFAljJyCsACgkQo/GbGkBRnRoIMQCgycPDhMLGZwopePaLNfc21aLc
> n/oAoOAOY2pMvMKRaOODiZR1pbtHHjv7
> =4yuu
> -END PGP SIGNATURE-
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Recommended installation instructions for CentOS 7?

2017-03-15 Thread Mun Johl
Hi,

Sorry for this basic question: I need to do a fresh installation of OGS
onto some CentOS 7.3 systems and I was wondering if there is a recommended
installation guide available online?  I searched the web, but there were
several hits for different grid engines, and different configurations, etc.
which made it difficult for me to pick which guide to follow.

Our installation will start out very simple: no clusters, no HA, just a few
hosts, etc.

Thanks very much,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Having trouble installing SGE on a new execution host

2017-01-03 Thread Mun Johl
Hi Reuti,

On Mon, Jan 2, 2017 at 1:52 AM, Reuti  wrote:

>
> > Most of the cases there is no need to "install" anything on an
> additional exechost when you have already a working cluster.
> >
> > ​[Mun] Really?  I was basically trying to follow the old "Sun N1 Grid
> Engine 6.1 Installation Guide" instructions to install an Execution Host
> from the following URL:
> > https://docs.oracle.com/cd/E19957-01/820-0697/i999062/index.html
> > ​
> >
> > - Prepare a proper /etc/hosts or NIS or alike on the new machine, so
> that all machines in the cluster are known for it (and also the old
> machines should be able to reference the new one)
> > - Mount /opt/sge or /usr/sge on the new exechost
> >
> > ​[Mun] When SGE was initially installed, a common mount was not used.
> SGE_ROOT is local to each host.  It doesn't "feel" right to copy
> $SGE_ROOT/default from a working host to the new host; but I don't know how
> to get that directory on the new host otherwise.
>
> Just copy the complete $SGE_ROOT then to the same location as on the other
> exechosts to the new node. Especially the $SGE_ROOT/default/common
> contains the settings how to address the qmaster, i.e. its name. This is
> the idea to share at least this directory branch from the complete
> installation.
>
> The spool directory inside $SGE_ROOT/default can be skipped, as it will be
> recreated during startup of the exechost (i.e. one for this particular
> node).
>
> If this is a new installation of Linux, it might be necessary to adjust
> $SGE_ROOT/util/arch, so that also newer Linux kernels are covered.
>
> ​[Mun] Your instructions worked perfectly.  Thanks very much for the help!
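
For the archives, the sequence I ended up running was roughly the following
(a sketch only; the host name is an example, and the node-local spool
directory is skipped as suggested):

    # On an existing execution host, copy the installation to the new node
    rsync -a --exclude 'default/spool' $SGE_ROOT/ newhost:$SGE_ROOT/

    # Then on the new execution host
    cp $SGE_ROOT/default/common/sgeexecd /etc/init.d/
    systemctl daemon-reload
    systemctl start sgeexecd.service
    systemctl enable sgeexecd.service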

Regards,

-- 
Mun
​

> -- Reuti
>
>
> > - Copy $SGE_ROOT/default/common/sgeexecd to /etc/init.d
> >
> > Depending on the startup of services you need either:
> >
> > # /etc/init.d/sgeexecd start
> > # chkconfig --add sgeexecd
> >
> > or
> >
> > # systemctl daemon-reload
> > # systemctl start sgeexecd.service
> > # systemctl enable sgeexecd.service
> >
> > BTW: Is tmpdir in the queue definition just /tmp or do you need an
> additional /scratch or alike on the new machine too?
> >
> > ​[Mun] I don't understand this question, sorry.​  Are you referring to
> the SGE queues?
> >
> > Regards,
> >
> > --
> > Mun
> >
> >
> > -- Reuti
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Having trouble installing SGE on a new execution host

2017-01-01 Thread Mun Johl
Hi,

Thanks for your reply.
See my inline comments below.

On Sun, Jan 1, 2017 at 3:49 PM, Reuti <re...@staff.uni-marburg.de
<https://mail.google.com/mail/?view=cm=1=1=re...@staff.uni-marburg.de>
> wrote:

> Hi,
>
> Am 02.01.2017 um 00:05 schrieb Mun Johl:
>
> > Hi,
> >
> > Someone had installed SGE on our servers over a year ago (that person is
> now gone).  However, we now need to install SGE on a new execution host so
> I downloaded the ge2011.11.tar tar-ball.
>
> Was this the version which was installed on the other machines?
>

​[Mun] Yes.
​

>
>
> >  After setting up the SGE_ROOT var, etc. I ran 'install_execd'.  When I
> was queried about the Grid Engine cell, I selected [default] and the
> following error was displayed:
> >
> >Obviously there was no qmaster installation yet!
> >Call >install_qmaster<
> >on the machine which shall run the Grid Engine qmaster
> >
> > However, our qmaster _is_ installed and running.
> >
> > I noticed the ge2011.11.tar tar-ball did not include a 'default'
> directory, which it seems the installation script is trying to access.
> There was nothing in the instructions that I found indicating I am to setup
> that directory ahead of time.  I had assumed the installation process would
> setup that directory.
> >
> > How can I properly setup the 'default' directory so that I can correctly
> install the execution host?
>
> Most of the cases there is no need to "install" anything on an additional
> exechost when you have already a working cluster.
>

​[Mun] Really?  I was basically trying to follow the old "Sun N1 Grid
Engine 6.1 Installation Guide" instructions to install an Execution Host
from the following URL:
https://docs.oracle.com/cd/E19957-01/820-0697/i999062/index.html
​

>
> - Prepare a proper /etc/hosts or NIS or alike on the new machine, so that
> all machines in the cluster are known for it (and also the old machines
> should be able to reference the new one)
> - Mount /opt/sge or /usr/sge on the new exechost
>

​[Mun] When SGE was initially installed, a common mount was not used.
SGE_ROOT is local to each host.  It doesn't "feel" right to copy
$SGE_ROOT/default from a working host to the new host; but I don't know how
to get that directory on the new host otherwise.
​

> - Copy $SGE_ROOT/default/common/sgeexecd to /etc/init.d
>
> Depending on the startup of services you need either:
>
> # /etc/init.d/sgeexecd start
> # chkconfig --add sgeexecd
>
> or
>
> # systemctl daemon-reload
> # systemctl start sgeexecd.service
> # systemctl enable sgeexecd.service
>
> BTW: Is tmpdir in the queue definition just /tmp or do you need an
> additional /scratch or alike on the new machine too?
>

​[Mun] I don't understand this question, sorry.​  Are you referring to the
SGE queues?

Regards,

-- 
Mun


> -- Reuti
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Having trouble installing SGE on a new execution host

2017-01-01 Thread Mun Johl
Hi,

Someone had installed SGE on our servers over a year ago (that person is
now gone).  However, we now need to install SGE on a new execution host so
I downloaded the ge2011.11.tar tar-ball.  After setting up the SGE_ROOT
var, etc. I ran 'install_execd'.  When I was queried about the Grid Engine
cell, I selected [default] and the following error was displayed:

   Obviously there was no qmaster installation yet!
   Call >install_qmaster<
   on the machine which shall run the Grid Engine qmaster

However, our qmaster _is_ installed and running.

I noticed the ge2011.11.tar tar-ball did not include a 'default' directory,
which it seems the installation script is trying to access.  There was
nothing in the instructions that I found indicating I am to setup that
directory ahead of time.  I had assumed the installation process would
setup that directory.

How can I properly setup the 'default' directory so that I can correctly
install the execution host?

Thanks and best regards,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Is round-robin scheduling per user possible?

2016-08-15 Thread Mun Johl
Hi all,

I just realized I forgot to mention one very important criterion for our
SGE setup: Not only do we want equal priority per user, but we also want
the fairness policy to be per queue.  That is, if I launch one job on
first.q--which is otherwise unused; and then I launch 10 jobs on second.q
which already has jobs queued by three other people, the expectation is
that one out of four jobs dispatched from second.q will be mine--even
though the job on first.q is still running.

Is that how SGE will work with the Functional Policy?

Thanks,

-- 
Mun



On Fri, Aug 12, 2016 at 10:51 PM, Mun Johl <m...@apeirondata.com> wrote:

> Hi all,
>
> I just thought I'd get a sanity check from you all before I actually make
> the SGE changes.  As a reminder, we basically want to implement a fairness
> scheme solely based on user.  So if User-A launches 10 jobs, and User-B
> launches 10 jobs--and let's say there is only one execution host--then we
> want User-A and User-B's jobs to be dispatched in a round-robin fashion.
>
> Based on the information you folks have provided, it seems the Functional
> Policy is the most appropriate for us.  The Share Tree Policy's historical
> usage characteristic would probably not be appreciated by some folks I'm
> afraid (unless the lifetime parameter could be set to something on the
> order of minutes--which the documentation seems to state is not possible).
>
> Regarding how to actually make this change: I was planning to use the
> Policy Configuration tool in QMON and simply selecting the "Functional
> Policy" button.  We don't have too many users at this point so I plan to
> add them manually and then set the number of total functional tickets to be
> 200 per user (e.g.).  And I will Clear the "Share Functional Ticket" button
> so that each job has the same relative priority.
>
> Am I close?  Anything I've missed or misunderstood?
>
> Thanks,
>
> --
> Mun
>
>
> On Thu, Aug 11, 2016 at 4:40 PM, Fotis Georgatos <kefalo...@gmail.com>
> wrote:
>
>> And if you need a tool to keep an eye on the action as it happens, have a
>> look at qtop:
>>
>> http://github.com/qtop/qtop
>>
>> Policy application monitoring at your fingertips.
>>
>> Enjoy, F.
>>
>> On Thursday, August 11, 2016, Mun Johl <m...@apeirondata.com> wrote:
>>
>>> Hi Sean, Christopher,
>>>
>>> Thanks for the link, Sean.  I'll definitely give that a read and see if
>>> I can figure out the details (I'm kind of new to configuring SGE).  And
>>> thanks to both of you for your feedback.
>>>
>>> Regards,
>>>
>>> --
>>> Mun
>>>
>>>
>>> On Thu, Aug 11, 2016 at 5:03 AM, Sean Smith <sean.sm...@softmachines.com
>>> > wrote:
>>>
>>>> I will add that I just didn't provide a link we use this in production
>>>> at Soft Machines and it works well.
>>>>
>>>> Sean
>>>> --
>>>> *From:* Christopher Heiny [christopherhe...@gmail.com]
>>>> *Sent:* Thursday, August 11, 2016 4:33 AM
>>>> *To:* Sean Smith
>>>> *Cc:* users@gridengine.org; Mun Johl
>>>> *Subject:* Re: [gridengine users] Is round-robin scheduling per user
>>>> possible?
>>>>
>>>> On Aug 10, 2016 11:49 PM, "Sean Smith" <sean.sm...@softmachines.com>
>>>> wrote:
>>>> >
>>>> > I would recommend reading this.
>>>> >
>>>> > http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_fair_share
>>>>
>>>> Hi Mun,
>>>>
>>>> We use the fair share approach that Sean links to provide round robin
>>>> scheduling within queues at our site.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> >
>>>> > Sean
>>>> > 
>>>> > From: users-boun...@gridengine.org [users-boun...@gridengine.org] on
>>>> behalf of Mun Johl [m...@apeirondata.com]
>>>> > Sent: Wednesday, August 10, 2016 8:27 PM
>>>> > To: users@gridengine.org
>>>> > Subject: [gridengine users] Is round-robin scheduling per user
>>>> possible?
>>>> >
>>>> > Hi,
>>>> >
>>>> > First a simple question: How do I change the priority of a queue?  I
>>>> thought that would be trivial via qmon; but I couldn't find where the
>>>> priority could be changed (at least, not in our setup).
>>>> >
>>>> 

Re: [gridengine users] Is round-robin scheduling per user possible?

2016-08-12 Thread Mun Johl
Hi all,

I just thought I'd get a sanity check from you all before I actually make
the SGE changes.  As a reminder, we basically want to implement a fairness
scheme solely based on user.  So if User-A launches 10 jobs, and User-B
launches 10 jobs--and let's say there is only one execution host--then we
want User-A and User-B's jobs to be dispatched in a round-robin fashion.

Based on the information you folks have provided, it seems the Functional
Policy is the most appropriate for us.  The Share Tree Policy's historical
usage characteristic would probably not be appreciated by some folks I'm
afraid (unless the lifetime parameter could be set to something on the
order of minutes--which the documentation seems to state is not possible).

Regarding how to actually make this change: I was planning to use the
Policy Configuration tool in QMON and simply selecting the "Functional
Policy" button.  We don't have too many users at this point so I plan to
add them manually and then set the number of total functional tickets to
200 per user, for example.  And I will Clear the "Share Functional Ticket" button
so that each job has the same relative priority.

Am I close?  Anything I've missed or misunderstood?
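
In case it is easier to sanity-check as plain commands than as QMON clicks,
I believe the command-line equivalent is roughly the following.  This is a
sketch based on my reading of sched_conf(5), sge_conf(5), and user(5), so
please correct me if any of the field names are off:

    qconf -msconf           # set weight_tickets_functional to a non-zero value, e.g. 10000
    qconf -mconf            # optionally set enforce_user auto and auto_user_fshare 200
    qconf -muser some_user  # set fshare 200 for an existing user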

Thanks,

-- 
Mun


On Thu, Aug 11, 2016 at 4:40 PM, Fotis Georgatos <kefalo...@gmail.com>
wrote:

> And if you need a tool to keep an eye on the action as it happens, have a
> look at qtop:
>
> http://github.com/qtop/qtop
>
> Policy application monitoring at your fingertips.
>
> Enjoy, F.
>
> On Thursday, August 11, 2016, Mun Johl <m...@apeirondata.com> wrote:
>
>> Hi Sean, Christopher,
>>
>> Thanks for the link, Sean.  I'll definitely give that a read and see if I
>> can figure out the details (I'm kind of new to configuring SGE).  And
>> thanks to both of you for your feedback.
>>
>> Regards,
>>
>> --
>> Mun
>>
>>
>> On Thu, Aug 11, 2016 at 5:03 AM, Sean Smith <sean.sm...@softmachines.com>
>> wrote:
>>
>>> I will add that I just didn't provide a link we use this in production
>>> at Soft Machines and it works well.
>>>
>>> Sean
>>> --
>>> *From:* Christopher Heiny [christopherhe...@gmail.com]
>>> *Sent:* Thursday, August 11, 2016 4:33 AM
>>> *To:* Sean Smith
>>> *Cc:* users@gridengine.org; Mun Johl
>>> *Subject:* Re: [gridengine users] Is round-robin scheduling per user
>>> possible?
>>>
>>> On Aug 10, 2016 11:49 PM, "Sean Smith" <sean.sm...@softmachines.com>
>>> wrote:
>>> >
>>> > I would recommend reading this.
>>> >
>>> > http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_fair_share
>>>
>>> Hi Mun,
>>>
>>> We use the fair share approach that Sean links to provide round robin
>>> scheduling within queues at our site.
>>>
>>> Cheers,
>>> Chris
>>>
>>> >
>>> > Sean
>>> > 
>>> > From: users-boun...@gridengine.org [users-boun...@gridengine.org] on
>>> behalf of Mun Johl [m...@apeirondata.com]
>>> > Sent: Wednesday, August 10, 2016 8:27 PM
>>> > To: users@gridengine.org
>>> > Subject: [gridengine users] Is round-robin scheduling per user
>>> possible?
>>> >
>>> > Hi,
>>> >
>>> > First a simple question: How do I change the priority of a queue?  I
>>> thought that would be trivial via qmon; but I couldn't find where the
>>> priority could be changed (at least, not in our setup).
>>> >
>>> > Now, my real question is this: Is it possible to configure SGE such
>>> that jobs are dispatched to execution hosts on a round-robin per user
>>> configuration?  For example, if User-A queues up 10 jobs, and then User-B
>>> queues up 10 jobs to the same queue; can SGE be configured to dispatch each
>>> of the 20 jobs to execution hosts in a round-robin fashion per user?  If
>>> so, how?
>>> >
>>> > If the above is not possible, then I'm thinking of creating a queue
>>> per user.  In that scenario, is it possible to configure SGE to dispatch to
>>> execution hosts in a round-robin fashion per queue?
>>> >
>>> > Many thanks,
>>> >
>>> > --
>>> > Mun
>>> >
>>> > ___
>>> > users mailing list
>>> > users@gridengine.org
>>> > https://gridengine.org/mailman/listinfo/users
>>> >
>>>
>>
>>
>
> --
> echo "sysadmin know better bash than english"|sed s/min/mins/ \
>   | sed 's/better bash/bash better/' # signal detected in a CERN forum
>
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Is round-robin scheduling per user possible?

2016-08-11 Thread Mun Johl
Hi Fotis,

Thanks for the suggestion; I will definitely look into qtop.

Regards,

-- 
Mun


On Thu, Aug 11, 2016 at 4:40 PM, Fotis Georgatos <kefalo...@gmail.com>
wrote:

> And if you need a tool to keep an eye on the action as it happens, have a
> look at qtop:
>
> http://github.com/qtop/qtop
>
> Policy application monitoring at your fingertips.
>
> Enjoy, F.
>
> On Thursday, August 11, 2016, Mun Johl <m...@apeirondata.com> wrote:
>
>> Hi Sean, Christopher,
>>
>> Thanks for the link, Sean.  I'll definitely give that a read and see if I
>> can figure out the details (I'm kind of new to configuring SGE).  And
>> thanks to both of you for your feedback.
>>
>> Regards,
>>
>> --
>> Mun
>>
>>
>> On Thu, Aug 11, 2016 at 5:03 AM, Sean Smith <sean.sm...@softmachines.com>
>> wrote:
>>
>>> I will add that I just didn't provide a link we use this in production
>>> at Soft Machines and it works well.
>>>
>>> Sean
>>> ----------
>>> *From:* Christopher Heiny [christopherhe...@gmail.com]
>>> *Sent:* Thursday, August 11, 2016 4:33 AM
>>> *To:* Sean Smith
>>> *Cc:* users@gridengine.org; Mun Johl
>>> *Subject:* Re: [gridengine users] Is round-robin scheduling per user
>>> possible?
>>>
>>> On Aug 10, 2016 11:49 PM, "Sean Smith" <sean.sm...@softmachines.com>
>>> wrote:
>>> >
>>> > I would recommend reading this.
>>> >
>>> > http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_fair_share
>>>
>>> Hi Mun,
>>>
>>> We use the fair share approach that Sean links to provide round robin
>>> scheduling within queues at our site.
>>>
>>> Cheers,
>>> Chris
>>>
>>> >
>>> > Sean
>>> > 
>>> > From: users-boun...@gridengine.org [users-boun...@gridengine.org] on
>>> behalf of Mun Johl [m...@apeirondata.com]
>>> > Sent: Wednesday, August 10, 2016 8:27 PM
>>> > To: users@gridengine.org
>>> > Subject: [gridengine users] Is round-robin scheduling per user
>>> possible?
>>> >
>>> > Hi,
>>> >
>>> > First a simple question: How do I change the priority of a queue?  I
>>> thought that would be trivial via qmon; but I couldn't find where the
>>> priority could be changed (at least, not in our setup).
>>> >
>>> > Now, my real question is this: Is it possible to configure SGE such
>>> that jobs are dispatched to execution hosts on a round-robin per user
>>> configuration?  For example, if User-A queues up 10 jobs, and then User-B
>>> queues up 10 jobs to the same queue; can SGE be configured to dispatch each
>>> of the 20 jobs to execution hosts in a round-robin fashion per user?  If
>>> so, how?
>>> >
>>> > If the above is not possible, then I'm thinking of creating a queue
>>> per user.  In that scenario, is it possible to configure SGE to dispatch to
>>> execution hosts in a round-robin fashion per queue?
>>> >
>>> > Many thanks,
>>> >
>>> > --
>>> > Mun
>>> >
>>> > ___
>>> > users mailing list
>>> > users@gridengine.org
>>> > https://gridengine.org/mailman/listinfo/users
>>> >
>>>
>>
>>
>
> --
> echo "sysadmin know better bash than english"|sed s/min/mins/ \
>   | sed 's/better bash/bash better/' # signal detected in a CERN forum
>
>
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Is round-robin scheduling per user possible?

2016-08-11 Thread Mun Johl
Hi Sean, Christopher,

Thanks for the link, Sean.  I'll definitely give that a read and see if I
can figure out the details (I'm kind of new to configuring SGE).  And
thanks to both of you for your feedback.

Regards,

-- 
Mun


On Thu, Aug 11, 2016 at 5:03 AM, Sean Smith <sean.sm...@softmachines.com>
wrote:

> I will add that I just didn't provide a link we use this in production at
> Soft Machines and it works well.
>
> Sean
> --
> *From:* Christopher Heiny [christopherhe...@gmail.com]
> *Sent:* Thursday, August 11, 2016 4:33 AM
> *To:* Sean Smith
> *Cc:* users@gridengine.org; Mun Johl
> *Subject:* Re: [gridengine users] Is round-robin scheduling per user
> possible?
>
> On Aug 10, 2016 11:49 PM, "Sean Smith" <sean.sm...@softmachines.com>
> wrote:
> >
> > I would recommend reading this.
> >
> > http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_fair_share
>
> Hi Mun,
>
> We use the fair share approach that Sean links to provide round robin
> scheduling within queues at our site.
>
> Cheers,
> Chris
>
> >
> > Sean
> > 
> > From: users-boun...@gridengine.org [users-boun...@gridengine.org] on
> behalf of Mun Johl [m...@apeirondata.com]
> > Sent: Wednesday, August 10, 2016 8:27 PM
> > To: users@gridengine.org
> > Subject: [gridengine users] Is round-robin scheduling per user possible?
> >
> > Hi,
> >
> > First a simple question: How do I change the priority of a queue?  I
> thought that would be trivial via qmon; but I couldn't find where the
> priority could be changed (at least, not in our setup).
> >
> > Now, my real question is this: Is it possible to configure SGE such that
> jobs are dispatched to execution hosts on a round-robin per user
> configuration?  For example, if User-A queues up 10 jobs, and then User-B
> queues up 10 jobs to the same queue; can SGE be configured to dispatch each
> of the 20 jobs to execution hosts in a round-robin fashion per user?  If
> so, how?
> >
> > If the above is not possible, then I'm thinking of creating a queue per
> user.  In that scenario, is it possible to configure SGE to dispatch to
> execution hosts in a round-robin fashion per queue?
> >
> > Many thanks,
> >
> > --
> > Mun
> >
> > ___
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> >
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Is round-robin scheduling per user possible?

2016-08-10 Thread Mun Johl
Hi,

First a simple question: How do I change the priority of a queue?  I
thought that would be trivial via qmon; but I couldn't find where the
priority could be changed (at least, not in our setup).

Now, my real question is this: Is it possible to configure SGE such that
jobs are dispatched to execution hosts on a round-robin per user
configuration?  For example, if User-A queues up 10 jobs, and then User-B
queues up 10 jobs to the same queue; can SGE be configured to dispatch each
of the 20 jobs to execution hosts in a round-robin fashion per user?  If
so, how?

If the above is not possible, then I'm thinking of creating a queue per
user.  In that scenario, is it possible to configure SGE to dispatch to
execution hosts in a round-robin fashion per queue?

Many thanks,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Why is .cshrc read automatically

2016-05-10 Thread Mun Johl
Hi Jesse,

Thanks very much for your reply.
See my inline comments below.


On Tue, May 10, 2016 at 1:37 PM, Jesse Connell <jess...@bu.edu> wrote:

> Hi Mun,
>
> What's the shell set to for the queue?  If I remember right csh is the
> default (and it'll use the default if you don't specify -S).  For example:
>
> For our setup "qconf -sconf | grep shell" shows:
>
> shell_start_mode posix_compliant
> login_shells sh,bash,ksh,csh,tcsh
>

[Mun] I get the same results.


> and "qconf -sq somequeue | grep shell" shows:
>
> shell /bin/bash
> shell_start_mode  posix_compliant
>

[Mun] Yikes!  Here are my results:

shell /bin/csh
shell_start_mode  posix_compliant

Okay, I'm new to grid config; so does the above imply that the CSH startup
file will be automatically read (if found) for every qsub job?


>
> We recently switched the shell from /bin/csh to /bin/bash on each queue
> since so few were using csh anyway.  This still tries to source
> .bash_profile, since it'll be a login shell either way.  (Is that the part
> that's problematic, though, as opposed to csh vs. bash?)
>

[Mun] Yes (if I understand correctly).  That is, I don't want the shell
startup file(s) to be read regardless of the configured shell for the
queue.  I want qsub to leverage the calling env's variables (thus, I use
'qsub -V').  Is that possible?
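
A couple of things I plan to experiment with in the meantime, sketched from
qsub(1) and queue_conf(5); I have not verified either yet:

    qsub -V -S /bin/sh ./my_job.sh   # pin the job shell explicitly per submission
    qconf -mq somequeue              # or change shell_start_mode to unix_behavior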

Thanks again for the informative reply, Jesse.

-- 
Mun


> Jesse
>
>
> On 05/10/2016 01:30 PM, Mun Johl wrote:
>
>> Hi,
>>
>> I'm running OGS/GE 2011.11p1 on CentOS 6.5 .
>>
>> I have noticed that for some reason ~/.cshrc is read automatically for
>> every qsub command I issue.  My default environment is actually Bash; I
>> only put a .cshrc in my home dir to test something for a peer and then I
>> noticed qsub would read the .cshrc file automatically (which negatively
>> affected my jobs).
>>
>> I do not have a .sge_request file in my home dir nor in the dir from
>> where I launch qsub.  And my qsub options do not include -S .
>>
>> I've tried various SGE DEBUG LEVELs, but none that I've tried have left
>> breadcrumbs as to why ~/.cshrc was being read.
>>
>> This is driving me crazy, so any insight would be helpful.  Yes, I
>> _could_ just remove the .cshrc file, but others here actually use
>> csh/tcsh and could be affected by this as well.  Therefore, I'd like to
>> root cause the issue if possible.
>>
>> Thanks,
>>
>> --
>> Mun
>>
>>
>>
>> ___
>> users mailing list
>> users@gridengine.org
>> https://gridengine.org/mailman/listinfo/users
>>
>> ___
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
>
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


[gridengine users] Why is .cshrc read automatically

2016-05-10 Thread Mun Johl
Hi,

I'm running OGS/GE 2011.11p1 on CentOS 6.5 .

I have noticed that for some reason ~/.cshrc is read automatically for
every qsub command I issue.  My default environment is actually Bash; I
only put a .cshrc in my home dir to test something for a peer and then I
noticed qsub would read the .cshrc file automatically (which negatively
affected my jobs).

I do not have a .sge_request file in my home dir nor in the dir from where
I launch qsub.  And my qsub options do not include -S .

I've tried various SGE DEBUG LEVELs, but none of them have left any
breadcrumbs as to why ~/.cshrc was being read.

This is driving me crazy, so any insight would be helpful.  Yes, I _could_
just remove the .cshrc file, but others here actually use csh/tcsh and
could be affected by this as well.  Therefore, I'd like to root cause the
issue if possible.

Thanks,

-- 
Mun
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users