Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Chris Dagdigian

Random ideas:

1. try disabling the log redirects to see if anything ends up in the
standard kickstart log?

2. SGE is unusually sensitive to hostname and DNS resolution. Is your
kickstart environment giving the node the same IP address during
provisioning as it has when running? Does your kickstart environment
have reverse DNS lookup working so that a lookup on the IP returns the
proper hostname?

3. qconf requires communication with the qmaster. It looks like you are
defining ENV vars that point only to the bin directory rather than
setting up the full SGE environment during the kickstart. Consider
sourcing the SGE init scripts or at least setting SGE_ROOT and SGE_CELL
values so that the SGE binaries can navigate to
$SGE_ROOT/$SGE_CELL/common/act_qmaster and know which host to
communicate with.
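For idea 2, one way to compare forward and reverse lookups from inside %post (or from a shell on the installer console) is sketched below. It assumes `getent` is present in the install environment; the function name `check_dns` is made up for illustration. It only compares short host names, which is a simplification: SGE may also apply its own host_aliases mapping on top of the system resolver.

```shell
#!/bin/bash
# Hypothetical helper: check that forward and reverse lookups agree for a host.
# Assumes getent(1) is available; it consults /etc/hosts, DNS, etc. per nsswitch.conf.
check_dns() {
    local name="$1" ip rname
    # Forward lookup: the first field of the first line is the address.
    ip=$(getent hosts "$name" | awk 'NR==1 {print $1}')
    if [ -z "$ip" ]; then
        echo "FAIL: no forward lookup for $name"
        return 1
    fi
    # Reverse lookup: the second field is the canonical name for that address.
    rname=$(getent hosts "$ip" | awk 'NR==1 {print $2}')
    # Compare short names only, so "node1" matches "node1.local".
    if [ "${rname%%.*}" != "${name%%.*}" ]; then
        echo "FAIL: $name -> $ip -> ${rname:-<none>}"
        return 1
    fi
    echo "OK: $name -> $ip -> $rname"
}

# Example, e.g. from %post:
# check_dns "$(hostname)"
```

Calling it once for the node's own name and once for the qmaster's name would show whether either side resolves differently during provisioning than it does once the node is up.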

Regards,
Chris


Michael Stauffer wrote:
 Hi,
 
 I'm trying to get some resource configurations in place during
 kickstart. I have the following in my kickstart file
 replace-partition.xml. The file is run during kickstart: I can see
 output to text files when I add debugging info.
 
 This code runs correctly if I run it in a shell once the node is up.
 
 The issue seems to be that qhost and qconf aren't outputting anything
 when they run. Is that to be expected? Here's what I have added:
 
 <post>
 
   snipped the default stuff for this post...
 
 # Here's the code as I'd like it to work:
 # This code gets reached. I can output these env vars and the
 #  values are correct.
 export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
 export NODE=$(/bin/hostname -s)
 export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
 $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2>&1 > /root/qconf_complex_setup.log
 $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2>&1 >> /root/qconf_complex_setup.log
 
 </post>
 
 Thanks!
 
 -M
 
 ___
 users mailing list
 users@gridengine.org
 https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Reuti
Hi,

Am 01.05.2014 um 19:58 schrieb Michael Stauffer:

 I'm trying to get some resource configurations in place during kickstart. I 
 have the following in my kickstart file replace-partition.xml. The file is 
 run during kickstart: I can see output to text files when I add debugging 
 info.
 
 This code runs correctly if I run it in a shell once the node is up.
 
 The issue seems to be that qhost and qconf aren't outputting anything when 
 they run. Is that to be expected? Here's what I have added:

 <post>
 
   snipped the default stuff for this post...
 
 # Here's the code as I'd like it to work:
 # This code gets reached. I can output these env vars and the
 #  values are correct.
 export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
 export NODE=$(/bin/hostname -s)
 export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
 $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2>&1 > /root/qconf_complex_setup.log
 $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2>&1 >>

Might be intended, but this syntax sends stderr to the original standard output
and puts only standard output into the logfile. If you want to capture both, it
needs to be written as:

qconf ... > /root/qconf_complex_setup.log 2>&1
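The difference comes from redirections being applied left to right. The sketch below demonstrates it without SGE: `noisy` is a hypothetical stand-in for qconf (not part of SGE) that writes one line to each stream.

```shell
#!/bin/bash
# Stand-in for qconf: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# "2>&1 > file": stderr is first pointed at the *current* stdout
# (the console), then stdout alone is sent to the file.
log_stdout_only() {
    noisy 2>&1 > "$1"
}

# "> file 2>&1": stdout is sent to the file first, then stderr is
# duplicated onto it, so both lines land in the file.
log_both() {
    noisy > "$1" 2>&1
}

# Appending follows the same rule: ">> file 2>&1".
append_both() {
    noisy >> "$1" 2>&1
}
```

For the second qconf call in the original post, the appending form would be `>> /root/qconf_complex_setup.log 2>&1`.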

-- Reuti


 /root/qconf_complex_setup.log
 
 </post>
 
 Thanks!
 
 -M
 


Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Michael Stauffer

 Random ideas:

 1. try disabling the log redirects to see if anything ends up in the
 standard kickstart log?


OK I'll try this. Have to wait for a host to free up to try a reinstall
again.


 2. SGE is unusually sensitive to hostname and DNS resolution. Is your
 kickstart environment giving the node the same IP address during
 provisioning as it has when running? Does your kickstart environment
 have reverse DNS lookup working so that a lookup on the IP returns the
 proper hostname?


I'll dump tests in the kickstart file and check.
Don't know how to check the last bit - you mean a lookup on the IP by the
execute host as it's booting?


 3. qconf requires communication with the qmaster. It looks like you are
 defining ENV vars that point only to the bin directory rather than
 setting up the full SGE environment during the kickstart. Consider
 sourcing the SGE init scripts or at least setting SGE_ROOT and SGE_CELL
 values so that the SGE binaries can navigate to
 $SGE_ROOT/$SGE_CELL/common/act_qmaster and know which host to
 communicate with.


I source /etc/profile.d/sge-binaries.sh at the beginning of my code. Do I
need anything beyond that? In any case, I'm dumping the relevant env vars in
the kickstart now to check them.
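For dumping and checking those variables, a small check like the one below could go near the top of %post. It is a sketch: `check_sge_env` is a hypothetical name, falling back to the cell name `default` when SGE_CELL is unset is an assumption about a typical install, and it only confirms that the client can find the qmaster's name in $SGE_ROOT/$SGE_CELL/common/act_qmaster, not that the qmaster is reachable from the install network. Sourcing $SGE_ROOT/$SGE_CELL/common/settings.sh, the standard SGE settings script, is the other way to get the full environment.

```shell
#!/bin/bash
# Hypothetical helper: report the SGE client environment and whether
# $SGE_ROOT/$SGE_CELL/common/act_qmaster is readable.
check_sge_env() {
    local cell="${SGE_CELL:-default}"   # assumption: cell is "default" if unset
    local act="$SGE_ROOT/$cell/common/act_qmaster"

    echo "SGE_ROOT=${SGE_ROOT:-<unset>}"
    echo "SGE_CELL=${SGE_CELL:-<unset>} (using $cell)"
    echo "SGE_ARCH=${SGE_ARCH:-<unset>}"

    if [ -z "${SGE_ROOT:-}" ]; then
        echo "FAIL: SGE_ROOT is not set"
        return 1
    fi
    if [ ! -r "$act" ]; then
        echo "FAIL: cannot read $act"
        return 1
    fi
    echo "qmaster host: $(head -n 1 "$act")"
}

# In %post, for example (sge-binaries.sh is the script mentioned in this thread):
# . /etc/profile.d/sge-binaries.sh
# check_sge_env
```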

Thanks

-M



Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Michael Stauffer
On Thu, May 1, 2014 at 2:26 PM, Reuti re...@staff.uni-marburg.de wrote:

 Hi,

 Am 01.05.2014 um 19:58 schrieb Michael Stauffer:

  I'm trying to get some resource configurations in place during
 kickstart. I have the following in my kickstart file
 replace-partition.xml. The file is run during kickstart: I can see output
 to text files when I add debugging info.
 
  This code runs correctly if I run it in a shell once the node is up.
 
  The issue seems to be that qhost and qconf aren't outputting anything
 when they run. Is that to be expected? Here's what I have added:

  <post>
 
snipped the default stuff for this post...
 
  # Here's the code as I'd like it to work:
  # This code gets reached. I can output these env vars and the
  #  values are correct.
  export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
  export NODE=$(/bin/hostname -s)
  export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
  $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2>&1 > /root/qconf_complex_setup.log
  $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2>&1 >>

 Might be intended, but this syntax sends stderr to the original standard
 output and puts only standard output into the logfile. If you want to
 capture both, it needs to be written as:

 qconf ... > /root/qconf_complex_setup.log 2>&1

 -- Reuti


Thanks, that was a mistake.

-M


Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Michael Stauffer
On Thu, May 1, 2014 at 2:27 PM, Jesse Becker becke...@mail.nih.gov wrote:

 On Thu, May 01, 2014 at 01:58:04PM -0400, Michael Stauffer wrote:

 I'm trying to get some resource configurations in place during kickstart. I
 have the following in my kickstart file replace-partition.xml. The file
 is run during kickstart: I can see output to text files when I add
 debugging info.


 I've recently been doing something similar with our system provisioning,
 although not directly in kickstart (we aren't using Rocks either, but I
 don't think that's the problem).


  This code runs correctly if I run it in a shell once the node is up.

 The issue seems to be that qhost and qconf aren't outputting anything when
 they run. Is that to be expected? Here's what I have added:


 I think the reason is one of timing.

 Working backwards, you want to do this:

 4. configure exechost settings with information reported by qhost
 3. for qhost to report info, sge_execd must be running on the node
 2. for sge_execd to start, the node must be added via 'qconf -ae'
 1. something needs to watch for new nodes, and trigger 'qconf -ae'

 I forget exactly when Rocks automagically adds nodes to SGE (the 'qconf
 -ae' bit), but I bet it hasn't happened yet. Thus, sge_execd can't
 start, so qhost can't report host info, so qconf -mattr fails.

 A few possible solutions:

 1. You might be able to somehow force this part of the %post script to
 run after the master adds the new node.  Maybe part of the firstboot
 service?

 2. Create a service that watches for new nodes, and configures them
 accordingly.

 3. Have a cronjob that periodically configures *all* hosts (even old
 nodes, to catch HW changes).

 (we've opted for something between options 2 and 3--we look at all
 nodes, all the time, but only update new ones).


Thanks. I've implemented option 3 for the time being. New hosts are rarely
added or rebooted here, so a periodic cron job will probably be just fine.
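Option 3 might look roughly like the sketch below; it is not a tested production script. It reuses the qhost and qconf -mattr commands from the original post and assumes it runs on a host with a working SGE client environment (for example the frontend), that `qconf -sel` lists the execution hosts, and that the last line of `qhost -F mem_total -h HOST` contains `mem_total=VALUE` for hosts whose sge_execd is reporting, as the original pipeline expects. Hosts that report nothing (execd not running yet, per Jesse's chain above) are skipped and picked up on a later run. The `QHOST`/`QCONF` variables exist only so the commands can be swapped out for testing.

```shell
#!/bin/bash
# Sketch of "option 3": periodically copy each exec host's reported
# mem_total into its h_vmem/s_vmem complex_values.
# Assumes the SGE client environment is already set up.
QHOST="${QHOST:-qhost}"   # overridable only to allow testing without SGE
QCONF="${QCONF:-qconf}"

# Print the mem_total value reported for host $1, or nothing if the host
# is not reporting (e.g. sge_execd not running yet).
mem_total_of() {
    "$QHOST" -F mem_total -h "$1" 2>/dev/null | tail -n 1 | sed -n 's/.*mem_total=//p'
}

# Set h_vmem and s_vmem for one host; skip hosts with no reported value.
sync_host() {
    local host="$1" mem
    mem=$(mem_total_of "$host")
    if [ -z "$mem" ]; then
        echo "skip $host: no mem_total reported"
        return 0
    fi
    "$QCONF" -mattr exechost complex_values "h_vmem=$mem" "$host" &&
        "$QCONF" -mattr exechost complex_values "s_vmem=$mem" "$host"
}

# Walk every configured execution host (qconf -sel lists them).
sync_all_hosts() {
    local host rc=0
    for host in $("$QCONF" -sel); do
        sync_host "$host" || rc=1
    done
    return $rc
}

# Only do work when asked, e.g. from cron: sync_vmem.sh run
if [ "${1:-}" = "run" ]; then
    sync_all_hosts
fi
```

From cron it could be invoked as `sync_vmem.sh run` (hypothetical script name) after sourcing the SGE settings script, since cron jobs do not inherit the login environment. Because every run reapplies the values for all hosts, it also picks up memory changes on existing nodes.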

-M





 --
 Jesse Becker (Contractor)



Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Prentice Bisbal

On 05/01/2014 02:26 PM, Reuti wrote:

Hi,

Am 01.05.2014 um 19:58 schrieb Michael Stauffer:


I'm trying to get some resource configurations in place during kickstart. I have the 
following in my kickstart file replace-partition.xml. The file is run during 
kickstart: I can see output to text files when I add debugging info.

This code runs correctly if I run it in a shell once the node is up.

The issue seems to be that qhost and qconf aren't outputting anything when they 
run. Is that to be expected? Here's what I have added:
<post>

   snipped the default stuff for this post...

 # Here's the code as I'd like it to work:
 # This code gets reached. I can output these env vars and the
 #  values are correct.
 export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
 export NODE=$(/bin/hostname -s)
 export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
 $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2>&1 > /root/qconf_complex_setup.log
 $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2>&1 >>

Might be intended, but this syntax sends stderr to the original standard output
and puts only standard output into the logfile. If you want to capture both, it
needs to be written as:

qconf ... > /root/qconf_complex_setup.log 2>&1


You can also eliminate the redirection altogether and use the
--log option to the post-install script in your kickstart file, like this:


%post --log=/root/post-install.log

# do something here
# do something else here

%end




Re: [gridengine users] Configurations during kickstart

2014-05-01 Thread Prentice Bisbal

On 05/01/2014 02:47 PM, Michael Stauffer wrote:


Random ideas:

1. try disabling the log redirects to see if anything ends up in the
standard kickstart log?


OK I'll try this. Have to wait for a host to free up to try a 
reinstall again.


2. SGE is unusually sensitive to hostname and DNS resolution. Is your
kickstart environment giving the node the same IP address during
provisioning as it has when running? Does your kickstart environment
have reverse DNS lookup working so that a lookup on the IP returns the
proper hostname?


I'll dump tests in the kickstart file and check.
Don't know how to check the last bit - you mean a lookup on the IP by 
the execute host as it's booting?


Here's a tip for troubleshooting kickstart installs:

Depending on where you want to do your debugging (before or after the
installation), add something like sleep 1000 to your pre- or
post-install script. Then, from the console, use ALT+F1, ALT+F2, etc., to
get a root prompt and run some commands from the command line.
You can also cd to /tmp and look at the logs there, as well as the
kickstart file that the install is working from. This is much easier and
quicker than changing the kickstart file, rebooting, testing, changing the
kickstart file, rebooting, testing, and so on.
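A variation on the fixed `sleep 1000` is to wait until you create a flag file, so the install continues as soon as you are done looking around. This is a minimal sketch; the flag path and the `wait_for_flag` name are hypothetical, and the timeout keeps an unattended install from hanging forever. Since %post runs chrooted into the new system by default, a flag at /tmp/ks-continue inside the chroot would presumably have to be created under /mnt/sysimage/tmp/ from an installer console; check this against your installer version.

```shell
#!/bin/bash
# Hypothetical debugging aid for %pre/%post: block until a flag file
# appears or a timeout expires, whichever comes first.
wait_for_flag() {
    local flag="${1:-/tmp/ks-continue}"   # hypothetical flag path
    local timeout="${2:-1000}"            # seconds, like "sleep 1000"
    local waited=0
    echo "Paused; create $flag to continue (timeout ${timeout}s)"
    while [ ! -e "$flag" ] && [ "$waited" -lt "$timeout" ]; do
        sleep 5
        waited=$((waited + 5))
    done
    # Return success only if released by the flag rather than the timeout.
    [ -e "$flag" ]
}

# In the kickstart script, e.g.:
# wait_for_flag /tmp/ks-continue 1800
```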


3. qconf requires communication with the qmaster. It looks like you are
defining ENV vars that point only to the bin directory rather than
setting up the full SGE environment during the kickstart. Consider
sourcing the SGE init scripts or at least setting SGE_ROOT and SGE_CELL
values so that the SGE binaries can navigate to
$SGE_ROOT/$SGE_CELL/common/act_qmaster and know which host to
communicate with.


I source /etc/profile.d/sge-binaries.sh at the beginning of my code.
Do I need anything beyond that? In any case, I'm dumping the
relevant env vars in the kickstart now to check them.


Just for the record, I tried doing this a few years ago with SGE 6.2u5, 
and for whatever reason, I couldn't get the inst_sge script to ever work 
correctly in the post-install environment. After a few days of fighting 
with it, I configured everything BUT sge and then used Cluster SSH to 
run ./inst_sge on all 64 hosts simultaneously, in auto-mode with no 
interaction, obviously.


potentially dumb question: Are you running inst_sge first, to make sure 
the host is configured and 'installed' properly before running those 
qconf commands?


Prentice




