Re: [gridengine users] Configurations during kickstart
Random ideas:

1. Try disabling the log redirects to see if anything ends up in the standard kickstart log.

2. SGE is unusually sensitive to hostname and DNS resolution. Is your kickstart environment giving the node the same IP address during provisioning as it has when running? Does your kickstart environment have reverse DNS lookup working, so that a lookup on the IP returns the proper hostname?

3. qconf requires communication with the qmaster. It looks like you are defining environment variables that point only to the bin directory rather than setting up the full SGE environment during the kickstart. Consider sourcing the SGE init scripts, or at least setting SGE_ROOT and SGE_CELL, so that the SGE binaries can navigate to $SGE_ROOT/$SGE_CELL/act_qmaster and know which host to communicate with.

Regards,
Chris

Michael Stauffer wrote:
> Hi,
>
> I'm trying to get some resource configurations in place during kickstart. I have the following in my kickstart file replace-partition.xml. The file is run during kickstart: I can see output to text files when I add debugging info.
>
> This code runs correctly if I run it in a shell once the node is up. The issue seems to be that qhost and qconf aren't outputting anything when they run. Is that to be expected?
>
> Here's what I have added:
>
> <post>
> snipped the default stuff for this post...
>
> # Here's the code as I'd like it to work:
> # This code gets reached. I can output these env vars and the
> # values are correct.
> export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
> export NODE=$(/bin/hostname -s)
> export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
> $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt; /root/qconf_complex_setup.log
> $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt;&gt; /root/qconf_complex_setup.log
> </post>
>
> Thanks!
> -M

___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
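Chris's third suggestion can be sketched as a small pre-flight check to run before any qconf or qhost call. This is only an illustration: check_sge_env is a hypothetical helper name, and the sketch assumes the usual layout in which act_qmaster sits in the cell's common/ subdirectory, so adjust the path if your installation differs.

```shell
#!/bin/bash
# check_sge_env (hypothetical helper, not part of SGE): verify that the
# variables the SGE client tools rely on are set, and print the qmaster
# host they would try to contact.
check_sge_env() {
    # Both variables are needed for the client binaries to locate the cell.
    if [ -z "${SGE_ROOT:-}" ] || [ -z "${SGE_CELL:-}" ]; then
        echo "SGE_ROOT or SGE_CELL is not set" >&2
        return 1
    fi
    # Assumption: act_qmaster lives under the cell's common/ directory.
    local qmaster_file="$SGE_ROOT/$SGE_CELL/common/act_qmaster"
    if [ ! -r "$qmaster_file" ]; then
        echo "cannot read $qmaster_file" >&2
        return 1
    fi
    # The first line of the file names the active qmaster host.
    head -n 1 "$qmaster_file"
}

# Possible use at the top of the kickstart post section:
#   check_sge_env >> /root/sge_env_check.log 2>&1
```

If the function fails inside the post section but succeeds in a login shell on the running node, the environment (or a shared filesystem that is not mounted yet) is a likely suspect.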
Re: [gridengine users] Configurations during kickstart
Hi,

On 01.05.2014 at 19:58, Michael Stauffer wrote:
> I'm trying to get some resource configurations in place during kickstart. I have the following in my kickstart file replace-partition.xml. The file is run during kickstart: I can see output to text files when I add debugging info.
>
> This code runs correctly if I run it in a shell once the node is up. The issue seems to be that qhost and qconf aren't outputting anything when they run. Is that to be expected?
>
> Here's what I have added:
>
> <post>
> snipped the default stuff for this post...
>
> # Here's the code as I'd like it to work:
> # This code gets reached. I can output these env vars and the
> # values are correct.
> export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
> export NODE=$(/bin/hostname -s)
> export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
> $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt; /root/qconf_complex_setup.log
> $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt;&gt; /root/qconf_complex_setup.log

This might be intended, but this syntax sends the error output to the default output and only the default output to the logfile. If you want to capture both, it needs to be written as:

    qconf ... > /root/qconf_complex_setup.log 2>&1

-- Reuti

> </post>
>
> Thanks!
> -M
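Reuti's point about redirection order can be reproduced without SGE. In this minimal sketch, emit_both is a hypothetical stand-in for qconf that writes one line to each stream; the two wrappers differ only in the order of the redirections.

```shell
#!/bin/bash
# emit_both: hypothetical stand-in for qconf, one line per output stream.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout is pointed at the file first, then stderr is duplicated onto it,
# so both lines end up in the log.
log_both() {
    emit_both > "$1" 2>&1
}

# stderr is duplicated onto the *current* stdout before stdout is moved to
# the file, so only the stdout line reaches the log.
log_stdout_only() {
    emit_both 2>&1 > "$1"
}
```

Redirections are processed left to right, which is why the second form leaves error messages on the terminal (or, during an install, wherever the installer's stdout happens to go).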
Re: [gridengine users] Configurations during kickstart
> Random ideas:
>
> 1. Try disabling the log redirects to see if anything ends up in the standard kickstart log.

OK, I'll try this. I have to wait for a host to free up before I can try a reinstall again.

> 2. SGE is unusually sensitive to hostname and DNS resolution. Is your kickstart environment giving the node the same IP address during provisioning as it has when running? Does your kickstart environment have reverse DNS lookup working, so that a lookup on the IP returns the proper hostname?

I'll dump tests in the kickstart file and check. I don't know how to check the last bit: do you mean a lookup on the IP by the execute host as it's booting?

> 3. qconf requires communication with the qmaster. It looks like you are defining environment variables that point only to the bin directory rather than setting up the full SGE environment during the kickstart. Consider sourcing the SGE init scripts, or at least setting SGE_ROOT and SGE_CELL, so that the SGE binaries can navigate to $SGE_ROOT/$SGE_CELL/act_qmaster and know which host to communicate with.

I source /etc/profile.d/sge-binaries.sh at the beginning of my code. Should I need anything other than that? In any case, I'm now dumping the relevant env vars in the kickstart to check them.

Thanks
-M
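One way to test the forward and reverse lookups asked about above is a round trip through the system resolver. The sketch below is an assumption-laden illustration rather than an SGE tool: check_dns_roundtrip is a hypothetical helper, and it relies on getent being available in the environment where it runs. Running it both inside the post section and on the installed node (and, for comparison, on the qmaster) would show whether the answers differ.

```shell
#!/bin/bash
# check_dns_roundtrip (hypothetical helper): resolve a name to an address,
# resolve that address back to a name, and compare the short host names.
check_dns_roundtrip() {
    local name="${1:-$(hostname -s)}"
    local ip rev
    # Forward lookup: first address the resolver returns for the name.
    ip=$(getent hosts "$name" | awk 'NR==1 {print $1}')
    [ -n "$ip" ] || { echo "no forward lookup for $name" >&2; return 1; }
    # Reverse lookup: canonical name the resolver returns for that address.
    rev=$(getent hosts "$ip" | awk 'NR==1 {print $2}')
    [ -n "$rev" ] || { echo "no reverse lookup for $ip" >&2; return 1; }
    echo "$name -> $ip -> $rev"
    # Compare short names so node01 and node01.example.org count as a match.
    [ "${rev%%.*}" = "${name%%.*}" ]
}

# Possible use inside the kickstart post section:
#   check_dns_roundtrip "$(/bin/hostname -s)" >> /root/dns_check.log 2>&1
```

A non-zero return with a printed chain such as "node01 -> 10.1.1.5 -> other-name" would point at the reverse record, which is the case Chris warns about.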
Re: [gridengine users] Configurations during kickstart
On Thu, May 1, 2014 at 2:26 PM, Reuti re...@staff.uni-marburg.de wrote:
> Hi,
>
> On 01.05.2014 at 19:58, Michael Stauffer wrote:
>> I'm trying to get some resource configurations in place during kickstart. I have the following in my kickstart file replace-partition.xml. The file is run during kickstart: I can see output to text files when I add debugging info.
>>
>> This code runs correctly if I run it in a shell once the node is up. The issue seems to be that qhost and qconf aren't outputting anything when they run. Is that to be expected?
>>
>> Here's what I have added:
>>
>> <post>
>> snipped the default stuff for this post...
>>
>> # Here's the code as I'd like it to work:
>> # This code gets reached. I can output these env vars and the
>> # values are correct.
>> export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
>> export NODE=$(/bin/hostname -s)
>> export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
>> $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt; /root/qconf_complex_setup.log
>> $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt;&gt;
>
> This might be intended, but this syntax sends the error output to the default output and only the default output to the logfile. If you want to capture both, it needs to be written as:
>
>     qconf ... > /root/qconf_complex_setup.log 2>&1
>
> -- Reuti

Thanks, that was a mistake.

-M
Re: [gridengine users] Configurations during kickstart
On Thu, May 1, 2014 at 2:27 PM, Jesse Becker becke...@mail.nih.gov wrote:
> On Thu, May 01, 2014 at 01:58:04PM -0400, Michael Stauffer wrote:
>> I'm trying to get some resource configurations in place during kickstart. I have the following in my kickstart file replace-partition.xml. The file is run during kickstart: I can see output to text files when I add debugging info.
>
> I've recently been doing something similar with our system provisioning, although not directly in kickstart (we aren't using Rocks either, but I don't think that's the problem).
>
>> This code runs correctly if I run it in a shell once the node is up. The issue seems to be that qhost and qconf aren't outputting anything when they run. Is that to be expected? Here's what I have added:
>
> I think the reason is one of timing. Working backwards, you want to do this:
>
> 4. configure exechost settings with information reported by qhost
> 3. for qhost to report info, sge_execd must be running on the node
> 2. for sge_execd to start, the node must be added via 'qconf -ae'
> 1. something needs to watch for new nodes, and trigger 'qconf -ae'
>
> I forget exactly when Rocks automagically adds nodes to SGE (the 'qconf -ae' bit), but I bet it hasn't happened yet. Thus, sge_execd can't start, so qhost can't report host info, so qconf -mattr fails.
>
> A few possible solutions:
>
> 1. You might be able to somehow force this part of the %post script to run after the master adds the new node. Maybe as part of the firstboot service?
> 2. Create a service that watches for new nodes and configures them accordingly.
> 3. Have a cron job that periodically configures *all* hosts (even old nodes, to catch HW changes).
>
> (We've opted for something between options 2 and 3: we look at all nodes, all the time, but only update new ones.)

Thanks. I've implemented option 3 for the time being. New hosts are rarely added or rebooted here, so a periodic cron job will probably be just fine.

-M
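Option 3 could look roughly like the sketch below. It is not the script actually deployed in this thread: the function names are hypothetical, the qhost and qconf -mattr invocations are the ones from the original post, and qconf -sel is used to list the execution hosts. It assumes qhost and qconf are on the PATH with the SGE environment already set up.

```shell
#!/bin/bash
# Sketch of a periodic job that sets h_vmem and s_vmem on every execution
# host from the mem_total value that qhost reports for it.

# Read "qhost -F mem_total -h HOST" output on stdin, print the value.
parse_mem_total() {
    sed -n 's/.*mem_total=\([^[:space:]]*\).*/\1/p' | tail -n 1
}

configure_host() {
    local host="$1" mem
    mem=$(qhost -F mem_total -h "$host" | parse_mem_total)
    # A host whose sge_execd is not reporting yet yields nothing: skip it
    # rather than writing an empty value into its configuration.
    if [ -z "$mem" ]; then
        echo "skip $host: no mem_total reported" >&2
        return 0
    fi
    qconf -mattr exechost complex_values "h_vmem=$mem" "$host"
    qconf -mattr exechost complex_values "s_vmem=$mem" "$host"
}

configure_all_hosts() {
    local host
    for host in $(qconf -sel); do
        configure_host "$host"
    done
}

# To run from cron, call configure_all_hosts at the end of the script and
# add an entry such as (hypothetical path and schedule):
#   */30 * * * * root /usr/local/sbin/sge-set-vmem.sh
```

Skipping hosts with no reported value keeps the job safe to run while a node is still installing, which is the timing problem Jesse describes.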
Re: [gridengine users] Configurations during kickstart
On 05/01/2014 02:26 PM, Reuti wrote:
> Hi,
>
> On 01.05.2014 at 19:58, Michael Stauffer wrote:
>> I'm trying to get some resource configurations in place during kickstart. I have the following in my kickstart file replace-partition.xml. The file is run during kickstart: I can see output to text files when I add debugging info.
>>
>> This code runs correctly if I run it in a shell once the node is up. The issue seems to be that qhost and qconf aren't outputting anything when they run. Is that to be expected?
>>
>> Here's what I have added:
>>
>> <post>
>> snipped the default stuff for this post...
>>
>> # Here's the code as I'd like it to work:
>> # This code gets reached. I can output these env vars and the
>> # values are correct.
>> export SGEBIN=$SGE_ROOT/bin/$SGE_ARCH
>> export NODE=$(/bin/hostname -s)
>> export MEMFREE=`$SGEBIN/qhost -F mem_total -h $NODE|tail -n 1|cut -d: -f3 | cut -d= -f2`
>> $SGEBIN/qconf -mattr exechost complex_values h_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt; /root/qconf_complex_setup.log
>> $SGEBIN/qconf -mattr exechost complex_values s_vmem=$MEMFREE $NODE 2&gt;&amp;1 &gt;&gt;
>
> This might be intended, but this syntax sends the error output to the default output and only the default output to the logfile. If you want to capture both, it needs to be written as:
>
>     qconf ... > /root/qconf_complex_setup.log 2>&1

You can also eliminate the redirection altogether and use the --log option to the post-install script in your kickstart file, like this:

    %post --log=/root/post-install.log
    # do something here
    # do something else here
    %end
Re: [gridengine users] Configurations during kickstart
On 05/01/2014 02:47 PM, Michael Stauffer wrote:
>> Random ideas:
>>
>> 1. Try disabling the log redirects to see if anything ends up in the standard kickstart log.
>
> OK, I'll try this. I have to wait for a host to free up before I can try a reinstall again.
>
>> 2. SGE is unusually sensitive to hostname and DNS resolution. Is your kickstart environment giving the node the same IP address during provisioning as it has when running? Does your kickstart environment have reverse DNS lookup working, so that a lookup on the IP returns the proper hostname?
>
> I'll dump tests in the kickstart file and check. I don't know how to check the last bit: do you mean a lookup on the IP by the execute host as it's booting?

Here's a tip for troubleshooting kickstart installs: depending on where you want to do your debugging (before or after the installation), add something like

    sleep 1000

to your pre- or post-install script. Then, from the console, use ALT+F1, ALT+F2, etc., to get a root prompt and run some commands from the command line. You can also cd to /tmp and look at the logs there, as well as the kickstart file that the install is working from. This is much easier and quicker than changing the kickstart file, rebooting, testing, changing the kickstart file, rebooting, testing, and so on.

>> 3. qconf requires communication with the qmaster. It looks like you are defining environment variables that point only to the bin directory rather than setting up the full SGE environment during the kickstart. Consider sourcing the SGE init scripts, or at least setting SGE_ROOT and SGE_CELL, so that the SGE binaries can navigate to $SGE_ROOT/$SGE_CELL/act_qmaster and know which host to communicate with.
>
> I source /etc/profile.d/sge-binaries.sh at the beginning of my code. Should I need anything other than that? In any case, I'm now dumping the relevant env vars in the kickstart to check them.

Just for the record, I tried doing this a few years ago with SGE 6.2u5, and for whatever reason I could never get the inst_sge script to work correctly in the post-install environment. After a few days of fighting with it, I configured everything BUT SGE and then used Cluster SSH to run ./inst_sge on all 64 hosts simultaneously (in auto mode with no interaction, obviously).

Potentially dumb question: are you running inst_sge first, to make sure the host is configured and 'installed' properly before running those qconf commands?

Prentice
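The question of whether the host is already known to SGE can be turned into a guard that runs before the qconf -mattr calls. This is a sketch under assumptions: is_exec_host and wait_for_exec_host are hypothetical helper names, qconf -sel is used to list the execution hosts, and the retry count and delay are arbitrary defaults. It does not address whether qconf can reach the qmaster from the post-install environment at all.

```shell
#!/bin/bash
# is_exec_host (hypothetical helper): succeed if NODE appears in the
# qmaster's execution host list.
is_exec_host() {
    local node="$1"
    # The list may contain fully qualified names, so accept either
    # "node" or "node.domain".
    qconf -sel 2>/dev/null | grep -Eq "^${node}(\.|\$)"
}

# wait_for_exec_host NODE [TRIES] [DELAY_SECONDS]
# Poll until the node is registered, giving up after TRIES attempts.
wait_for_exec_host() {
    local node="$1" tries="${2:-30}" delay="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        is_exec_host "$node" && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible use before the qconf -mattr lines:
#   wait_for_exec_host "$(/bin/hostname -s)" || exit 0
```

With a guard like this, a node that has not been added yet is left alone instead of producing an empty or failed configuration change, and a later periodic job can pick it up.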