Re: Win 7 reservations sometimes fails with DCHP error

2012-04-24 Thread Andy Kurth
On Thu, Apr 19, 2012 at 7:43 PM, Dmitri Chebotarov dcheb...@gmu.edu wrote:
 Hi

 I'm looking for some help on this strangle problem.
 We are testing Win 7 images on VCL system and sometimes reservation would 
 fail with this error (below).
 It only happens sometimes for different Win7 images.
 There is no pattern to the problem.

 For some reason Win 7 cannot obtain IP address from DHCP during reservation.
 But after I get this error and check on the image - it has valid public IP 
 address.

We are seeing the same problem with our Windows 7/2008 images.  The
first private network adapter gets installed and obtains an IP address
pretty quickly when the computer boots.  The computer then begins
responding to SSH and vcld continues to perform the post-load steps.
It gets tripped up because the 2nd public network adapter isn't
installed or initialized yet.  I verified this by watching a Windows
2008 VM load.  When the root account automatically logged in, I closed
the black command boxes from the console in order to prevent root from
automatically being logged off.  I then ran ipconfig over and over
again.  The private adapter had an address.  Windows was still
installing the 2nd adapter (the installing driver popups appeared in
the system tray).  When the 2nd appeared in the ipconfig output it had
a 169 address for at least a minute then finally obtained the correct
address.  The delay between the 1st adapter getting a valid IP address
and the 2nd was over 2 minutes.

 In attempt to cure it I have added a patch to Windows.pm module to issue 
 'ipconfig /release' before 'ipconfig /renew'. I've tested code and it runs 
 OK, but it's not in production yet. Before patch goes to production I would 
 like to ask if someone has/had the same issue with Win 7 images.

This should help.  Please create a Jira issue and submit a patch.

Also, whenever running 'ipconfig /renew' on the public adapter, the
DHCP mode of the management node should be checked first to make sure
it isn't static:
if ($self-data-get_management_node_public_ip_configuration() =~ /static/i)

This check isn't in the current code but it should be added.  The
reason has to do with VCL installations using static public IPs.
Windows has a hard-coded 60 second timeout on ipconfig /renew if no
DHCP servers respond.  This will slow every reservation.

I'd also like code added to the wait_for_response subroutine to try to
detect if devices are still being installed after the computer
responds to SSH.  There should be a drvinst.exe process running.  The
code can check for this and loop until there are no drvinst.exe
processes running.  This wouldn't guarantee that everything is done
being installed.  There may be periods in between different devices
installs when no drvinst.exe process is running.  There may be other
ways of detecting when device installation is complete via WMI or by
checking the setupapi.log file.

Also also, the failed to update private IP address message is a
typo.  It should say public.  This was corrected in trunk a couple
months ago.

Thanks,
Andy

 The error message below. Thank you for your help.

 --
 Dmitri Chebotarov
 Virtual Computing Lab Systems Engineer, TSD - Ent Servers  Messaging
 223 Aquia Building, Ffx, MSN: 1B5
 Phone: (703) 993-6175
 Fax: (703) 993-3404


 Forwarded message:

 From: r...@www.vcl.gmu.edu
 Date: Thursday, April 19, 2012 3:56:50
 Subject: PROBLEM -- 
 9141:9049|new|State.pm|vmguest-vcl12gt;BC3-4|vmwarewin7-Windows732bitSPSS19creating64-v5|sthomp17

 reservation failed on vmguest-vcl12: failed to update private IP address
 
 time: 2012-04-19 15:56:27
 caller: State.pm:reservation_failed(213)
 ( 0) State.pm, reservation_failed (line: 213)
 (-1) new.pm, reserve_computer (line: 894)
 (-2) new.pm, process (line: 363)
 (-3) vcld, make_new_child (line: 568)
 (-4) vcld, main (line: 346)
 
 management node: www.vcl.gmu.edu (http://www.vcl.gmu.edu)
 reservation PID: 1191
 parent vcld PID: 30919

 request ID: 9141
 reservation ID: 9049
 request state/laststate: new/new
 request start time: 2012-04-19 15:45:00
 request end time: 2012-04-19 20:00:00
 for imaging: no
 log ID: 10048

 computer: vmguest-vcl12
 computer id: 25
 computer type: virtualmachine
 computer eth0 MAC address: 00:50:56:03:51:1c
 computer eth1 MAC address: 00:50:56:03:51:1d
 computer private IP address: 10.3.6.48
 computer public IP address: 129.174.25.26
 computer in block allocation: no
 provisioning module: VCL::Module::Provisioning::VMware::VMware

 vm host: BC3-4
 vm host ID: 2
 vm host computer ID: 8
 vm profile: VMware ESX - network storage
 vm profile VM path: /vmfs/volumes/Running-Images
 vm profile repository path: 0
 vm profile datastore path: /vmfs/volumes/Image-Library/baseprod
 vm profile disk type: networkdisk

 image: vmwarewin7-Windows732bitSPSS19creating64-v5
 image display name: Windows 7 

Win 7 reservations sometimes fails with DCHP error

2012-04-19 Thread Dmitri Chebotarov
Hi 

I'm looking for some help on this strangle problem. 
We are testing Win 7 images on VCL system and sometimes reservation would fail 
with this error (below). 
It only happens sometimes for different Win7 images.
There is no pattern to the problem. 

For some reason Win 7 cannot obtain IP address from DHCP during reservation. 
But after I get this error and check on the image - it has valid public IP 
address. 

In attempt to cure it I have added a patch to Windows.pm module to issue 
'ipconfig /release' before 'ipconfig /renew'. I've tested code and it runs OK, 
but it's not in production yet. Before patch goes to production I would like to 
ask if someone has/had the same issue with Win 7 images. 

The error message below. Thank you for your help. 

--
Dmitri Chebotarov
Virtual Computing Lab Systems Engineer, TSD - Ent Servers  Messaging
223 Aquia Building, Ffx, MSN: 1B5
Phone: (703) 993-6175
Fax: (703) 993-3404


Forwarded message:

 From: r...@www.vcl.gmu.edu
 Date: Thursday, April 19, 2012 3:56:50 
 Subject: PROBLEM -- 
 9141:9049|new|State.pm|vmguest-vcl12gt;BC3-4|vmwarewin7-Windows732bitSPSS19creating64-v5|sthomp17
 
 reservation failed on vmguest-vcl12: failed to update private IP address
 
 time: 2012-04-19 15:56:27
 caller: State.pm:reservation_failed(213)
 ( 0) State.pm, reservation_failed (line: 213)
 (-1) new.pm, reserve_computer (line: 894)
 (-2) new.pm, process (line: 363)
 (-3) vcld, make_new_child (line: 568)
 (-4) vcld, main (line: 346)
 
 management node: www.vcl.gmu.edu (http://www.vcl.gmu.edu)
 reservation PID: 1191
 parent vcld PID: 30919
 
 request ID: 9141
 reservation ID: 9049
 request state/laststate: new/new
 request start time: 2012-04-19 15:45:00
 request end time: 2012-04-19 20:00:00
 for imaging: no
 log ID: 10048
 
 computer: vmguest-vcl12
 computer id: 25
 computer type: virtualmachine
 computer eth0 MAC address: 00:50:56:03:51:1c
 computer eth1 MAC address: 00:50:56:03:51:1d
 computer private IP address: 10.3.6.48
 computer public IP address: 129.174.25.26
 computer in block allocation: no
 provisioning module: VCL::Module::Provisioning::VMware::VMware
 
 vm host: BC3-4
 vm host ID: 2
 vm host computer ID: 8
 vm profile: VMware ESX - network storage
 vm profile VM path: /vmfs/volumes/Running-Images
 vm profile repository path: 0
 vm profile datastore path: /vmfs/volumes/Image-Library/baseprod
 vm profile disk type: networkdisk
 
 image: vmwarewin7-Windows732bitSPSS19creating64-v5
 image display name: Windows 7 32bit: SPSS 19.0.0
 image ID: 64
 image revision ID: 109
 image size: 61440 MB
 use Sysprep: yes
 root access: yes
 image owner ID: 13
 image owner affiliation: GMUGuest
 image revision date created: 2012-03-13 10:42:37
 image revision production: yes
 OS module: VCL::Module::OS::Windows::Version_6::7
 image revision comments: Created Default profile
 
 user: sthomp17
 user name: Steele B Thompson
 user ID: 1601
 user affiliation: Mason
 
 RECENT LOG ENTRIES FOR THIS PROCESS:
 2012-04-19 15:55:39|1191|9141:9049|new|utils.pm:run_command(9001)|$? is set 
 to -1, setting exit status to 0, Perl bug likely encountered
 2012-04-19 
 15:55:47|1191|9141:9049|new|OS.pm:is_ssh_responding(401)|vmguest-vcl12 is 
 responding to SSH, port 22: open, port 24: closed
 2012-04-19 15:55:47|1191|9141:9049|new|VMware.pm:node_status(969)|VM 
 vmguest-vcl12 is responding to SSH
 2012-04-19 
 15:55:53|1191|9141:9049|new|OS.pm:get_currentimage_txt_contents(157)|retrieved
  currentimage.txt contents from vmguest-vcl12
 2012-04-19 
 15:55:53|1191|9141:9049|new|OS.pm:get_currentimage_txt_contents(169)|found 8 
 lines in currentimage.txt on vmguest-vcl12
 2012-04-19 
 15:55:53|1191|9141:9049|new|OS.pm:get_current_image_name(197)|retrieved 
 currentimage.txt contents from vmguest-vcl12
 2012-04-19 15:55:53|1191|9141:9049|new|OS.pm:get_current_image_name(212)|name 
 of image currently loaded on vmguest-vcl12: 
 vmwarewin7-Windows732bitSPSS19creating64-v5
 2012-04-19 
 15:55:53|1191|9141:9049|new|VMware.pm:node_status(990)|currentimage.txt image 
 (vmwarewin7-Windows732bitSPSS19creating64-v5) matches requested image name 
 (vmwarewin7-Windows732bitSPSS19creating64-v5) on VM vmguest-vcl12
 2012-04-19 
 15:55:58|1191|9141:9049|new|OS.pm:get_vcld_post_load_status(697)|vcld post 
 load tasks have run on vmguest-vcl12: vcld_post_load=success (Thu Apr 19 
 11:54:08 2012)
 2012-04-19 15:55:58|1191|9141:9049|new|VMware.pm:node_status(1051)|OS module 
 post_load tasks have been completed on VM vmguest-vcl12
 2012-04-19 15:55:58|1191|9141:9049|new|new.pm:reload_image(475)|node_status 
 returned a hash reference
 2012-04-19 15:55:58|1191|9141:9049|new|new.pm:reload_image(480)|node_status 
 hash reference contains key {status}=READY
 2012-04-19