After Action Report RE: How'd this for a bad day? AKA bad me

2010-11-01 Thread David Lum
Now that the dust has settled, we know what happened. When installing ESX v3.5 
on a new physical host, our tech didn't completely disconnect the SAN 
connections (he unplugged them, but not far enough), and the install formatted 
a SAN drive instead of the local drive. If we had known this before powering 
off the VMs, we could have VMotioned them to the other SAN, but at the time we 
didn't know.

I still shouldn't have had all my eggs on one SAN (and now don't). Version 4 of 
ESX doesn't allow this without clicking through some very prominent "are you 
sure!?!?!" boxes, whereas v3.5 apparently just throws it wherever, making it 
easy to shoot yourself in the foot.

Dave

From: Brian Desmond [mailto:br...@briandesmond.com]
Sent: Friday, October 08, 2010 2:13 PM
To: NT System Admin Issues
Subject: RE: How'd this for a bad day? AKA bad me

Sounds like you should home the redundant sets of VMs on different SAN 
volumes/whatever?
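
A minimal sketch of that kind of check in Python, assuming you can export an 
inventory that maps each VM to a role and a SAN volume. The VM names, role 
labels, and volume names below are hypothetical, not taken from this thread.

```python
from collections import defaultdict


def roles_on_single_datastore(placement):
    """Return {role: datastore} for redundant roles whose VMs all share one datastore.

    placement maps VM name -> (role, datastore). Roles with only one VM are
    skipped, since there is nothing to spread across volumes.
    """
    by_role = defaultdict(list)
    for vm, (role, datastore) in placement.items():
        by_role[role].append(datastore)

    at_risk = {}
    for role, datastores in by_role.items():
        if len(datastores) > 1 and len(set(datastores)) == 1:
            at_risk[role] = datastores[0]
    return at_risk


# Hypothetical inventory: three terminal servers all homed on the same volume.
placement = {
    "ts01": ("terminal-server", "san-vol-a"),
    "ts02": ("terminal-server", "san-vol-a"),
    "ts03": ("terminal-server", "san-vol-a"),
    "web01": ("internal-web", "san-vol-a"),
    "web02": ("internal-web", "san-vol-b"),
    "pgp01": ("pgp-universal", "san-vol-b"),
}

for role, datastore in sorted(roles_on_single_datastore(placement).items()):
    print(f"{role}: every VM is on {datastore}")
```

Any role this reports would go down completely if that one volume were lost, 
so those are the VMs to re-home first.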

Thanks,
Brian Desmond
br...@briandesmond.com

c - 312.731.3132



~ Finally, powerful endpoint security that ISN'T a resource hog! ~
~ http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/  ~

---
To manage subscriptions click here: 
http://lyris.sunbelt-software.com/read/my_forums/
or send an email to listmana...@lyris.sunbeltsoftware.com
with the body: unsubscribe ntsysadmin


How'd this for a bad day? AKA bad me

2010-10-08 Thread David Lum
I have 7 production systems running on 3 different ESX boxes in an ESX cluster, 
and 2 different logical SAN volumes (sorry am not SAN savvy, I just know I have 
two different SAN volumes to choose from when making a VM).

Today, a SAN blows up and takes out half - our SharePoint server (heavily 
used), a Terminal Server, and an internal occasionally-used web server 
(Namescape rDirectory). Then somehow, when I was told to power down the other 4 
VMs so our VMware guy could reboot a vCenter server, 3 of the 4 remaining VMs 
decided to go AWOL (a combination of "missing" and "disconnected"). That took 
out my other two Terminal Servers and another lightly used internal web server.

Did I mention I don't have the normal backups for these things because 
...well...I'm an idiot and didn't confirm our backup guy installed backup 
software on these servers as I stood them up (process error on my part, since I 
should confirm it's on there). None of these store data - they all talk to a 
backend SQL server, and the Terminal Servers are there to run apps that are 
slow over VPN. SharePoint we got back quickly because we have a staging 
equivalent of it, so it was a repoint to a config and content DB, a DNS change, 
and done.
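
One way to close that kind of process gap is a build checklist that won't call 
a server production-ready until every step is confirmed. A rough Python sketch 
follows. The step names and server records are hypothetical, and the records 
would have to be filled in by whoever actually verifies each step.

```python
# Hypothetical build steps; adjust to whatever your own build notes require.
REQUIRED_STEPS = ("os_patched", "backup_agent_installed", "test_restore_done")


def missing_steps(server_record):
    """Return the required build steps not yet confirmed for one server."""
    return [step for step in REQUIRED_STEPS if not server_record.get(step, False)]


def unready_servers(records):
    """Map server name -> missing steps, for servers not ready for production."""
    report = {}
    for name, record in records.items():
        gaps = missing_steps(record)
        if gaps:
            report[name] = gaps
    return report


# Hypothetical records for two newly built servers.
records = {
    "ts01": {"os_patched": True, "backup_agent_installed": False},
    "web01": {
        "os_patched": True,
        "backup_agent_installed": True,
        "test_restore_done": True,
    },
}

for name, gaps in sorted(unready_servers(records).items()):
    print(f"{name} is missing: {', '.join(gaps)}")
```

A step that was never recorded counts as missing, so a server can't slip 
through just because nobody filled in the backup line.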

I do have copious notes on how I built the others and can rebuild from scratch 
easily enough (I just finished the three TS boxes), but dude...six servers at 
once?

The most frustrating part was discovering that the 4 systems that had been 
powered off could have been "migrated" before power off and there would have 
been no issue with them - the power down nuked 'em.

Oh, and the lone surviving server - the PGP Universal Server that manages the 
encrypted machines. (Yes, the PGP machines will still boot w/out the server up, 
but still, I've been on this server 50% of my time over the last two weeks!).

Dave


Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Andrew S. Baker
Yes, process failures can be deadly...

Also, it is more important in this day and age of massive consolidation to
make sure that your backups and DR are effective, because cascading failures
can take out much more of your infrastructure than ever before.


ASB (My XeeSM Profile) http://XeeSM.com/AndrewBaker
Exploiting Technology for Business Advantage...




RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread John Aldrich
All I can say is OUCH! :-( 






RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread Paul Hutchings
Being slightly serious for a moment, it's a pretty good illustration of how 
something like a SAN in isolation is no use :-)



--
MIRA Ltd

Watling Street, Nuneaton, Warwickshire, CV10 0TU, England
Registered in England and Wales No. 402570
VAT Registration  GB 114 5409 96

The contents of this e-mail are confidential and are solely for the use of the 
intended recipient.  If you receive this e-mail in error, please delete it and 
notify us either by e-mail, telephone or fax.  You should not copy, forward or 
otherwise disclose the content of the e-mail as this is prohibited.




RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread John Aldrich
Yep. Good point. :-)  VERY good point!





Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Jeff Bunting
Why do you need to power down VMs to reboot vCenter?  vCenter might be the
problem with the missing VMs.  VMware support might be able to help you with
those.

Jeff


Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Jonathan Link
+1  I'm just getting caught up on emails this morning.  vCenter reboot
shouldn't necessitate a reboot of a host server.




RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread David Lum
I don't know the exact details (and don't remember at the moment); my guess is 
they needed to do something SAN side. I just now heard that one SAN store is 
what died. Today is gonna bite...


RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread Raper, Jonathan - Eagle
+1 from here as well. A vCenter reboot should not require a host reboot. If it 
did, that would (IMHO) be a huge problem in the design and purpose behind 
VMware. Talk to VMware. If your maintenance is not current, get current.

On a related note, YESTERDAY one of the storage groups on our SAN ran out of 
space (fortunately I'm not in or over the group responsible for that anymore!) 
and took down a number of systems, all part of our core electronic medical 
record system, eClinicalWorks, and all virtual... We were without that app for 
more than 6 hours, and are still dealing with database replication issues today 
as a result.

TGIF!

Jonathan L. Raper, A+, MCSA, MCSE
Technology Coordinator
Eagle Physicians & Associates, PA
jra...@eaglemds.com
www.eaglemds.com


From: Jonathan Link [mailto:jonathan.l...@gmail.com]
Sent: Friday, October 08, 2010 9:40 AM
To: NT System Admin Issues
Subject: Re: How'd this for a bad day? AKA bad me

+1  I'm just getting caught up on emails this morning.  vCenter reboot 
shouldn't necessitate a reboot of a host server.



On Fri, Oct 8, 2010 at 9:34 AM, Jeff Bunting 
bunting.j...@gmail.commailto:bunting.j...@gmail.com wrote:
Why do you need to power down VMs to reboot vCenter?  vCenter might be the 
problem with the missing VMs.  VMWare support might be able to help you with 
those.

Jeff
On Fri, Oct 8, 2010 at 5:51 AM, David Lum 
david@nwea.orgmailto:david@nwea.org wrote:
I have 7 production systems running on 3 different ESX boxes in an ESX cluster, 
and 2 different logical SAN volumes (sorry am not SAN savvy, I just know I have 
two different SAN volumes to choose from when making a VM).

Today, a SAN blows up and takes out half - our SharePoint server (heavily 
used), a Terminal Server, and an internal occasionally-used web server 
(Namescape rDirectory). Then somehow, when I was told to power down the other 4 
VM's so our VMWare guy could reboot a vCenter server, 3 of the 4 remaining VM's 
decided to go AWOL (a combination of "missing" and "disconnected"). That took 
out my other two Terminal Servers and another lightly used internal web server.

Did I mention I don't have the normal backups for these things because 
...well...I'm an idiot and didn't confirm our backup guy installed backup 
software on these servers as I stood them up (process error on my part since I 
should confirm it's on there). None of these store data - they all talk to a 
backend SQL and the Terminal Servers are used to run apps that are slow if they 
run the same apps over VPN. SharePoint we got back quick because we do have a 
staging equivalent of it, so it was repoint to a config and content DB, DNS 
change, and done.

I do have copious notes on how I built the others and can rebuild from scratch 
easily enough (I just finished the three TS boxes), but dude...six servers at 
once?

The most frustrating part was discovering that the 4 systems that had been 
powered off could have been "migrated" before power off and there would have 
been no issue with them - the power down nuked 'em.

Oh, and the lone surviving server - the PGP Universal Server that manages the 
encrypted machines. (Yes, the PGP machines will still boot w/out the server up, 
but still, I've been on this server 50% of my time over the last two weeks!).

Dave

~ Finally, powerful endpoint security that ISN'T a resource hog! ~
~ http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/  ~

---
To manage subscriptions click here: 
http://lyris.sunbelt-software.com/read/my_forums/
or send an email to 
listmana...@lyris.sunbeltsoftware.com
with the body: unsubscribe ntsysadmin




Any medical information contained in this electronic message is CONFIDENTIAL 
and privileged. It is unlawful for unauthorized persons to view, copy, 
disclose, or disseminate CONFIDENTIAL information. This electronic message may 
contain information that is confidential and/or legally privileged. It is 
intended only for the use of the individual(s) and/or entity named as 
recipients in the message. If you are not an intended recipient of this 
message

Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Ben Scott
On Fri, Oct 8, 2010 at 5:51 AM, David Lum david@nwea.org wrote:
 I have 7 production systems ...

  Oh, boy.  Fun.  I've had days like that.  Not many, fortunately (and
knock on wood).  Hope you get it all sorted out in time for the
weekend!

  Today I find myself having to arbitrate a pooch screw regarding
important procedures, and thus get everyone's story and try and make
sense of it all.  I feel like I'm playing the cop in a police
interrogation scene.  I much prefer dealing with recalcitrant machines
than people.

-- Ben



Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Jonathan Link
Machines are recalcitrant, they're just misunderstood.


Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Steven Peck
If the systems are still actually on the LUNs, then you should be able to
reconnect them and bring them up.  Rebooting vCenter should not have had
anything to do with shutting down guests, but rebooting the SAN might
possibly have been required to address its fire.

From vCenter just reconnect to the ESX hosts, and then start connecting to
the guests.  Frankly I'd get on hold with VMware now.  They are pretty good
at getting this sort of thing sorted out so rebuilding shouldn't be
necessary unless the data on the SAN went poof.

Steven Peck
http://www.blkmtn.org

Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Andrew S. Baker
Your not is AWOL



ASB

Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Jonathan Link
That's not the only thing...


Re: How'd this for a bad day? AKA bad me

2010-10-08 Thread Andrew S. Baker
I've said it before, but I will say it again.

In a highly virtualized, heavily consolidated world, we need more planning,
more thinking and more time for effective execution.

Cutting corners will become more and more painful, and will bite more and
more organizations.

Hopefully, enough near misses will teach enough entities to do the right
thing.  That's just my optimism speaking, however.

It will be incumbent on each technology professional to advocate or fight
for the right solutions, or have an excellent exit strategy planned out. :)


ASB (My XeeSM Profile) http://XeeSM.com/AndrewBaker
Exploiting Technology for Business Advantage...

RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread David Lum
Yeah, I seem to run into this kind of "I should change my career" event once 
every five years or so, although this event isn't nearly as stressful as being 
at a client (these down systems are at %dayjob%) and having a RAID5 card die 
and thinking "I don't even know how the RAID volumes were configured, this 
setup pre-dated me...", this on their primary SBS server.

The worst in my 15 years was P2V-ing a different customer's SBS server with 
Hyper-V, then about two months later when I rebooted the host, SCVMM (MS's 
fancy VM manager) tells me "No virtual machines found..."

Current status of my disaster: I have 5 of 6 servers back up and 95%+ back to 
normal, not too bad for 12 hours of work...or is it? The last server is low on 
the critical list; I believe I will not suffer a heart attack this day.

Dave




RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread Raper, Jonathan - Eagle
Just be glad it didn't happen on a Monday! Terrible way to start off a week!

Jonathan L. Raper, A+, MCSA, MCSE
Technology Coordinator
Eagle Physicians & Associates, PA
jra...@eaglemds.com
www.eaglemds.com








RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread Terry Dickson
Amen


Root cause of: RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread David Lum
So, the root cause: the ESX 3.5 OS was installed onto the SAN volume that 
contained my VM's. The install of that OS (effectively) removes pointers that 
VM's need when they boot up. Best practice is to disconnect the SAN links when 
installing this version of the OS so this doesn't happen. In fact our SE did 
this but apparently didn't disconnect one far enough. If we had left the VM's 
running we could have used a VM converter to move them to a different storage 
location.

ESX 4.0 doesn't allow this activity.

Our SE feels really bad about the work he created for me - personally I'm just 
really happy he's a stand up guy and explained what happened. You do this stuff 
long enough and something like this eventually happens - it's called 
"experience".

Dave
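
For what it's worth, the "disconnect the SAN links before installing" practice 
above can be written down as a pre-install check. This is only a rough sketch 
under stated assumptions: it is not anything ESX or vCenter provides, the Disk 
record and the "local"/"san" transport labels are hypothetical, and the disk 
inventory is assumed to be gathered by hand (or by whatever tool you trust) 
before the installer is started.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """One disk the install host can currently see (hypothetical record)."""
    name: str
    transport: str  # assumed labels: "local" or "san"


def san_disks_still_visible(disks):
    """Return the names of any non-local disks the host can still see."""
    return [d.name for d in disks if d.transport != "local"]


def safe_to_install(disks):
    """Allow the install only if a local disk exists and no SAN disk is visible."""
    has_local = any(d.transport == "local" for d in disks)
    return has_local and not san_disks_still_visible(disks)


if __name__ == "__main__":
    # Hand-entered inventory: one SAN link was not pulled "far enough".
    inventory = [Disk("local-disk-0", "local"), Disk("san-volume-1", "san")]
    if not safe_to_install(inventory):
        print("STOP: still visible:", san_disks_still_visible(inventory))
    else:
        print("OK to install")
```

The point of refusing outright, rather than just warning, is that the 
installer in this story picked the SAN volume without asking.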


Re: Root cause of: RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread Kurt Buff
Experience may not be the best teacher, but it is the most expensive one...

On Fri, Oct 8, 2010 at 13:34, David Lum david@nwea.org wrote:
 So, the root cause: ESX 3.5 OS was installed onto SAN volume that contained
 my VM’s. The install of that OS (effectively) removes pointers that VM’s
 need when they boot up. Best practice is to disconnect the SAN links when
 installing this version of the OS so this doesn’t happen. In fact our SE did
 this but apparently didn’t disconnect one far enough. If we had left the
 VM’s running we could have used a VM converter to move them to a different
 storage location.



 ESX 4.0 doesn’t allow this activity.



 Our SE feels really about out the work he created for me – personally I’m
 just really happy he’s a stand up guy and explained what happened. You do
 this stuff long enough and something like this eventually happens – it’s
 called “experience”.



 Dave



 From: Andrew S. Baker [mailto:asbz...@gmail.com]
 Sent: Friday, October 08, 2010 9:36 AM
 To: NT System Admin Issues
 Subject: Re: How'd this for a bad day? AKA bad me



 I've said it before, but I will say it again.



 In a highly virtualized, heavily consolidated world, we need more planning,
 more thinking and more time for effective execution.

 Cutting corners will become more and more painful, and will bite more and
 more organizations.



 Hopefully, enough near misses will teach enough entities to do the right
 thing.   That's just my optimism speaking, however.



 It will be incumbent on each technology professional to advocate or fight
 for the right solutions, or have an excellent exit strategy planned out. :)

 ASB (My XeeSM Profile)
 Exploiting Technology for Business Advantage...


 On Fri, Oct 8, 2010 at 11:27 AM, Raper, Jonathan - Eagle
 jra...@eaglemds.com wrote:

 +1 from here as well. A vCenter reboot should not require a host reboot. If
 it did, that would (IMHO) be a huge problem in the design and purpose behind
 VMware. Talk to VMware. If your maintenance is not current, get current.



 On a related note, YESTERDAY, one of our storage groups on our SAN ran out
 of space (fortunately I’m not in or over the group responsible for that
 anymore!), and thus took down a number of systems, all part of our core
 electronic medical record system, eClinicalWorks, all virtual… We were
 without that app for more than 6 hours, and are still dealing with database
 replication issues today as a result….



 TGIF!

 Jonathan L. Raper, A+, MCSA, MCSE
 Technology Coordinator
 Eagle Physicians & Associates, PA
 jra...@eaglemds.com
 www.eaglemds.com

 

 From: Jonathan Link [mailto:jonathan.l...@gmail.com]
 Sent: Friday, October 08, 2010 9:40 AM

 To: NT System Admin Issues
 Subject: Re: How'd this for a bad day? AKA bad me



 +1  I'm just getting caught up on emails this morning.  vCenter reboot
 shouldn't necessitate a reboot of a host server.



 On Fri, Oct 8, 2010 at 9:34 AM, Jeff Bunting bunting.j...@gmail.com wrote:

 Why do you need to power down VMs to reboot vCenter?  vCenter might be the
 problem with the missing VMs.  VMWare support might be able to help you with
 those.

 Jeff

 On Fri, Oct 8, 2010 at 5:51 AM, David Lum david@nwea.org wrote:

 I have 7 production systems running on 3 different ESX boxes in an ESX
 cluster, and 2 different logical SAN volumes (sorry am not SAN savvy, I just
 know I have two different SAN volumes to choose from when making a VM).



 Today, a SAN blows up and takes out half – our SharePoint server (heavily
 used), a Terminal Server, and an internal occasionally-used web server
 (Namescape rDirectory). Then somehow, when I was told to power down the
 other 4 VM’s so our VMWare guy could reboot a vCenter server, 3 of the 4
 remaining VM’s decided to go AWOL (a combination of “missing” and
 “disconnected”). That took out my other two Terminal Servers and another
 lightly used internal web server.



 Did I mention I don’t have the normal backups for these things because
 …well…I’m an idiot and didn’t confirm our backup guy installed backup
 software on these servers as I stood them up (process error on my part since
 I should confirm it’s on there). None of these store data – they all talk to
 a backend SQL and the Terminal Servers are used to run apps that are slow if
 they run the same apps over VPN. SharePoint we got back quick because we do
 have a staging equivalent of it, so it was repoint to a config and content
 DB, DNS change, and done.



 I do have copious notes on how I built the others and can rebuild from
 scratch easily enough (I just finished the three TS boxes), but dude…six
 servers at once?



 The most frustrating part was discovering that the 4 systems that had been
 powered off could have been “migrated” before power off and there would have
 been no issue with them – the power down nuked ‘em.



 Oh, and the lone surviving server – the PGP Universal Server that manages
 the encrypted machines. (Yes

RE: How'd this for a bad day? AKA bad me

2010-10-08 Thread Brian Desmond
Sounds like you should home the redundant sets of VMs on different SAN 
volumes/whatever?

Thanks,
Brian Desmond
br...@briandesmond.com

c - 312.731.3132


From: David Lum [mailto:david@nwea.org]
Sent: Friday, October 08, 2010 11:51 AM
To: NT System Admin Issues
Subject: How'd this for a bad day? AKA bad me

I have 7 production systems running on 3 different ESX boxes in an ESX cluster, 
and 2 different logical SAN volumes (sorry am not SAN savvy, I just know I have 
two different SAN volumes to choose from when making a VM).

Today, a SAN blows up and takes out half - our SharePoint server (heavily 
used), a Terminal Server, and an internal occasionally-used web server 
(Namescape rDirectory). Then somehow, when I was told to power down the other 4 
VM's so our VMWare guy could reboot a vCenter server, 3 of the 4 remaining VM's 
decided to go AWOL (a combination of "missing" and "disconnected"). That took 
out my other two Terminal Servers and another lightly used internal web server.

Did I mention I don't have the normal backups for these things because 
...well...I'm an idiot and didn't confirm our backup guy installed backup 
software on these servers as I stood them up (process error on my part since I 
should confirm it's on there). None of these store data - they all talk to a 
backend SQL and the Terminal Servers are used to run apps that are slow if they 
run the same apps over VPN. SharePoint we got back quick because we do have a 
staging equivalent of it, so it was repoint to a config and content DB, DNS 
change, and done.

I do have copious notes on how I built the others and can rebuild from scratch 
easily enough (I just finished the three TS boxes), but dude...six servers at 
once?

The most frustrating part was discovering that the 4 systems that had been 
powered off could have been "migrated" before power off and there would have 
been no issue with them - the power down nuked 'em.

Oh, and the lone surviving server - the PGP Universal Server that manages the 
encrypted machines. (Yes, the PGP machines will still boot w/out the server up, 
but still, I've been on this server 50% of my time over the last two weeks!).

Dave

~ Finally, powerful endpoint security that ISN'T a resource hog! ~
~ http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/  ~

---
To manage subscriptions click here: 
http://lyris.sunbelt-software.com/read/my_forums/
or send an email to 
listmana...@lyris.sunbeltsoftware.com
with the body: unsubscribe ntsysadmin
