>>There’s just no simple solutions (in my experience) to avoid them happening.
Indeed...

>>Maybe that means using more than one public cloud vendor…

Starts to eat away (or totally devour) the value proposition... :)

ASB
http://XeeMe.com/AndrewBaker
Providing Virtual CIO Services (IT Operations & Information Security) for the SMB market…

On Sun, Feb 24, 2013 at 11:57 PM, Ken Schaefer <k...@adopenstatic.com> wrote:

> I agree – these types of SNAFUs shouldn’t happen. There’s just no simple
> solutions (in my experience) to avoid them happening.
>
> So try to plan for the contingency that ‘bad stuff’ will happen, and work
> out what risks you are prepared to mitigate and what you are prepared to
> accept. Maybe that means using more than one public cloud vendor…
>
> Cheers
> Ken
>
> From: Andrew S. Baker [mailto:asbz...@gmail.com]
> Sent: Monday, 25 February 2013 3:32 PM
> To: NT System Admin Issues
> Subject: Re: MS Azure cloud evaporates
>
> Hi Ken,
>
> I hear you, and I don't disagree, for the most part. I've suffered a
> number of these issues on my own network which I fully manage (so there is
> no one else to blame, etc), and having managed different sized
> environments, I do appreciate the exponential increase in complexity.
>
> To Ben's point though, if you must fail in large and complex endeavors, at
> least try for different types of failures each time -- especially if you
> are tying more and more resources to the failure point.
>
> It's kind of dumb to have the same type of failure every few months, with
> the only change being the ever-increasing scope of impact from the failure.
>
> ASB
> http://XeeMe.com/AndrewBaker
> Providing Virtual CIO Services (IT Operations & Information Security)
> for the SMB market…
>
> On Sun, Feb 24, 2013 at 8:31 PM, Ken Schaefer <k...@adopenstatic.com>
> wrote:
>
> Sure.
>
> But Ford/GM/Toyota sell cars - they're affected by recalls. Boeing sells
> planes - they seem to have issues (as does the A380 from Airbus - like the
> engine that exploded over Singapore). The FDA requires extensive testing of
> drugs in the US market, but still some drugs have unintended consequences
> despite the billions spent.
>
> In large, complex environments, with lots of moving parts, things go
> wrong. Language barriers, changing regulations, ambiguous requirements,
> staff turnover, in-flight projects - all of these things (in my experience)
> make it difficult to develop a solid baseline of what should be in the
> environment and what's actually there. Unfortunately, I don't know the
> answer to making it all work. Some people point to ITIL, but adding layers
> of process and documenting them just leads to lots of out-of-date
> documentation in my experience. The process writers can't keep up with the
> constant changes in the business. (I'm not saying "don't use ITIL" - that
> just leads to a huge mess - but it's not the panacea that some people make
> it out to be)
>
> Cheers
> Ken
>
> -----Original Message-----
> From: Tim Evans [mailto:tev...@sparling.com]
> Sent: Monday, 25 February 2013 12:13 PM
> To: NT System Admin Issues
> Subject: RE: MS Azure cloud evaporates
>
> I appreciate your thoughts from viewpoint of a large org, but if a company
> is selling these services, is it unreasonable to expect that they have this
> all worked out, at least as far as it affects the services they are selling?
>
> ...Tim
>
> -----Original Message-----
> From: Ken Schaefer [mailto:k...@adopenstatic.com]
> Sent: Sunday, February 24, 2013 3:36 PM
> To: NT System Admin Issues
> Subject: RE: MS Azure cloud evaporates
>
> Sure - asset lifecycle management is a core ITIL concept. It should be
> built into your CMDB.
>
> But large orgs have tens, if not hundreds of thousands (or millions) of
> assets. Everything from certs to software licenses to supplier contracts.
> It's a full time job, for probably a small army of people, to put all these
> things into a system, and respond to the upcoming renewals.
>
> But alerting: that's just the first step: some alert comes up that says
> "xyz fire suppressant system needs to be re-certified". So what? You need
> to have a team to hand this off to, and they need to have a process to
> follow to get it done (you don't want Ops people making up stuff on-the-fly
> - that leads to SEV1 as well). But the reality probably is, that in the 5
> years since the alert was created, the DCFM team's been through several
> re-organisations, several business mergers/demergers have occurred, and
> some functions have now been outsourced. So whatever team or position was
> responsible for this before is long gone, and no one ever went and updated
> this alert.
>
> So now someone has to go negotiate with various managers to see who should
> take this on, whose R&R/OPEX budget this is coming out of, etc. And if that
> someone doesn't have the right understanding of the time criticality of
> getting this job done in time, then stuff will break.
>
> In large orgs, technology (like getting a warning about something) is
> such a small part of actually getting anything working, or keeping it
> running. It's all the other stuff, which is mostly processes and human
> interaction where things are always breaking. Now, if you're lucky, then
> you never re-organise, and the same people hang around for a long time.
> Then you have a good understanding of responsibilities, and people have a
> lot of accumulated knowledge of the environment. But that's generally
> impossible to accomplish in a 100,000 user environment - statistically,
> people will always be coming and going.
>
> Cheers
> Ken
>
> -----Original Message-----
> From: Ben M. Schorr [mailto:b...@rolandschorr.com]
> Sent: Monday, 25 February 2013 10:05 AM
> To: NT System Admin Issues
> Subject: RE: MS Azure cloud evaporates
>
> I realize we're operating on a MUCH smaller basis but whenever we create a
> record or certificate that expires on a schedule we also create a task with
> a reminder that pops up 30 days before that expiration so that nothing
> should quietly expire on us without us getting some eyeballs on it.
>
> Seems like having some kind of tickler system would make it a lot less
> likely for these kinds of routine tasks to go undone.
>
> Ben M. Schorr
> Chief Executive Officer
> Roland Schorr & Tower
> www.rolandschorr.com
>
> -----Original Message-----
> From: Ken Schaefer [mailto:k...@adopenstatic.com]
> Sent: Sunday, February 24, 2013 3:23 PM
> To: NT System Admin Issues
> Subject: RE: MS Azure cloud evaporates
>
> In large orgs, it will be impossible (at least in the near future) to
> avoid all issues like this. There's simply too much that isn't automated,
> or where the full set of rules isn't loaded into your automation tool, or
> the tasks are divided between too many people. Large orgs have SEV1s every
> day, and it's not always because of negligence - there are simply too many
> interdependencies that are unknown.
>
> For kicks, who here knows that installing AD creates a self-signed cert
> that's the default EFS recovery agent for machine based EFS? And it expires
> after three years?
> Stuff like this just happens in the background and can
> break things, simply because the PKI team doesn't know about the cert (not
> issued by the CAs), the AD team doesn't manage encryption, and whichever
> app team decided to use machine based EFS didn't think to worry about
> recovery agents. And this is just a technical problem - when you start to
> throw finance and HR and other areas into the mix, things will always fall
> through the gaps.
>
> Cheers
> Ken
>
> -----Original Message-----
> From: Ben Scott [mailto:mailvor...@gmail.com]
> Sent: Monday, 25 February 2013 3:13 AM
> To: NT System Admin Issues
> Subject: Re: MS Azure cloud evaporates
>
> On Sun, Feb 24, 2013 at 4:47 AM, <sep...@gmail.com> wrote:
> > Things happen. I imagine meetings are happening and discussions on
> > how to root this out again are occurring.
>
> Sure. But when the same sort of things keep happening, it stops being
> an accident and becomes negligence.
>
> -- Ben
>
> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
> ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
>
> ---
> To manage subscriptions click here:
> http://lyris.sunbelt-software.com/read/my_forums/
> or send an email to listmana...@lyris.sunbeltsoftware.com
> with the body: unsubscribe ntsysadmin