Lawrence,
I feel for you about the downtime and breaking your production server.
Some of us don't have the luxury of either a staff to test patches and
upgrades or a platform to test them on. With VMware and VirtualBox
(insert your favorite virtual machine manager here) handling more and
more legacy and older OSes, it is easier nowadays to spin up a test or
development server. But that puts you into another set of binds:

1. Finding a VM manager that works with your OS.
2. Finding hardware robust enough to handle the VM manager and multiple VMs.
3. Finding the human resources to spin them up and maintain them.

I deal with that first hand here. Sometimes I think about putting up a
mirror so I can pretend I have help (as you do with a parrot). I found a
VM manager that lives well with SCO, and bingo: I had a way to test my
software and a way to keep the legacy app running when the hardware goes
on vacation.
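
For what it's worth, that kind of throwaway test VM can be scripted
rather than clicked together in a GUI. A rough sketch, driving
VirtualBox's VBoxManage from Python (the VM name, memory, disk size, and
install ISO below are just placeholders; adjust for whatever legacy OS
you're resurrecting):

#!/usr/bin/env python
# Rough sketch: create and boot a throwaway legacy test VM via VBoxManage.
# The VM name, disk size, and ISO path are placeholders, not real values.
import subprocess

VM = "legacy-test"           # placeholder VM name
DISK = VM + ".vdi"           # placeholder disk image
ISO = "legacy-install.iso"   # placeholder install media

def vbox(*args):
    # Run one VBoxManage command; raise if it exits non-zero.
    subprocess.check_call(("VBoxManage",) + args)

vbox("createvm", "--name", VM, "--register")
vbox("modifyvm", VM, "--memory", "512", "--cpus", "1", "--nic1", "nat")
vbox("createhd", "--filename", DISK, "--size", "4096")   # size in MB
vbox("storagectl", VM, "--name", "IDE", "--add", "ide")
vbox("storageattach", VM, "--storagectl", "IDE", "--port", "0",
     "--device", "0", "--type", "hdd", "--medium", DISK)
vbox("storageattach", VM, "--storagectl", "IDE", "--port", "1",
     "--device", "0", "--type", "dvddrive", "--medium", ISO)
vbox("startvm", VM, "--type", "headless")

When the test is over, VBoxManage unregistervm legacy-test --delete
throws the whole thing away again, and the "hardware goes on vacation"
problem at least shrinks to copying a disk image around.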

So when you don't have the resources, you just wing it as best you can
and hope you have everything covered well enough that when things do
decide to go belly up, you can recover.




On Mon, Jun 24, 2013 at 7:44 PM, Lawrence K. Chen, P.Eng. <[email protected]> wrote:

>
> ----- Original Message -----
> > > From: [email protected] [mailto:discuss-
> > > [email protected]] On Behalf Of Lawrence K. Chen, P.Eng.
> > >
> > > Good thing...I'd hate to end the uptime streak that this server
> > > has....it had
> > > been up 2525 days.
> >
> > I've said many times before, that you shouldn't be proud of your
> > uptime, because it means you're not applying updates, so you're
> > exposing yourself to bugs & vulnerabilities.  I understand sometimes
> > systems run in a protected environment where that's not much of a
> > concern.  But there are 2-3 other reasons which you've demonstrated:
> >
>
> Yes, I have often argued such things, but since scheduling downtime to
> patch a server requires signoff from a dozen or so groups, and there's
> no guarantee that the patch won't break whatever equally ancient
> application is on the box...approvals rarely happen.
>
> Back when we had assigned areas of responsibility...I used to tell
> people that patching had to occur during the biannual patch weeks (the
> week between Christmas and New Year's, and a quiet week in July...though
> that quiet week had disappeared by the time I came up to my first
> one...).  When that area got assigned to another SA, the patching of
> those servers stopped, largely because I was the only one left who had
> been pushing systems to get patched.  The other SA who was for it had
> quit some time ago.  But I came from an environment where regular
> patching saved my systems when the rest of the company got knocked
> out...which we would learn about after company Internet and phone
> service were restored and the messages about the impending outage
> finally reached us.  They would talk about losing tens of millions due
> to the worm, but I could have kept working the whole time...if I hadn't
> had to go around and clean up other people's computers.
>
> But the reason I'd hate to end the uptime streak of 2525+ days is that
> the reboot would likely go one of two ways.  It works, because not much
> has changed configuration-wise and no bits have rotted in the meantime.
> Or the server simply disappears.
>
> I once lost our datacenter DNS server by patching it...and rebooting it.
> It had been set up and managed by a now-former SA, and it turned out he
> had repurposed an IMAP dev server, made only command-line changes, and
> run it out of tmp space...so it came back up serving its original
> purpose with no trace of its existence as a DNS server.  And tmp space
> isn't backed up by our backup system.  (That almost caught another
> group, too: they were doing daily database backups into tmp, expecting
> that our nightly system backup would pick them up.  Fortunately, I
> spotted what they were doing and had them back up somewhere else, which
> is where, a week later, an application update failed and they needed to
> do a restore.  It wasn't a total save, though, because they had been
> tarring the database while it was running, so the copies weren't
> consistent.)
>
> I spent the rest of the evening/night building a new datacenter DNS
> server from scratch.  Later, when I talked to him on IRC, he said
> something about the real hardware for that server having needed
> servicing, so he had set that system up as a temporary replacement...and
> I had ended an almost 2.5-year uptime for that server.
>
> And, there have been lots of other horror stories in our datacenter along
> these lines.
>
> Meanwhile, I'm getting a chuckle out of all the people freaking out
> about Java 6 being EOL'd...they have servers here and servers there that
> need it and probably won't work with Java 7.  But most of the critical
> servers mentioned are on hardware so old it isn't on support, running an
> OS that has never been patched and has also hit EOL (extended support is
> available for extra bucks, but since the hardware isn't on support,
> neither is the OS).  Of course, we've tried to get them to upgrade.  But
> given the state of their systems...the end of Java updates has no
> impact.
>
> I think the oldest in production are a Solaris 8 box and an RHEL 2.1
> box...
>
> Up until shortly after the DST change we still had some Solaris 2.6
> servers in production, along with a SunOS 4.1.3 box (I remember having
> to figure out the tz tools on the respective systems to make them handle
> the change).
>
> Though lately my $boss has been talking about building out new server
> architectures, where the applications people have been told up front
> that they have to expect that a node can and will be taken out from
> under them for patching, without notice but without interruption to
> their users; and also to expect that it will be newer when it reappears,
> and that within a week all the other nodes will be at that level, too.
>
> It's going over as if it's what they wanted all along...though he
> doesn't think other groups will be up for that kind of thing...
>
> Oddly, I had been telling him about such ideas for years (after coming
> back from a couple of LISAs)...
>
> --
> Who: Lawrence K. Chen, P.Eng. - W0LKC - Senior Unix Systems Administrator
> For: Enterprise Server Technologies (EST) -- & SafeZone Ally
> Snail: Computing and Telecommunications Services (CTS)
> Kansas State University, 109 East Stadium, Manhattan, KS 66506-3102
> Phone: (785) 532-4916 - Fax: (785) 532-3515 - Email: [email protected]
> Web: http://www-personal.ksu.edu/~lkchen - Where: 11 Hale Library



-- 
John J. Boris, Sr.
Online Services
www.onlinesvc.com
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
