Re: [lopsa-discuss] Managing System/Network Admins

Tom Limoncelli Tue, 10 Aug 2010 19:38:04 -0700

On Wed, Aug 11, 2010 at 11:05 AM, Jefferson Cowart <[email protected]> wrote:
> I have recently found myself promoted from being on a team of two
> network/system administrators to managing the team, and I'm looking for
> a few bits of advice.


Try to establish with your manager what "managing the team" means.
Does it mean being the technical leader (leading the team's
architecture and technical strategy), or does the role also include HR
functions like raises/promotions, budgets, etc.?  It is important to
have a clear division of labor.  Also, if you want to do only the
first part (technical leader), don't be tempted to take on the other
functions... they are time sinks.

> In addition to any general comments you may have about managing system
> administrators I also have a few specific questions:
>
> * In a small team managing [what I think to be] a fairly diverse
> environment like ours how do you handle [cross-]training/redundancy? I
> know there are a large number of things right now that no one other than
> me knows how to do. I think I have the everyday issues well documented,
> but non-routine issues may cause issues.

My #1 recommendation is to have a wiki of "checklists". People don't
like writing documentation, but they are OK with writing checklists.
Some standard checklists to write are:
1. things we do for each new hire.
2. things we do when an employee is terminated.
3. how to (allocate space in the machine room, set up a new server,
deploy a workstation, add to the puppet configuration, etc.)
4. how to respond to the monitoring system when it pages you: For each
page that you can receive, list the reaction steps.  The last step
should always be "if all that failed to fix the problem, escalate to
so-and-so."  If so-and-so feels he/she is getting called in the middle
of the night too much, ask them to improve the checklist.  Encourage
people to write the checklist when they add the alert rule to the
monitoring system.  If someone won't or doesn't write the checklist
for a particular alert, it just means they have agreed to be called in
the middle of the night every time that alert fires.
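
As a rough illustration of item 4, here is a minimal Python sketch of
a per-alert checklist whose last step is always an escalation.  The
class name, field names, and sample alerts are all hypothetical; this
is not tied to any particular monitoring system or wiki.

```python
from dataclasses import dataclass, field


@dataclass
class AlertChecklist:
    """Reaction steps for one alert (all names here are hypothetical)."""
    alert: str                      # name of the alert rule
    owner: str                      # person who added the alert rule
    steps: list[str] = field(default_factory=list)

    def runbook(self) -> list[str]:
        # No checklist written: the owner gets called every time.
        if not self.steps:
            return [f"No checklist written for {self.alert}: call {self.owner}."]
        # Otherwise the final step is always an escalation to the owner.
        escalate = f"If all that failed to fix the problem, escalate to {self.owner}."
        return [*self.steps, escalate]


# Example: one alert with a checklist, one without.
disk = AlertChecklist("disk_full", "alice", ["Check df -h", "Rotate logs"])
bare = AlertChecklist("mystery", "bob")

for line in disk.runbook() + bare.runbook():
    print(line)
```

Attaching the owner to the alert itself is one way to make the "no
checklist means you get called" rule fall out automatically.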

These checklists will grow and improve over time.  Every time you have
an outage, augment the relevant checklists so that they would prevent
that problem in the future.

The question was specifically about [cross-]training.
Training: a new person is taught the related checklists.  When they
master those, they get a starter project to replace a checklist with
an automated procedure.
Cross-training: make sure everyone can handle the checklists of the
other teams; they don't have to know the "why", just the "how".

> * What are metrics that I should be looking at monitoring/measuring for
> the team (both as a whole and individually)? There are obvious things to
> look at in terms of outages and support requests (number handled/time to
> resolution), but I suspect there are other things that make sense to
> look at.

That's the most difficult part of managing people!

Tom

-- 
http://EverythingSysadmin.com  -- my blog
http://www.TomOnTime.com -- my advice

_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
