On 08/10/2010 07:33 PM, Tom Limoncelli wrote:
> On Wed, Aug 11, 2010 at 11:05 AM, Jefferson Cowart<[email protected]>  wrote:
>> I have recently found myself promoted from being on a team of two
>> network/system administrators to managing the team, and I'm looking for
>> a few bits of advice.
>
> Try to establish with your manager what "managing the team" means.
> Does it mean being the technical leader (leading the team's
> architecture and technical strategy), or does the role include HR
> functions like raises/promotions, budgets, etc.?  It is important to
> have a clear division of labor.  Also, if you want to do only the
> first part (technical leader), then don't fall for the temptation of
> the other functions... they are time sinks.

I think it's somewhere in between. I'm not sure how much control I'll 
have over the budget, but I believe I'll be the one responsible for 
performance evaluations and the like. I'll certainly confirm this with my 
manager.

>> In addition to any general comments you may have about managing system
>> administrators I also have a few specific questions:
>>
>> * In a small team managing [what I think to be] a fairly diverse
>> environment like ours how do you handle [cross-]training/redundancy? I
>> know there are a large number of things right now that no one other than
>> me knows how to do. I think I have the everyday issues well documented,
>> but non-routine problems may cause trouble.
>
> My #1 recommendation is to have a wiki of "checklists". People don't
> like writing documentation but they are ok with writing checklists.
> Some standard checklists to write are:
> 1. things we do for each new hire.
> 2. things we do when an employee is terminated.
> 3. how to (allocate space in the machine room, set up a new server,
> deploy a workstation, add to the puppet configuration, etc. etc.)
> 4. how to respond to the monitoring system when it pages you: For each
> page that you can receive, list the reaction steps.  The last step
> should always be "if all that failed to fix the problem, escalate to
> so-and-so."  If so-and-so feels he/she is getting called in the middle
> of the night too much, ask them to improve the checklist.  Encourage
> people to write the checklist when they add the alert rule to the
> monitoring system.  If someone won't or doesn't write the checklist
> for a particular alert it just means they have agreed to be called in
> the middle of the night every time.
>
> These checklists will grow and improve over time.  Every time you have
> an outage, augment the checklists so that they would prevent that
> problem in the future.

We already use a mediawiki install for hosting documentation, so I think 
this can fit in nicely. It has a few checklists on it already, but we 
can certainly improve them.

On that note, any suggestions on how to get people to write [good] 
documentation? Others on our teams seem to be very resistant to writing 
even basic documentation. (For a couple of services we provide, I don't 
even have documentation on which system hosts the database.) I'd guess 
I've written 90%+ of the documentation on our wiki. While everyone 
agrees it's a good idea, no one makes the time to write it.

> The question was specifically about [cross-]training.
> Training: a new person is taught the related checklists.  When they
> master those, they get a starter project to replace a checklist with
> an automated procedure.
> Cross-training: make sure everyone can handle the checklists of the
> other teams; they don't have to know "why" just the "how".

In addition to that, any suggestions on how to handle situations along the 
lines of "this person is weak in area xyz; how do I get him up to speed 
on that?"

Thanks again for the advice.

-- 
Thanks
Jefferson Cowart
[email protected]

_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
