On Wed, Aug 11, 2010 at 11:05 AM, Jefferson Cowart <[email protected]> wrote:
> I have recently found myself promoted from being on a team of two
> network/system administrators to managing the team, and I'm looking for
> a few bits of advice.

Try to establish with your manager what "managing the team" means. Does it mean being the technical leader (leading the team's architecture and technical strategy), or does the role also include HR functions like raises, promotions, budgets, etc.? It is important to have a clear division of labor. Also, if you want to do only the first part (technical leadership), don't be tempted to take on the other functions... they are time sinks.

> In addition to any general comments you may have about managing system
> administrators I also have a few specific questions:
>
> * In a small team managing [what I think to be] a fairly diverse
> environment like ours how do you handle [cross-]training/redundancy? I
> know there are a large number of things right now that no one other than
> me knows how to do. I think I have the everyday issues well documented,
> but non-routine issues may cause issues.

My #1 recommendation is to have a wiki of "checklists". People don't like writing documentation, but they are OK with writing checklists. Some standard checklists to write are:

1. Things we do for each new hire.
2. Things we do when an employee is terminated.
3. How to allocate space in the machine room, set up a new server, deploy a workstation, add to the Puppet configuration, etc.
4. How to respond to the monitoring system when it pages you: for each page that you can receive, list the reaction steps. The last step should always be "if all that failed to fix the problem, escalate to so-and-so." If so-and-so feels he/she is getting called in the middle of the night too much, ask them to improve the checklist.
   Encourage people to write the checklist when they add the alert rule to the monitoring system. If someone won't or doesn't write the checklist for a particular alert, it just means they have agreed to be called in the middle of the night every time.

These checklists will grow and improve over time.
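
If the paging checklists are kept in a machine-readable form, the "every alert gets a checklist that ends in an escalation step" rule can be audited automatically. A minimal Python sketch, assuming the checklists are stored as lists of steps keyed by alert name; the alert names, steps, ESCALATION_PREFIX convention, and audit_checklists helper are all hypothetical, not part of any real monitoring tool:

```python
# Hypothetical sketch: alert names, checklist steps, and the escalation
# wording below are illustrative only, not from any real monitoring system.

ESCALATION_PREFIX = "If all that failed to fix the problem, escalate to"

# Alert rules defined in the monitoring system (hypothetical names).
ALERT_RULES = ["disk_full_web01", "mail_queue_backlog", "backup_failed"]

# Wiki checklists, keyed by the alert that pages you (hypothetical content).
CHECKLISTS = {
    "disk_full_web01": [
        "Log in to web01 and find the full filesystem.",
        "Remove or compress old log files.",
        ESCALATION_PREFIX + " the web team lead.",
    ],
    "mail_queue_backlog": [
        "Check the mail queue length.",
        "Restart the mail server if it is hung.",
    ],
}


def audit_checklists(rules, checklists):
    """Return a list of problems: alerts with no checklist, or whose
    checklist does not end with an escalation step."""
    problems = []
    for rule in rules:
        steps = checklists.get(rule)
        if not steps:
            # No checklist: whoever added the alert gets paged every time.
            problems.append(f"{rule}: no checklist")
        elif not steps[-1].startswith(ESCALATION_PREFIX):
            problems.append(f"{rule}: last step is not an escalation")
    return problems


if __name__ == "__main__":
    for problem in audit_checklists(ALERT_RULES, CHECKLISTS):
        print(problem)
```

Running something like this whenever an alert rule is added makes a missing checklist visible before the first page arrives at 3 a.m.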

Every time you have an outage, add whatever steps would have prevented that problem to the relevant checklists.

The question was specifically about [cross-]training.

Training: a new person is taught the related checklists. When they master those, they get a starter project: replace a checklist with an automated procedure.

Cross-training: make sure everyone can handle the checklists of the other teams; they don't have to know "why", just "how".

> * What are metrics that I should be looking at monitoring/measuring for
> the team (both as a whole and individually)? There are obvious things to
> look at in terms of outages and support requests (number handled/time to
> resolution), but I suspect there are other things that make sense to
> look at.

That's the most difficult part of managing people!

Tom

-- 
http://EverythingSysadmin.com -- my blog
http://www.TomOnTime.com -- my advice
_______________________________________________
Discuss mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
