Re: NOC Best Practices
Hello Kim,

I am also interested in NOC best practices, but I have found that not much is documented on the subject. As most people in your thread have already said, I think that is because every NOC is a little different from the others, especially depending on the type of organisation or company it works for.

One of the things we have done in the research and education community in Europe is to start a Task Force[1] on the topic. The task force has not really kicked off yet, so unfortunately we don't have any answers to your questions. I also guess you're from a commercial company, which might have slightly different priorities than we do. That said, looking at our questions and problems might give you some food for thought about what is important for your NOC.

Following link[2] below you can find our Terms of Reference: basically, what we are aiming to investigate and what we initially think is interesting to discuss in regard to a NOC. I am not sure whether it is helpful for you, but during our initial discussions around the task force we had some presentations about the NOC from different kinds of organisations. You can find the presentation slides on our meeting page[3].

If you are interested in ITIL and operations, I can recommend the following two books:

IT Service Management Based on ITIL V3, A Pocket Guide
The Visible OPS Handbook, Implementing ITIL in 4 practical and auditable steps

They are fairly easy to read and make some good points. But if you consider implementing ITIL, be aware that it is easy to overcomplicate things. I would recommend starting small and using only the things you think make sense for your organisation.

Someone in this thread mentioned e-tom[4], which is published by TMForum. TMForum publishes best practices in, among other things, operations; the downside is that you have to be a member to access most of their published documents.
[1] http://www.terena.org/activities/tf-noc/
[2] http://www.terena.org/activities/tf-noc/tf-noc-tor_v3.pdf
[3] http://www.terena.org/activities/tf-noc/prep/programme.html
[4] http://www.tmforum.org/DocumentsBusiness/BusinessProcessFramework/35431/article.html

Best regards
Stefan

On 2010-07-16 20:34, Kasper Adel wrote:

Thanks to all the people who replied off list, asking me to send them the responses I get. I got nothing other than:

http://www.nanog.org/meetings/nanog24/abstracts.php?pt=OTM1Jm5hbm9nMjQ=nm=nanog24
and
Network Management- Accounting and Performance Strategies - Just the first three chapters

Which is useful, but I am looking for more stuff from the best people that run the best NOCs in the world. So I'm throwing this out again. I am looking for pointers, suggestions, URLs, documents, donations on what a professional NOC would have on the below topics:

1) Briefly, how they handle their own tickets with vendors or internal
2) How they create a learning environment for their people (Documenting Syslog, lessons learned from problems...etc)
3) Shift to Shift hand over procedures
4) Manual tests they start their day with and what they automate (common stuff)
5) Change management best practices and working with operations/engineering when a change will be implemented

Should I be looking for ITIL stuff, or is it not any good?

Thanks,
Kim

On Wed, Jul 14, 2010 at 8:24 PM, Kasper Adel karim.a...@gmail.com wrote:

Hello Everyone,

I am currently working on building a NOC, so I'm looking for materials/pointers to Best Practices documented out there. Off the top of my head are things like:

1) Documenting Incidents and handling them
2) Documenting Syslog messages
3) Documenting Vendor Software Bugs
4) Shift to Shift Hand over procedures
5) Commonly used scripts for monitoring
6) Frequently testing High Availability
7) Capturing config changes.
etc

I can see that this is years of experience, but I am wondering if any of this was captured somewhere.

Thanks,
Kim
Re: NOC Best Practices
On Fri, Jul 16, 2010 at 09:34:53PM +0300, Kasper Adel wrote:

Thanks to all the people who replied off list, asking me to send them the responses I get.
[snip]
Which is useful, but I am looking for more stuff from the best people that run the best NOCs in the world. So I'm throwing this out again. I am looking for pointers, suggestions, URLs, documents, donations on what a professional NOC would have on the below topics:

A lot, as others have said, depends on the business, staffing, goals, SLA, contracts, etc.

1) Briefly, how they handle their own tickets with vendors or internal

Run a proper ticketing system over which you have control (RT and friends, rather than locking yourself into something where you have to pay for changes). Don't just judge by ticket closure rate; judge by successfully resolving problems. Encourage folks to use the system for tracking projects and keeping notes on work in progress rather than private datastores. Inculcate a culture of open exploration to solve problems rather than rote memorization. This gets you a large way to #2.

2) How they create a learning environment for their people (Documenting Syslog, lessons learned from problems...etc)

Mentoring, shoulder surfing. Keep your senior people in the mix of triage response so they don't get dull and so they cross-pollinate skills. When someone is new, have their probationary period be shadowing the primary on-call the entire time. Your third shift [or whatever spans your maintenance windows] should be the folks who actually wind up executing well-specified maintenances (with guidance as needed) and be the breeding ground for some of your better hands-on folks.

3) Shift to Shift hand over procedures

This will depend on your systems for tickets, logbooks, etc. Solve that first and this should become evident.

4) Manual tests they start their day with and what they automate (common stuff)

This will vary with the business and what's on-site; I can't advise you to always include the genset if you don't have one.

5) Change management best practices and working with operations/engineering when a change will be implemented

Standing maintenance windows (of varying severity if that matters to your business), and a clear definition of what needs to be done only during those and what can be done anytime [hint: policy tuning shouldn't be restricted to them, and you shouldn't make it so that an urgent thing like a BGP leak can't be fixed]. Linear rather than parallel workflows for approval, and not too many approval stages, else your staff will be spending time trying to get things through the administrative stages instead of doing actual work.

Very simply, have a standard for specifying what needs to be done, the minimal tests needed to verify success, and how you fall back if you fail the tests. If someone can't specify it and insists on frobbing around, they likely don't understand the problem or the needed work.

Cheers,
Joe

--
RSUC / GweepNet / Spunk / FnB / Usenix / SAGE
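As a rough illustration of the change-management advice in this message (a standard for what needs to be done, the tests that verify success, and the fallback, plus a rule for what is restricted to maintenance windows), here is a minimal Python sketch. The `ChangeRequest` class, its field names, and the `may_proceed` rule are all hypothetical and not taken from any particular ticketing or change-management tool; a real NOC would adapt them to its own policy.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical change record: what to do, how to verify, how to back out."""
    summary: str
    steps: list[str] = field(default_factory=list)
    verification_tests: list[str] = field(default_factory=list)
    fallback_steps: list[str] = field(default_factory=list)
    # Assumption: each change is classified in advance as window-only or
    # anytime (e.g. policy tuning or fixing a BGP leak would be anytime).
    window_only: bool = True

    def missing_sections(self) -> list[str]:
        """Return the names of required sections that are still empty."""
        required = {
            "steps": self.steps,
            "verification_tests": self.verification_tests,
            "fallback_steps": self.fallback_steps,
        }
        return [name for name, value in required.items() if not value]


def may_proceed(change: ChangeRequest, in_maintenance_window: bool) -> bool:
    """Allow a change only if it is fully specified and its timing is permitted."""
    if change.missing_sections():
        return False  # cannot say what, how to test, or how to fall back
    return in_maintenance_window or not change.window_only
```

A change that cannot name its verification tests is rejected regardless of timing, which mirrors the point that someone who cannot specify the work probably does not yet understand it.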
Re: NOC Best Practices
, something like Alarmpoint. There is nothing more frustrating for an on-call to be paged and have no idea who to call back, who paged, or what the number is. I've written so much my fingers hurt from these Blackberry keys. Hope this information helps a little. Best of luck, -Kevin Excuse the spelling/punctuation... This is from my mobile.
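Kevin's complaint about pages that do not say who paged or which number to call back can be turned into a simple check before a page is sent. The following Python sketch is only an illustration: the `Page` fields and the `format_page` helper are hypothetical, and it does not use or imitate the API of Alarmpoint or any other paging product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical page contents; every field is required to be non-blank."""
    sender: str
    callback_number: str
    summary: str


def format_page(page: Page) -> str:
    """Build the page text, refusing to send one the on-call cannot act on."""
    missing = [name for name, value in vars(page).items() if not value.strip()]
    if missing:
        raise ValueError("page is missing: " + ", ".join(missing))
    return f"{page.summary} | from {page.sender} | call back {page.callback_number}"
```

Rejecting the page at the sending side is one possible design choice; it puts the burden on the person paging rather than on the on-call engineer who receives it.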
Re: NOC Best Practices
What about e-TOM? Is it better than ITIL V3? Regards, Xavier Telconet S.A
Re: NOC Best Practices
eTOM is best regarded as a companion to ITIL practices. It has additional layers not covered by ITIL and vice versa. I think a combination of practices from both is the best method. -Kevin
Re: NOC Best Practices
Thanks to all the people who replied off list, asking me to send them the responses I get. I got nothing other than:

http://www.nanog.org/meetings/nanog24/abstracts.php?pt=OTM1Jm5hbm9nMjQ=nm=nanog24
and
Network Management- Accounting and Performance Strategies - Just the first three chapters

Which is useful, but I am looking for more stuff from the best people that run the best NOCs in the world. So I'm throwing this out again. I am looking for pointers, suggestions, URLs, documents, donations on what a professional NOC would have on the below topics:

1) Briefly, how they handle their own tickets with vendors or internal
2) How they create a learning environment for their people (Documenting Syslog, lessons learned from problems...etc)
3) Shift to Shift hand over procedures
4) Manual tests they start their day with and what they automate (common stuff)
5) Change management best practices and working with operations/engineering when a change will be implemented

Should I be looking for ITIL stuff, or is it not any good?

Thanks,
Kim
Re: NOC Best Practices
I believe many, myself included, are hesitant to answer because it really depends on a lot of variables: the type of business your NOC is running, the operating budget, the number of racks, etc. The details matter when narrowing things down.

But yes, I have seen this ITIL material, which you may be interested in: at http://www.frontrange.com/ click "Register for a Free ITIL Success Kit!".

--
Thanks,
Joe
NOC Best Practices
Hello Everyone,

I am currently working on building a NOC, so I'm looking for materials/pointers to Best Practices documented out there. Off the top of my head are things like:

1) Documenting Incidents and handling them
2) Documenting Syslog messages
3) Documenting Vendor Software Bugs
4) Shift to Shift Hand over procedures
5) Commonly used scripts for monitoring
6) Frequently testing High Availability
7) Capturing config changes.
etc

I can see that this is years of experience, but I am wondering if any of this was captured somewhere.

Thanks,
Kim
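For item 7 (capturing config changes), one common low-tech approach is to keep the last known copy of each device's configuration and record a diff whenever a new copy differs. The sketch below uses only Python's standard `difflib` module; the `config_diff` function name and the idea that the two configurations are already available as strings are assumptions for illustration. How the configs are fetched from devices is deliberately left out.

```python
import difflib


def config_diff(previous: str, current: str, device: str) -> str:
    """Return a unified diff between two config snapshots, or "" if unchanged."""
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{device} (previous)",
        tofile=f"{device} (current)",
        lineterm="",  # input lines carry no newlines, so headers should not either
    )
    return "\n".join(diff)
```

An empty result means nothing changed, so a scheduled job could call this per device and only log or open a ticket when the returned string is non-empty.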