Re: NOC Best Practices

2010-08-09 Thread Stefan Liström

Hello Kim

I am also interested in NOC best practices, but have found that it 
is not easy to find much documentation on the subject. I think, as most 
have already answered in your thread, that is because every NOC is a 
little different from the others, especially depending on the type of 
organisation or company it works for.


One of the things we have done in the research and educational community 
in Europe is to start a Task Force[1] on the topic. The task force has 
not really kicked off yet, so unfortunately we don't have any answers to 
your questions yet. I also guess you're from a commercial company, which 
might have slightly different priorities than we do. That said, maybe 
looking at our questions and problems might give you some food for 
thought regarding what is important for your NOC.


Following link[2] below you can find our Terms of Reference, which 
basically describe what we are aiming to investigate and what we 
initially think is interesting to discuss regarding a NOC.


Not sure if it is helpful for you, but during our initial discussions 
around the task force we had some presentations about the NOC from 
different kinds of organisations. You can find the presentation slides 
on our meeting page[3].


If you are interested in ITIL and operations I can recommend the 
following two books:

IT Service Management Based on ITIL V3: A Pocket Guide
The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps


They are fairly easy to read and make some good points. But if you 
consider implementing ITIL, be aware that it is easy to 
overcomplicate things. I would recommend starting out small and using 
only the things you think make sense for your organisation.


Someone in this thread mentioned eTOM[4], which is published by TMForum. 
TMForum publishes best practices in, among other things, operations. The 
downside is that you have to be a member to access most of their 
published documents.


[1] http://www.terena.org/activities/tf-noc/
[2] http://www.terena.org/activities/tf-noc/tf-noc-tor_v3.pdf
[3] http://www.terena.org/activities/tf-noc/prep/programme.html
[4] http://www.tmforum.org/DocumentsBusiness/BusinessProcessFramework/35431/article.html


Best regards
Stefan






Re: NOC Best Practices

2010-07-17 Thread Joe Provo
On Fri, Jul 16, 2010 at 09:34:53PM +0300, Kasper Adel wrote:
> Thanks to all the people that replied off list, asking me to send them
> the responses I get.
[snip]
> Which is useful, but I am looking for more stuff from the best people that
> run the best NOCs in the world.
>
> So I'm throwing this out again.
>
> I am looking for pointers, suggestions, URLs, documents, donations on what a
> professional NOC would have on the below topics:

A lot, as others have said, depending on the business, staffing, 
goals, SLA, contracts, etc.

> 1) Briefly, how they handle their own tickets with vendors or internal

Run a proper ticketing system over which you have control (RT and 
friends, rather than locking yourself into something where you have to 
pay for changes).  Don't judge by ticket closure rate; judge by 
successfully resolved problems. Encourage folks to use the system for 
tracking projects and keeping notes on work in progress rather than 
private datastores. Inculcate a culture of open exploration to solve 
problems rather than rote memorization. This gets you a large way to #2.
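
To make the difference between those two measures concrete, here is a
minimal Python sketch. The Ticket record and its reopened flag are
hypothetical illustrations, not fields of RT or any other ticketing
system; the sketch only assumes you can tell whether the problem behind
a closed ticket came back.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: int
    closed: bool
    reopened: bool  # hypothetical flag: the problem returned after closure


def closure_rate(tickets):
    """Fraction of tickets that were closed, regardless of outcome."""
    if not tickets:
        return 0.0
    return sum(t.closed for t in tickets) / len(tickets)


def resolution_rate(tickets):
    """Fraction of tickets that were closed and never reopened."""
    if not tickets:
        return 0.0
    return sum(t.closed and not t.reopened for t in tickets) / len(tickets)


tickets = [
    Ticket(1, closed=True, reopened=False),
    Ticket(2, closed=True, reopened=True),
    Ticket(3, closed=True, reopened=True),
    Ticket(4, closed=False, reopened=False),
]

print(closure_rate(tickets))     # 0.75
print(resolution_rate(tickets))  # 0.25
```

A queue like this one looks healthy if you only watch closures, while
most of the underlying problems were not actually fixed.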

> 2) How they create a learning environment for their people (Documenting
> Syslog, lessons learned from problems...etc)

Mentoring, shoulder surfing. Keep your senior people in the mix 
of triage & response so they don't get dull and so they cross-pollinate 
skills.  When someone is new, have their probationary period be 
shadowing the primary on-call the entire time.  Your third shift 
[or whatever spans your maintenance windows] should be the folks 
who actually wind up executing well-specified maintenances (with 
guidance as needed) and be the breeding ground of some of your 
better hands-on folks.

> 3) Shift to Shift hand over procedures

This will depend on your systems for tickets, logbooks, etc. 
Solve that first and this should become evident.

> 4) Manual tests they start their day with and what they automate (common
> stuff)

This will vary with the business and what's on-site; I can't 
advise you to always include the genset if you don't have 
one.

> 5) Change management best practices and working with operations/engineering
> when a change will be implemented

Standing maintenance windows (of varying severity if that 
matters to your business), clear definition of what needs 
to be done only during those and what can be done anytime 
[hint: policy tuning shouldn't be restricted to them, and 
you shouldn't make it so that urgent things like a BGP leak 
can't be fixed].  Linear rather than parallel workflows 
for approval, and not too many approval stages, else your 
staff will be spending time trying to get things through 
the administrative stages instead of doing actual work.  Very
simply, have a standard for specifying what needs to be 
done, the minimal tests needed to verify success, and how
you fall back if you fail the tests.  If someone can't 
specify it and insists on frobbing around, they likely don't 
understand the problem or the needed work.
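
One way to read the "standard for specifying" point above is as a small
record that is not accepted until all three parts are filled in. The
following Python is only a sketch under that assumption; the ChangeSpec
class, its field names, and the example change are made up for
illustration and do not come from any particular change management tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSpec:
    summary: str
    steps: list = field(default_factory=list)               # what needs to be done
    verification_tests: list = field(default_factory=list)  # minimal tests for success
    fallback_steps: list = field(default_factory=list)      # how to back out on failure

    def missing_sections(self):
        """Return the names of the sections that are still empty."""
        sections = {
            "steps": self.steps,
            "verification_tests": self.verification_tests,
            "fallback_steps": self.fallback_steps,
        }
        return [name for name, value in sections.items() if not value]

    def is_ready(self):
        """A change is ready for approval only when nothing is missing."""
        return bool(self.summary.strip()) and not self.missing_sections()


spec = ChangeSpec(
    summary="Upgrade software on an edge router",
    steps=["load new image", "reload during the maintenance window"],
    verification_tests=["BGP sessions re-established"],
)

print(spec.missing_sections())  # ['fallback_steps']
print(spec.is_ready())          # False
```

A request that cannot pass a check like this is usually the "frobbing
around" case: the work has not been thought through yet.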

Cheers,

Joe
-- 
 RSUC / GweepNet / Spunk / FnB / Usenix / SAGE



Re: NOC Best Practices

2010-07-17 Thread khatfield
, something like 
Alarmpoint. 

There is nothing more frustrating for an on-call than to be paged and have no idea 
who to call back, who paged, or what the number is.
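
As a rough sketch of that point, a page could be refused before sending
if it does not say who paged and how to call back. This is plain Python
with hypothetical names (Page, format_page); it is not the API of
Alarmpoint or any other paging product.

```python
from dataclasses import dataclass


@dataclass
class Page:
    sender: str           # who paged
    callback_number: str  # the number to call back
    message: str          # what the page is about


REQUIRED_FIELDS = ("sender", "callback_number", "message")


def missing_fields(page):
    """Return the names of required fields that are empty or blank."""
    return [name for name in REQUIRED_FIELDS if not getattr(page, name).strip()]


def format_page(page):
    """Build the text sent to the on-call, refusing incomplete pages."""
    missing = missing_fields(page)
    if missing:
        raise ValueError("page is missing: " + ", ".join(missing))
    return f"From {page.sender} (call back {page.callback_number}): {page.message}"


page = Page("NOC shift lead", "+1 555 0100", "core router down")
print(format_page(page))
# From NOC shift lead (call back +1 555 0100): core router down
```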

I've written so much my fingers hurt from these Blackberry keys. Hope this 
information helps a little.

Best of luck,
-Kevin

Excuse the spelling/punctuation... This is from my mobile.



Re: NOC Best Practices

2010-07-17 Thread Xavier Banchon
What about e-TOM?  Is it better than ITIL V3?

Regards,

Xavier


Telconet S.A



Re: NOC Best Practices

2010-07-17 Thread khatfield
eTOM is best regarded as a companion to ITIL practices. It has additional 
layers not covered by ITIL and vice versa.

I think a combination of practices from both is the best method. 

-Kevin


Re: NOC Best Practices

2010-07-16 Thread Kasper Adel
Thanks to all the people that replied off list, asking me to send them
the responses I get.

I got nothing other than:
http://www.nanog.org/meetings/nanog24/abstracts.php?pt=OTM1Jm5hbm9nMjQ=nm=nanog24
and

Network Management: Accounting and Performance Strategies - just the first
three chapters

Which is useful, but I am looking for more stuff from the best people that
run the best NOCs in the world.

So I'm throwing this out again.

I am looking for pointers, suggestions, URLs, documents, donations on what a
professional NOC would have on the below topics:

1) Briefly, how they handle their own tickets with vendors or internal
2) How they create a learning environment for their people (Documenting
Syslog, lessons learned from problems...etc)
3) Shift to Shift hand over procedures
4) Manual tests they start their day with and what they automate (common
stuff)
5) Change management best practices and working with operations/engineering
when a change will be implemented

Should I be looking for ITIL stuff, or is it not any good?

Thanks,
Kim




Re: NOC Best Practices

2010-07-16 Thread JoeSox
I believe people, myself included, are hesitant to answer because it really
depends upon a lot of variables: the type of business your NOC is running,
the operating budget, number of racks, etc.
The details matter when narrowing things down.

But yes, I have seen this ITIL resource, which you may be interested in:
http://www.frontrange.com/
(click "Register for a Free ITIL Success Kit!")
--
Thanks, Joe






NOC Best Practices

2010-07-14 Thread Kasper Adel
Hello Everyone,

I am currently working on building a NOC, so I'm looking for
materials/pointers to Best Practices documented out there.

Off the top of my head are things like:

1) Documenting Incidents and handling them
2) Documenting Syslog messages
3) Documenting Vendor Software Bugs
4) Shift to Shift Hand over procedures
5) Commonly used scripts for monitoring
6) Frequently testing High Availability
7) Capturing config changes.
etc

I can see that this takes years of experience, but I am wondering if any of this
was captured somewhere.

Thanks,
Kim