Re: [lopsa-discuss] change SLA and rates

Tom Limoncelli Wed, 19 Nov 2008 08:06:24 -0800

On Tue, Nov 18, 2008 at 8:41 PM,  <[EMAIL PROTECTED]> wrote:
> On Tue, 18 Nov 2008, Tom Limoncelli wrote:
>
>> On Tue, Nov 18, 2008 at 4:50 PM,  <[EMAIL PROTECTED]> wrote:
>>>
>>> On Tue, 18 Nov 2008, Tom Limoncelli wrote:
>>>
>>>> On Tue, Nov 18, 2008 at 12:36 PM,  <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> my security team is currently receiving ~500 change tickets a month
>>>>> (not counting patching, upgrades, etc) with a 2 business day SLA to
>>>>> complete them. we are getting a lot of people screaming that we should
>>>>> be more responsive and implement the changes faster.
>>>>>
>>>>> I'd like to hear from other folks as to what sort of change rate and
>>>>> schedule is considered reasonable for large organizations. I'm
>>>>> especially interested in hearing from anyone in the financial sector.
>>>>
>>>> Can the tickets be categorized into 5-15 different "types" so that you
>>>> can assign different SLAs to different requests types?
>>>
>>> we've done a little of this (separating out 'research this' 'write a new
>>> script to do that' from 'make a new firewall hole for this other thing'),
>>> but we are trying to find out what is considered 'reasonable' SLAs for
>>> different types of things
>>
>> Have you interviewed customers about what they'd want in an SLA?  They
>> probably have a model of what they consider small, medium, and large
>> requests.  If you can match the perceived amount of time that
>> something should take to the actual time, people will be happy.
>> Typically it looks something like:
>> 1. small tasks that are blocking them from doing another task (add 1
>> machine to a pre-existing ruleset, resetting a password so that a
>> person can get her work done, etc.): Expectation: immediate
>
> this is where most of the tickets fall. individually they will take about 5
> min to do (plus another 5+ min with the ticketing system, but that's a
> separate problem).


Why is there so much overhead for only doing one?  That's a rhetorical
question, actually.  I'm guessing that it takes that long to
authenticate with that particular firewall, pull down the config, edit
it, run some smoke tests, upload it, and then do Q.A.  Whether the edit
is 1 or 10 new entries, the overhead is the killer.

Maybe there are some ways to improve the process overhead.  Just off
the top of my head I'm thinking:

1.  Is there any way to automate the process?
./getfwconfig.py customername
vi customername.txt
./pushfwconfig.py customername

2.  If the ACLs were structured to use host groups, these updates
might be a matter of just adding IP addresses to a group.  That might
make it easier to automate the updates.

./addhost.py  customername listname hostname [hostname2] [hostname3]

3.  If the ACLs are structured to be a scaffold that is similar for
all customers with just the lists being different, then the lists
could be stored in a database (or text file) and configurations could
be generated and (if a change happens) pushed to the firewall
automatically.

$ cat customername.txt
main-net: web1 web2 web3
prod-net: smtp1 smtp2 web4
$ make
Change detected.
Generating new ACLs.
Pushing to firewall customername
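
A minimal sketch of the "generate, and push only if something changed"
step shown above, assuming the list file uses the "name: host host"
format from the example. The function names and the rendered
"group/member" syntax are hypothetical, not any real firewall's
configuration language, and the push itself is left to the caller.

```python
from pathlib import Path


def parse_lists(text):
    """Parse lines such as 'main-net: web1 web2 web3' into {name: [hosts]}."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, hosts = line.partition(":")
        groups[name.strip()] = hosts.split()
    return groups


def render_acls(groups):
    """Render a made-up, vendor-neutral config from the host groups.

    Output is sorted so that reordering hosts in the list file does not
    register as a change.
    """
    lines = []
    for name in sorted(groups):
        lines.append(f"group {name}")
        for host in sorted(groups[name]):
            lines.append(f"  member {host}")
    return "\n".join(lines) + "\n"


def build(list_file, out_file):
    """Regenerate out_file from list_file; return True only if it changed."""
    new = render_acls(parse_lists(Path(list_file).read_text()))
    out = Path(out_file)
    old = out.read_text() if out.exists() else None
    if new == old:
        return False  # nothing to push
    out.write_text(new)
    return True  # the caller would push to the firewall at this point
```

Because build() reports whether the generated output differs from the
previous run, a scheduled job can call it as often as it likes and only
touch the firewall when a list was actually edited.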

Once confidence has been established in such a system, one could
imagine running it hourly.  If a customer's ACLs do actually change,
it would update their firewall and email a notification to the
customer (heck, it could close the ticket automatically too).

If that is a success, the next step would be to add self-service
options for those kinds of updates.  Yes, rigorous tests would validate
the input (is it on the right subnet, is there a DNS entry, etc) and
a human might be required for the final approval.  However, clicking
"ok" is a lot better than 5 minutes of work.

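The two input checks mentioned above (a DNS entry exists, and the
address is on the right subnet) could be sketched as follows. The
function names and the idea of passing the allowed subnet as a CIDR
string are assumptions for illustration; the resolver is injectable so
the check can be tested without touching real DNS.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the IPv4 address for hostname, or None if DNS has no entry."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def validate_host(hostname, allowed_subnet, resolver=resolve):
    """Return (ok, reason) for a self-service request to add hostname."""
    address = resolver(hostname)
    if address is None:
        return False, "no DNS entry"
    if ipaddress.ip_address(address) not in ipaddress.ip_network(allowed_subnet):
        return False, f"{address} is outside {allowed_subnet}"
    return True, "ok"
```

A request that fails either check never reaches the human approver, so
the "ok" click is only spent on inputs that already look sane.
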
In fact, now that I've gotten to the end of this email, I realize that
the best way to deal with the 5 minute overhead is to factor it out.
The work to close a ticket should be adding/removing the hostname to a
list in a text file, database, or even LDAP group.  Done.  The actual
updates would be happening asynchronously in the background, with
automatic testing, ticket closure and customer notification.  Now your
helpdesk people are simply validating inputs and manipulating lists.
Customers will still see the change being made within the 24 hour time
period.
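
As a rough sketch of what "closing a ticket" reduces to under this
scheme, the helper below only edits the host list in a text file (the
same "name: host host" format as the example earlier) and reports which
hosts were new. The function names are hypothetical; a database or LDAP
group would need different storage code, and the background job that
regenerates and pushes the firewall config is assumed to exist
separately.

```python
from pathlib import Path


def load(path):
    """Read 'listname: host host ...' lines into {listname: [hosts]}."""
    groups = {}
    p = Path(path)
    if p.exists():
        for line in p.read_text().splitlines():
            if ":" in line:
                name, hosts = line.split(":", 1)
                groups[name.strip()] = hosts.split()
    return groups


def save(path, groups):
    """Write the groups back, one list per line, in insertion order."""
    text = "".join(f"{name}: {' '.join(hosts)}\n" for name, hosts in groups.items())
    Path(path).write_text(text)


def add_hosts(path, listname, hostnames):
    """Add hostnames to listname; return only the ones that were new."""
    groups = load(path)
    members = groups.setdefault(listname, [])
    added = []
    for host in hostnames:
        if host not in members:
            members.append(host)
            added.append(host)
    if added:
        save(path, groups)  # leave the file untouched when nothing changed
    return added
```

Re-running the same request is harmless: hosts already in the list are
skipped and the file is not rewritten, so the background job sees no
change.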

Tom
