Re: Ahoy, SLA boffins!

2009-07-28 Thread Patrick W. Gilmore

On Jul 29, 2009, at 12:34 AM, Bill Woodcock wrote:

> So I've embarked on the no-doubt-futile task of trying to interpret
> SLAs as empirically-verifiable technical specifications, rather than
> as marketing blather.  And there's something that I'm finding
> particularly puzzling:
>
> In most SLAs, there seem to be two separate guarantees proffered:
> one concerning "network availability" and one concerning "packet
> loss."  Now, if I were to put my engineer hat on, and try to
> _imagine_ what the difference might be, I might imagine "network
> availability" to have something to do with layer-2 link status being
> presented as "up," while packet loss would be the percentage of
> packets dropped.  But when I actually read SLAs, "network
> availability" is generally defined as the portion of the month that
> the path from the customer's local loop to the transit or peering
> routers was "available" to transmit packets.  Packet loss, on the
> other hand, is generally defined as the portion of packets which are
> lost while crossing that exact same piece of network.
>
> Now, what am I missing here?  Is this one of those Heisenberg
> things, where "network availability" is the time the network _could
> have_ delivered a packet _when you weren't actually doing so_, while
> "packet loss" is the time the network _couldn't_ deliver a packet
> when you _were_ actually doing so?
>
> Is "network availability" inherently unmeasurable on a network
> that's less than 100% utilized?
>
> Am I over-thinking this?


Yes.  Not because you are coming to strange conclusions, but
because (as you say in your first sentence) you are trying to put
empirical / objective meaning to marketing blather.


I had a simple way to fix this.  I defined a network as "down" with  
more than X% packet loss (usually with X in the 2-5 range, depending  
on other deal parameters).  IMHO, a network with 5% packet loss -is-  
down.  I don't know about you, but none of my customers will use my  
service if they have 5% loss.  TCP is finicky!  This receives the  
strongest credit because you cannot use the service.


Below X, you are not "down", just degraded, and therefore the link has  
some utility, but not 100% utility.  This receives a credit, but not  
as strong a credit as being unable to use a link.


Oh, and, of course, if there is no light on the fiber, then we are
(obviously) "down" as well.
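
A rough sketch of the scheme above, in Python, for anyone who wants it in
code.  The threshold X is set to 5 (the message only says 2-5), "degraded"
is assumed to mean any measurable loss at or below X, and the credit
weights are made-up placeholders, not figures from any real contract.

```python
from dataclasses import dataclass

DOWN_LOSS_PCT = 5.0        # "X" from the message (2-5 range); 5 is an assumption
DEGRADED_FLOOR_PCT = 0.0   # assumption: any measurable loss counts as degraded

# Hypothetical credit weights: full credit when down, partial when degraded.
CREDIT_WEIGHT = {"down": 1.0, "degraded": 0.5, "ok": 0.0}


@dataclass
class Sample:
    link_up: bool     # False when there is no light on the fiber
    loss_pct: float   # packet loss measured over one interval, 0-100


def classify(sample: Sample) -> str:
    """Return "down", "degraded", or "ok" for one measurement interval."""
    if not sample.link_up or sample.loss_pct > DOWN_LOSS_PCT:
        return "down"
    if sample.loss_pct > DEGRADED_FLOOR_PCT:
        return "degraded"
    return "ok"


def credit_fraction(samples: list[Sample]) -> float:
    """Average credit weight across all intervals in the billing period."""
    if not samples:
        return 0.0
    return sum(CREDIT_WEIGHT[classify(s)] for s in samples) / len(samples)
```

Each interval gets exactly one state, so "availability" and "packet loss"
stop being two overlapping guarantees and become two tiers of the same
measurement.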


Make sense?

Or am I over-thinking it? :)

--
TTFN,
patrick

P.S. Now you get to think about things like "packet loss to / from  
where?" and whether the last mile should count.



> Seriously, though, I know there are people who don't consider SLAs
> to be fantasy-fiction, and some of them must not be innumerate, and
> some subset of those must be on NANOG, and the intersection set
> might be equal to or greater than one, right?  Can anybody explain
> this to me in a way I can translate into code, while still taking
> myself seriously?
>
>    -Bill









Re: Ahoy, SLA boffins!

2009-07-29 Thread Michael Dillon
> Am I over-thinking this?

Yes, I think so. Often a large component of an SLA is related to the
cost of compliance versus the cost of the penalty imposed. If it is
cheaper to pay the occasional penalty, rather than construct the
network to meet the SLA, then the network operator will often make a
purely sales/marketing decision to use the SLA without including
engineering/OPS in the discussion.

Also, the wording often refers to unplanned downtime so that any
planned downtime doesn't get counted in the non-availability measure.
And sometimes you find some allowance for packet drop during a limited
time period, so that if you drop a thousand packets, it doesn't count
if it happens during the peak hour of the day or if all the packets are
dropped within a few minutes.

Another limitation that I have seen refers to "core" network or "core"
PoPs meaning the part of the network in the major market area
(generally the USA and Western Europe) but not covering network or
PoPs in "fringe" areas.

I don't believe that there is any hard science behind SLAs, and I
suspect that most engineering/OPS teams don't even know what the actual
SLAs being given to customers are. There are engineering targets that are
sometimes referred to as SLAs but they are not the Service Level
Agreement that is in signed customer contracts.

All that aside, it would be interesting to see some standards for
measuring and reporting things like "network availability" from an
engineering point of view.

--Michael Dillon



RE: Ahoy, SLA boffins!

2009-07-29 Thread Andreas, Rich
Bill,
To be brief, but hopefully not too fleeting, the majority of the
standards orgs - ITU, MEF -  use packet loss to derive availability.
Loss% = the % of packets which were transmitted but not received by the
destination host.  As for availability, loss is measured across some
time period.  If during that period X% of the transmitted  packets were
NOT lost, then the network is said to be available.  Typically a 20%
figure is used, e.g. if 20% of the packets transmitted during a 5-minute
period were received then the network is said to be 100% Available for
that 5-minute time period.  Some Carriers have taken this to the extreme
to say that if at least 1 packet was successfully transmitted then the
network was 100% Available for the time period.  
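
As a sketch only (this is not taken from any ITU or MEF document), the
interval rule described above could be coded roughly as follows.  The 20%
threshold and 5-minute interval are the figures quoted above; the function
names are invented, and skipping intervals in which nothing was sent is an
assumption, since the rule says nothing about idle periods.

```python
def interval_available(sent: int, received: int,
                       min_delivered_pct: float = 20.0) -> bool:
    """True if at least min_delivered_pct of the sent packets arrived."""
    if sent <= 0:
        raise ValueError("cannot judge an interval with no packets sent")
    return 100.0 * received / sent >= min_delivered_pct


def availability_pct(intervals, min_delivered_pct: float = 20.0):
    """Percentage of measured intervals that count as available.

    intervals is a list of (sent, received) counts, one per 5-minute
    period.  Intervals with sent == 0 are skipped (assumption); returns
    None if nothing was measured at all.
    """
    measured = [(s, r) for s, r in intervals if s > 0]
    if not measured:
        return None
    up = sum(1 for s, r in measured
             if interval_available(s, r, min_delivered_pct))
    return 100.0 * up / len(measured)


# Four measured intervals plus one idle one: two of the four pass at 20%.
month = [(1000, 1000), (1000, 200), (1000, 199), (1000, 0), (0, 0)]
print(availability_pct(month))         # 50.0
print(availability_pct(month, 99.9))   # 25.0
```

The second call shows how much the reported availability depends on where
the threshold is set, which is the complaint in the next paragraph.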

Loss is a measure of the network's usability, Availability is ...??
(Meaningless??)  What utility does a network have that is "Available"
yet sustaining a loss rate which renders it inoperable?

Rich
 
 


Re: Ahoy, SLA boffins!

2009-07-29 Thread Leo Bicknell

I think the desired goal here is to separate the access SLA from
the backbone SLA.  That is, consider a simple picture:


Network Cloud -- Provider Edge Router -- Local Loop -- Customer Router

Network availability is the % of the time the customer router and
provider edge router can communicate, and is designed to measure
if the local loop is up.  For instance, let's say the provider edge
router loses all its uplinks to the Network Cloud: your local loop
is up and functioning, but you have 100% packet loss to all destinations.

The "packet loss" SLA kicks in on a per-destination basis.  Everything
is up and working, but the provider has a full circuit and is
dropping 20% of the packets on that link.  You catch it, you get a
credit.

I think the technical reason why these are separate has to do with
the expectations.  If my local loop is dropping 0.5% of the packets
due to errors, it is broken and must be fixed.  If some random
destination on the Internet is dropping 0.5% of the packets well,
that's a normal day in the life of the network.  Plus, if your local
loop takes errors then you get a credit.  However, if there's a
full link in the backbone but none of your packets take it, and
thus you are unaffected, you don't.

Now, having said all that, and having been one of the people who've
attempted to communicate sane, rational, technical ideas to marketing
and legal, the chance that anything sane made it into the actual contract
is, well, nil.

-- 
   Leo Bicknell - bickn...@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/



Re: Ahoy, SLA boffins!

2009-07-29 Thread Michael Dillon
> Now, having said all that, and having been one of the people who've
> attempted to communicate sane, rational, technical ideas to marketing
> and legal, the chance that anything sane made it into the actual contract
> is, well, nil.

I disagree.

If someone takes the trouble to publish a technical document describing a
sane technical way to measure a network SLA, and they also provide code
for measuring/calculating the SLA, then there is a good chance that the
industry will pick it up.

Look at 95th percentile billing. Dave Rand at Abovenet thought it up,
probably to simplify the billing process and keep billing overhead costs
down. Then UUNet picked it up and suddenly just about everyone was
offering a 95th percentile billing model.

-- Michael Dillon



Re: Ahoy, SLA boffins!

2009-07-29 Thread Net
Aawaw



RE: Ahoy, SLA boffins!

2009-07-29 Thread Holmes,David A
We use the BRIX active measurement system (BRIX is now owned by EXFO),
which gathers round-trip time, packet loss, and jitter randomly every
minute, 24x7x365, for our major backbone links to calculate SLAs. "Network
Availability" can be measured empirically using BRIX-calculated values
of packet loss, and expressed as a number of 9's, which BRIX will also
calculate over any time period for which BRIX historical data is being
kept. BRIX historical data is kept in an embedded Oracle database. BRIX
usually runs on a Solaris SMP server.
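
For readers without such a system, here is a generic sketch of turning
per-minute loss samples into a "number of 9's".  It does not describe how
BRIX computes the figure.  The 5% cut-off for calling a minute
"unavailable" and the function names are assumptions for illustration.

```python
import math


def availability_from_loss(loss_pcts, down_loss_pct: float = 5.0) -> float:
    """Fraction of samples whose loss is at or below the cut-off.

    loss_pcts holds one packet-loss percentage per sampled minute.
    down_loss_pct is an assumed threshold, not a vendor default.
    """
    if not loss_pcts:
        raise ValueError("no samples")
    good = sum(1 for loss in loss_pcts if loss <= down_loss_pct)
    return good / len(loss_pcts)


def nines(availability: float) -> float:
    """Express an availability fraction (0-1) as a count of nines.

    0.999 is about 3 nines, 0.9999 about 4; a perfect record is infinite.
    """
    if availability >= 1.0:
        return math.inf
    return -math.log10(1.0 - availability)
```

For scale: a 30-day month has 43,200 minutes, so "four nines" leaves room
for roughly four bad minutes.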



Re: Ahoy, SLA boffins!

2009-07-29 Thread William Herrin
On Wed, Jul 29, 2009 at 12:34 AM, Bill Woodcock wrote:
> Am I over-thinking this?

The SLAs I've looked at promise me that if their service is hard down
for a week (with no ambiguity whatsoever) they'll credit my bill for
upwards of 2% of the $50k/year or so I spend on the Internet
connection for my multi-million dollar online service.

So yeah, you're overthinking it. When they start coupling those SLAs
with some sort of serious business loss insurance, then paying
attention to the SLA and carefully examining what constitutes failure
may make some kind of sense at a technical level.

Regards,
Bill Herrin


-- 
William D. Herrin  her...@dirtside.com  b...@herrin.us
3005 Crane Dr. .. Web: 
Falls Church, VA 22042-3004



Re: Ahoy, SLA boffins!

2009-07-29 Thread JC Dill

William Herrin wrote:
> On Wed, Jul 29, 2009 at 12:34 AM, Bill Woodcock wrote:
>> Am I over-thinking this?
>
> The SLAs I've looked at promise me that if their service is hard down
> for a week (with no ambiguity whatsoever) they'll credit my bill for
> upwards of 2% of the $50k/year or so I spend on the Internet
> connection for my multi-million dollar online service.


I'm really surprised anyone considers this an SLA, or anything special 
in a business contract.  I automatically expect to get a credit of 
1.923% if the service were not provided for a period of 168 hours, no 
questions asked and no SLA required. 
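
The 1.923% figure is just one week out of a 52-week year.  A small sketch
of the arithmetic (the function name is invented; the $50k/year price is
the one from the quoted message):

```python
HOURS_PER_WEEK = 7 * 24   # 168


def pro_rata_credit_pct(outage_hours: float, period_hours: float) -> float:
    """Share of the billing period, in percent, during which service was out."""
    return 100.0 * outage_hours / period_hours


# One week out of a 52-week year, as in the figure above.
print(round(pro_rata_credit_pct(168, 52 * HOURS_PER_WEEK), 3))   # 1.923

# The same week measured against a 365-day year is slightly smaller.
print(round(pro_rata_credit_pct(168, 365 * 24), 3))              # 1.918

# In dollars, on the $50k/year connection from the quoted message.
annual_price = 50_000
print(round(annual_price * 168 / (52 * HOURS_PER_WEEK), 2))      # 961.54
```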

When service is simply not provided, there's nothing special about not 
having to pay for it.  I don't know of any business where you can have a 
contract that requires you to pay your monthly/annual fee for services 
when said services are not provided.  If you have a housekeeping or lawn 
service that is supposed to come once a week, and you have an annual 
contract with them for this service at $50/week, and they miss a week 
(provide no service) you don't pay them anyway for that missed week.  
You don't need an SLA in your contract with them to have this right to 
withhold payment for the period of time when the services are not 
provided *at all*.


An SLA comes into play when a service is degraded below the quality you 
contracted for.  What credit do they give you when you have 168 hours of 
degraded service, e.g. 50% of the service level you specified in your 
RFQ?  That's where your SLA comes in.  The SLA specifies at what point 
your service is considered "degraded" (how much below the contracted 
service level, and how long of a time period is required before it is 
considered below grade) and what $credit you may receive when you are 
provided some service, but not to the level specified in your contract.


jc




Re: Ahoy, SLA boffins!

2009-07-29 Thread William Herrin
On Wed, Jul 29, 2009 at 4:19 PM, JC Dill wrote:
> William Herrin wrote:
>> The SLAs I've looked at promise me that if their service is hard down
>> for a week (with no ambiguity whatsoever) they'll credit my bill for
>> upwards of 2% of the $50k/year or so I spend on the Internet
>> connection for my multi-million dollar online service.

> An SLA comes into play when a service is degraded below the quality you
> contracted for.  What credit do they give you when you have 168 hours of
> degraded service, e.g. 50% of the service level you specified in your RFQ?
>  That's where your SLA comes in.  The SLA specifies at what point your
> service is considered "degraded" (how much below the contracted service
> level, and how long of a time period is required before it is considered
> below grade) and what $credit you may receive when you are provided some
> service, but not to the level specified in your contract.

Hi JC,

Perhaps you miss my point: what the ISP is offering to pay me as a
result of a failure to deliver adequate service is so much less than
my loss for the same as to render the payment meaningless. I'm gonna
terminate the contract for nonperformance and hire someone who can get
the job done long before it's worth my time to chase you for an
SLA-based service credit. And we both know it. The only way I ever
chase you for an SLA credit is if I'm playing the blame game instead of
doing my job for my customers.

Regards,
Bill Herrin


-- 
William D. Herrin  her...@dirtside.com  b...@herrin.us
3005 Crane Dr. .. Web: 
Falls Church, VA 22042-3004



Re: Ahoy, SLA boffins!

2009-07-29 Thread Stephen Sprunk
JC Dill wrote:
> William Herrin wrote:
>> The SLAs I've looked at promise me that if their service is hard
>> down for a week (with no ambiguity whatsoever) they'll credit my bill
>> for upwards of 2% of the $50k/year or so I spend on the Internet
>> connection for my multi-million dollar online service.
>
> I'm really surprised anyone considers this an SLA, or anything special
> in a business contract.  I automatically expect to get a credit of
> 1.923% if the service were not provided for a period of 168 hours, no
> questions asked and no SLA required.
>
> When service is simply not provided, there's nothing special about not
> having to pay for it.

Read your contract closely and you'll find that, except for an explicit
SLA clause (which will cost you extra), they make no guarantee that the
circuit will work at all and you'll still owe them money.  On top of
that, the SLA payouts are usually capped at an amount _less_ than the
price increase due to demanding an SLA.  If your circuit costs $2k/mo,
and it's down for an entire month, you'll probably still owe them at
least $1500 for that non-service -- and you could buy a non-SLA service
for the same $1500/mo.
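
To make the capped-payout arithmetic concrete, here is a sketch in Python.
The $2000 and $1500 prices are the ones in the example above; the 25% cap
is an assumption chosen to reproduce the "$1500 still owed" figure, and
real contracts word their caps in many different ways.

```python
def amount_owed(monthly_price: float, outage_fraction: float,
                credit_cap_fraction: float) -> float:
    """Monthly bill after an SLA credit that is capped.

    outage_fraction:     share of the month the circuit was down (0-1)
    credit_cap_fraction: largest credit allowed, as a share of the
                         monthly price (hypothetical contract term)
    """
    uncapped_credit = monthly_price * outage_fraction
    credit = min(uncapped_credit, monthly_price * credit_cap_fraction)
    return monthly_price - credit


sla_price = 2000.0       # circuit with SLA, per month
non_sla_price = 1500.0   # same circuit without SLA, per month

# Down for the entire month, credit capped at 25% of the monthly price.
print(amount_owed(sla_price, 1.0, 0.25))   # 1500.0

# The SLA premium is as large as the biggest credit it can ever pay out.
print(sla_price - non_sla_price)           # 500.0
```

With a cap at or below the premium, the best case for the SLA customer in
a total outage is paying what the non-SLA customer pays anyway.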

(Savvy customers who are spending big bucks know how to negotiate these
terms to be more favorable, but most customers aren't savvy unless
they've already been burned by this.)

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking





Re: Ahoy, SLA boffins!

2009-07-30 Thread JC Dill

Stephen Sprunk wrote:
> Read your contract closely and you'll find that, except for an explicit
> SLA clause (which will cost you extra), they make no guarantee that the
> circuit will work at all and you'll still owe them money.


I am not a lawyer.  However, over the years many lawyers have told me
you can't have a legally enforceable contract that says (in essence) you
owe me money even if I give you absolutely nothing in exchange (or vice
versa).  A legally enforceable contract must *always* have an exchange of
consideration - I give you something (money, labor, tangible property,
intangible property) in exchange for something you give me.

Many businesses try this type of crap all the time, but (according to
the above mentioned lawyers) it's not worth the paper it is written on.
They make these clauses hoping the other party doesn't know their
rights.  However, contract law (e.g. the UCC) trumps unenforceable and
illegal clauses in your contracts (this is why we *have* civil laws
regarding civil contracts, otherwise there would be no point in civil
laws at all).  But please don't take my word for it, ask your own lawyer
to review your contract and give you an opinion about the legality and
enforceability of clauses of this type, in your particular contract.


jc




RE: Ahoy, SLA boffins!

2009-07-30 Thread Leigh Porter

Indeed, that's why some companies have contracts managers with experience
of thieving gits who try to rip you off on SLAs. We have indeed been
burned, and so our contracts worth any money now have real good incentives
for the vendors to come up with the goods and make what they sell us work.
Even so, sometimes important stuff gets dropped because the vendor refuses
to be bound by it, and then we get screwed over it.

--
Leigh




Re: Ahoy, SLA boffins!

2009-07-30 Thread Matthew Petach
On 7/29/09, William Herrin  wrote:
>  Perhaps you miss my point: what the ISP is offering to pay me as a
>  result of a failure to deliver adequate service is so much less than
>  my loss for the same as to render the payment meaningless. I'm gonna
>  terminate the contract for nonperformance and hire someone who can get
>  the job done long before it's worth my time to chase you for an
>  SLA-based service credit. And we both know it. The only way I ever
>  chase you for an SLA credit is if I'm playing the blame game instead of
>  doing my job for my customers.

Actually, SLA credits are useful in cases where the circuit is not the
only path between two sites. If, for example, you have 12 OC192 links
running across the US, but your peak traffic on them doesn't exceed 80Gb
combined, having an OC192 down for a day or two won't really hurt
you. There's no reason to cancel the circuit, since the rest of your links
are carrying the traffic just fine. But since one of the links failed to
meet its SLA, you might as well push the vendor to give you the SLA
credit back: it saves you some money, you have no lost customers,
and you have no other impact to your business.  It's not about playing
the "blame game", it's about giving the vendor an incentive to try
to run their system a bit more reliably.
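
A quick back-of-the-envelope check of that example, as a sketch.  It
treats "80Gb" as 80 Gbit/s of peak load and uses the OC-192 line rate of
9.95328 Gbit/s; usable payload is somewhat lower, so these numbers are
slightly optimistic.  The function names are invented for illustration.

```python
import math

OC192_GBPS = 9.95328   # SONET OC-192 line rate; payload rate is a bit lower


def links_needed(peak_gbps: float, per_link_gbps: float = OC192_GBPS) -> int:
    """Smallest number of links whose combined rate covers the peak."""
    return math.ceil(peak_gbps / per_link_gbps)


def tolerable_failures(total_links: int, peak_gbps: float,
                       per_link_gbps: float = OC192_GBPS) -> int:
    """How many links can be down while the rest still carry the peak."""
    return max(0, total_links - links_needed(peak_gbps, per_link_gbps))


print(links_needed(80.0))             # 9
print(tolerable_failures(12, 80.0))   # 3
```

So one OC192 being down leaves plenty of headroom, which is why the
outage costs nothing but is still worth claiming the credit for.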

Now, for single-homed customers depending on that one link,
I agree, an SLA is largely meaningless compared to the impact
of being down.  But there are many cases where the SLA is
meaningful, and collecting SLA credits is worth it, without
there being a corresponding massive loss in revenue
associated with the outage.

Matt