Re: trollage (Re: Akamai server reliability)

2005-12-04 Thread Edward B. Dreger

CO> Date: Mon, 28 Nov 2005 14:57:58 -0600 (CST)
CO> From: Chris Owen

CO> However, I do think Akamai would be better off getting their issues with
CO> their replacement boxes straightened out.  I agree that we get value for
CO> having the boxes on our network (and so do they, let's not forget).

*shrug*

It's not that expensive to ship boxen back and forth, and I'd hazard a 
guess they have people who troubleshoot the dead en masse.  If a dead 
box costs $50, the question becomes how much more would prolonging box 
death cost?
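
One way to frame that question is as a break-even check. This is only a sketch: the $50 is the hypothetical figure above, and the hardening cost and failure counts below are made-up numbers, not anything Akamai has published.

```python
def hardening_pays_off(cost_per_dead_box, failures_avoided_per_box, extra_cost_per_box):
    """Return True if sturdier hardware saves more than it costs.

    cost_per_dead_box: what one dead box costs (shipping, triage), in dollars.
    failures_avoided_per_box: failures the sturdier build is assumed to prevent
        over the life of one box.
    extra_cost_per_box: assumed price premium for the sturdier build, in dollars.
    """
    savings = cost_per_dead_box * failures_avoided_per_box
    return savings > extra_cost_per_box


# Hypothetical inputs: $50 per dead box, $100 premium per sturdier box.
print(hardening_pays_off(50, 1, 100))  # one failure avoided
print(hardening_pays_off(50, 3, 100))  # three failures avoided
```

With these assumed numbers the sturdier box only wins once it prevents more than two failures over its life.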


CO> However, it is a bit frustrating to replace the same box 3 times in less

Heh.  Never had _that_ bad, personally.


CO> than a month.  Hauling a box down to the colo is no big deal but when the

Depends.  In Kansas, no.  In $big_metro_area during rush hour... well, 
I've learned why people state "distance" in terms of hours. :-)


CO> box you are taking down there has a dead CPU fan and two dead case fans
CO> it's hard not to think you might be wasting your time.

True.  So if the CPU fan is dead, just say the box is plugged in; act 
surprised when it doesn't ping. ;-)


CO> It isn't just that they are wasting my time.  They are also wasting their
CO> own time.  It's the overall lack of efficiency that bothers me ;-]

There are enough clue-challenged networks that I wouldn't want arbitrary 
people playing around with my gear.  Shipping can be more efficient.


Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita

DO NOT send mail to the following addresses:
[EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.


Re: Akamai server reliability

2005-11-29 Thread Ryan Dobrynski

i see lots of hardware failures. one particular problem i have noted is
just kind of a bad case layout. the network card sits in a little pci
riser card and is also bolted to the back of the case. it likes to unseat
itself over time. eventually the MB dies. i'm not sure if one has anything
to do with the other but i suspect it does because if i pull the screws
out of the network card and just let it sit in the pci riser it doesn't
unseat itself and the box tends to live a bit longer.

but yes. i have more failures of akamai boxes on my network than anything
by a long shot. chalk it up to cheap hardware i guess.



On Mon, 28 Nov 2005, Roy wrote:

> Date: Mon, 28 Nov 2005 10:39:51 -0800
> From: Roy <[EMAIL PROTECTED]>
> To: nanog@merit.edu
> Subject: Akamai server reliability
>
>
> Hi,
>
> Many moons ago, we got a set of Akamai servers.  Over the years I think
> they replaced every one of them at least once.  Last August we got
> another set of servers due to a move and now two of those three servers
> have failed.
>
> I still have the original server that started garlic.com in production
> after 11+ years so I know servers can last a long time.  I don't
> understand why Akamai failure rates are so high
>
> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?
>
> Roy
>
>
>
>
>

Ryan Dobrynski
Hat-Swapping Gnome
Choice Communications



(\_/)
(O.o)
(> < ) this is Bunny. Copy Bunny into your signature to help him on his way to 
world domination.



Re: Akamai server reliability

2005-11-29 Thread Michael Dillon

> I still have the original server that started garlic.com in production 
> after 11+ years so I know servers can last a long time.  I don't 
> understand why Akamai failure rates are so high

Applications which cause the disk to thrash will wear out
disk drives much more quickly than non-thrashing applications.
When I still ran USENET news servers back before cyclic file 
systems were used, I remember that their hard drives died frequently,
often after less than a year of service, but those drives were 
thrashing 24 by 7. You can hear drives thrashing and feel it 
by touching the case. It is caused by almost completely random 
access resulting in almost constant head movement.

It is cost effective to just thrash cheap drives and
replace them when they die.
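
A toy model of why random access is so much harder on the heads than sequential access. It is a sketch under simplifying assumptions: the disk is treated as a flat row of numbered blocks and "travel" is just the distance between consecutive block numbers, which ignores real drive geometry, caching and request reordering.

```python
import random


def head_travel(block_sequence):
    """Total distance the head moves when visiting blocks in the given order."""
    return sum(abs(b - a) for a, b in zip(block_sequence, block_sequence[1:]))


def compare(num_blocks=10_000, seed=0):
    """Return (sequential_travel, random_travel) for the same set of blocks."""
    rng = random.Random(seed)
    sequential = list(range(num_blocks))
    shuffled = sequential[:]
    rng.shuffle(shuffled)  # same blocks, visited in random order
    return head_travel(sequential), head_travel(shuffled)


seq, rnd = compare()
print(f"sequential: {seq} blocks of travel, random: {rnd} blocks of travel")
```

Reading every block in order moves the head one block at a time; reading the same blocks in random order makes it sweep a large fraction of the disk on almost every request.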

--Michael Dillon



Re: trollage (Re: Akamai server reliability)

2005-11-29 Thread Gadi Evron


> To quote a science fiction story I'm fond of, "efficiency depends on
> what you want to effish".
>
> --Steven M. Bellovin, http://www.cs.columbia.edu/~smb


Sci-fi injection!

(marking another beer owed)

Gadi.


Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Simon Lyall

On Mon, 28 Nov 2005, Pete Templin wrote:
> And hopefully they'll (someday) send servers in my direction - is their
> "minimum criteria" creeping upwards at the same rate as overall Internet
> traffic did in the late 90s?

The impression I got was they originally scattered their machines to
everyone who had a network with a growth plan and bought them a beer. Some
people even got/get paid to host them.

After the .com crash they started being a bit more careful about who they
gave them to and doing a bit more analysis as to whether a new site was worth
the trouble.

One way to get a cluster might be to suggest that you will make better
use of it than a nearby company with a cluster that is much smaller than
you. I have heard of people trying this in Australia, no idea how well it
works.

I know people who were doing under 10Mb/s via their clusters, but they are
in Aus/NZ so the threshold might be higher elsewhere.

-- 
Simon J. Lyall  |  Very Busy  |  Web: http://www.darkmere.gen.nz/
"To stay awake all night adds a day to your life" - Stilgar | eMT.



Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Jon Lewis


On Mon, 28 Nov 2005, Deepak Jain wrote:

> I'm sorry, isn't that exactly what an airbill *is* paying for -- to get the
> equipment on site?


They also frequently need boxes power cycled.  It got to be so frequent 
that we "gave them" a remote reboot switch for all their gear and told 
them how to use it.  They still kept emailing us for reboots until I 
finally used a contact at akamai to get the remote reboot info properly 
placed.


We've had our share of failed boxes, DOA boxes, boxes with components 
literally falling out of them on arrival, etc.  I suspect it's just a sign 
of the box building having been farmed out to the cheapest available 
source.  When you're building boxes in really large volume, what's a few 
missing screws here and there? :)


> The man hours (really, we are talking about less than a single hour to
> replace a server including all the mounting and repacking). The one man hour
> that they need (no more than 6 a year by the look of it) should offset the
> value the ISP is getting from not buying bandwidth to get to the content and
> for the improved performance they get.


I wouldn't count on that.  With bandwidth prices continually falling, and
the ISP business changing (at least for us, dialup/DSL is dying and hosting
is taking off, so instead of having spare outbound capacity to sell
to Akamai, we now do more outbound than inbound), the servers really don't
save us anything except maybe a bit of latency.


> If that model doesn't work for the ISP in question, they should ask Akamai to
> pull their gear.


Think of the man hours that'd take, ripping them out, boxing them up, etc. 
:)


--
 Jon Lewis   |  I route
 Senior Network Engineer |  therefore you are
 Atlantic Net|
_ http://www.lewis.org/~jlewis/pgp for PGP public key_


Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Steven M. Bellovin

In message <[EMAIL PROTECTED]>, Randy Bush writes:
>
>> It isn't just that they are wasting my time.  They are also wasting their
>> own time.  It's the overall lack of efficiency that bothers me ;-]
>
>i suspect you have a datapoint on how they're doing financially.
>they ain't stoopid.  they'll deal with it when the cost/benefit
>gets high enough on their priority list.  isn't the first time
>that good s&m covers some technical gaps, and won't be the last.
>
To quote a science fiction story I'm fond of, "efficiency depends on 
what you want to effish".  

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb




Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Pete Templin


Deepak Jain wrote:

> If that model doesn't work for the ISP in question, they should ask
> Akamai to pull their gear.


And hopefully they'll (someday) send servers in my direction - is their 
"minimum criteria" creeping upwards at the same rate as overall Internet 
traffic did in the late 90s?


pt


Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Randy Bush

> It isn't just that they are wasting my time.  They are also wasting their
> own time.  It's the overall lack of efficiency that bothers me ;-]

i suspect you have a datapoint on how they're doing financially.
they ain't stoopid.  they'll deal with it when the cost/benefit
gets high enough on their priority list.  isn't the first time
that good s&m covers some technical gaps, and won't be the last.

randy



Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Petri Helenius


Chris Owen wrote:


> It isn't just that they are wasting my time.  They are also wasting their
> own time.  It's the overall lack of efficiency that bothers me ;-]
 

Don't worry, it won't take long until google parks their
datacenter-in-a-container outside at the fiber junction and the content 
distribution guys will be obsoleted overnight.


Pete



Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Chris Owen

On Mon, 28 Nov 2005, Deepak Jain wrote:

> >> Never underestimate the amount of airbills that can be paid with KISS
> >> strategy.
> >
> > Especially since Akamai doesn't pay for truck rolls and man hours to get
> > the replacements done onsite.
>
> I'm sorry, isn't that exactly what an airbill *is* paying for -- to get
> the equipment on site?
>
> The man hours (really, we are talking about less than a single hour to
> replace a server including all the mounting and repacking). The one man
> hour that they need (no more than 6 a year by the look of it) should
> offset the value the ISP is getting from not buying bandwidth to get to
> the content and for the improved performance they get.
>
> If that model doesn't work for the ISP in question, they should ask
> Akamai to pull their gear.

I didn't really get the impression that people were really complaining so
much (I certainly wasn't) as they were just pointing out there was an
issue.

However, I do think Akamai would be better off getting their issues with
their replacement boxes straightened out.  I agree that we get value for
having the boxes on our network (and so do they, let's not forget).
However, it is a bit frustrating to replace the same box 3 times in less
than a month.  Hauling a box down to the colo is no big deal but when the
box you are taking down there has a dead CPU fan and two dead case fans
it's hard not to think you might be wasting your time.

It isn't just that they are wasting my time.  They are also wasting their
own time.  It's the overall lack of efficiency that bothers me ;-]

Chris

--
~~~
Chris Owen~ Garden City (620) 275-1900 ~  Lottery (noun):
President ~ Wichita (316) 858-3000 ~A stupidity tax
Hubris Communications Inc ~   www.hubris.net   ~
~~~



Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Joel Jaeggli


On Mon, 28 Nov 2005, Chris Owen wrote:


> As far as I can tell the only thing that will get a box replaced is if it
> can't be booted/pinged.  We've pointed out dead CPU fans before (even on
> the incoming replacement boxes) and they've never seemed to care.  If it
> runs it runs.  If it doesn't they replace the entire box.


Having built a fair number of machines to live for 5 years or longer in 
data-centers I will never visit, there's relatively little that you want 
to triage onsite on a rackmount pc. Drives, in hot-plug enclosures and 
removable power supply modules are about it... Smart-hands are good for 
racking and stacking, swapping disks, recabling the oob, swapping media 
and so forth.  It's not really a good use of someone else's time to have 
them performing experimental surgery on pc's. Much better to simply ship 
out another one and ship the old one back in the same box.


Decent modern 1U chassis still have sufficient airflow with a couple of fans
failed to remain adequately cool. Further, there are now enough sensors in a
PC to be able to tell when you're getting in trouble: rpm indicators for all
the fans, intake, processor and output temperature, thermal sensors in
each of the drives, etc. Our success rate at identifying machines before
they fail has gotten substantially better over time.
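
A minimal sketch of that kind of check, assuming the readings have already been collected somehow. The `Reading` type, the sensor names and both thresholds are hypothetical values chosen for illustration; they are not taken from any particular monitoring tool or vendor.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real limits depend on the chassis and the parts in it.
MIN_FAN_RPM = 1000.0
MAX_TEMP_C = 55.0


@dataclass
class Reading:
    name: str     # e.g. "cpu_fan", "intake", "drive0" (made-up names)
    value: float  # rpm for fans, degrees Celsius for temperatures
    kind: str     # "fan_rpm" or "temp_c"


def at_risk(readings):
    """Return the names of sensors whose readings suggest trouble ahead."""
    flagged = []
    for r in readings:
        if r.kind == "fan_rpm" and r.value < MIN_FAN_RPM:
            flagged.append(r.name)  # fan stopped or spinning too slowly
        elif r.kind == "temp_c" and r.value > MAX_TEMP_C:
            flagged.append(r.name)  # component running hot
    return flagged


sample = [
    Reading("cpu_fan", 0, "fan_rpm"),
    Reading("case_fan1", 4200, "fan_rpm"),
    Reading("intake", 25, "temp_c"),
    Reading("drive0", 61, "temp_c"),
]
print(at_risk(sample))
```

A box that trips checks like these can be scheduled for replacement while it is still serving traffic, rather than after it stops answering pings.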



> Given all their redundancy I suppose that is probably the way to go.
>
> Chris
>
> --
> ~~~
> Chris Owen~ Garden City (620) 275-1900 ~  Lottery (noun):
> President ~ Wichita (316) 858-3000 ~A stupidity tax
> Hubris Communications Inc ~   www.hubris.net   ~
> ~~~



--
--
Joel Jaeggli   Unix Consulting [EMAIL PROTECTED]
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2



Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Deepak Jain




>> Never underestimate the amount of airbills that can be paid with KISS
>> strategy.
>
> Especially since Akamai doesn't pay for truck rolls and man hours to get
> the replacements done onsite.




I'm sorry, isn't that exactly what an airbill *is* paying for -- to get 
the equipment on site?


The man hours (really, we are talking about less than a single hour to 
replace a server including all the mounting and repacking). The one man 
hour that they need (no more than 6 a year by the look of it) should 
offset the value the ISP is getting from not buying bandwidth to get to 
the content and for the improved performance they get.


If that model doesn't work for the ISP in question, they should ask 
Akamai to pull their gear.


DJ


Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Mikael Abrahamsson


On Mon, 28 Nov 2005, Christian Kuhtz wrote:

> Never underestimate the amount of airbills that can be paid with KISS
> strategy.


Especially since Akamai doesn't pay for truck rolls and man hours to get 
the replacements done onsite.


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Chris Owen

On Mon, 28 Nov 2005, Christopher L. Morrow wrote:

> On Mon, 28 Nov 2005, Bill Woodcock wrote:
>
> >
> >   On Mon, 28 Nov 2005, Christian Kuhtz wrote:
> > > > I know the idea is to have very cheap boxes in clusters, but I wonder
> > > > how much they're paying in shipping for replacing the cheap hardware.
> > > Never underestimate the amount of airbills that can be paid with KISS
> > > strategy.
> >
> > Yep, that's true.  Shipping is cheap, it's customs that's expensive and
> > time-consuming, and Akamai tends to avoid the kind of places where you
> > have to deal with a lot of customs.
>
> I'd note that the original poster didn't classify 'broken' or 'outage'
> or 'non-functioning'... just the end result: "replacement".
>
> So, is akamai doing some fancy SMART detection and seeing bad fans and
> replacing, seeing a bad cpu fan or disk or memory corruption and
> replacing, or are these hard box outages with no recourse but a complete
> immediate replacement?

As far as I can tell the only thing that will get a box replaced is if it
can't be booted/pinged.  We've pointed out dead CPU fans before (even on
the incoming replacement boxes) and they've never seemed to care.  If it
runs it runs.  If it doesn't they replace the entire box.

Given all their redundancy I suppose that is probably the way to go.

Chris

--
~~~
Chris Owen~ Garden City (620) 275-1900 ~  Lottery (noun):
President ~ Wichita (316) 858-3000 ~A stupidity tax
Hubris Communications Inc ~   www.hubris.net   ~
~~~



Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Christopher L. Morrow

On Mon, 28 Nov 2005, Bill Woodcock wrote:

>
>   On Mon, 28 Nov 2005, Christian Kuhtz wrote:
> > > I know the idea is to have very cheap boxes in clusters, but I wonder
> > > how much they're paying in shipping for replacing the cheap hardware.
> > Never underestimate the amount of airbills that can be paid with KISS
> > strategy.
>
> Yep, that's true.  Shipping is cheap, it's customs that's expensive and
> time-consuming, and Akamai tends to avoid the kind of places where you
> have to deal with a lot of customs.

I'd note that the original poster didn't classify 'broken' or 'outage' or
'non-functioning'... just the end result: "replacement".

So, is akamai doing some fancy SMART detection and seeing bad fans and
replacing, seeing a bad cpu fan or disk or memory corruption and
replacing, or are these hard box outages with no recourse but a complete
immediate replacement?

(just curious as they don't let us play with these pieces/parts :) )


Akamai server reliability

2005-11-28 Thread Roy


Hi,

Many moons ago, we got a set of Akamai servers.  Over the years I think 
they replaced every one of them at least once.  Last August we got
another set of servers due to a move and now two of those three servers
have failed. 

I still have the original server that started garlic.com in production 
after 11+ years so I know servers can last a long time.  I don't 
understand why Akamai failure rates are so high


Is anyone else seeing high failure rates of Akamai servers at their 
facilities?


Roy






Re: trollage (Re: Akamai server reliability)

2005-11-28 Thread Bill Woodcock

  On Mon, 28 Nov 2005, Christian Kuhtz wrote:
> > I know the idea is to have very cheap boxes in clusters, but I wonder
> > how much they're paying in shipping for replacing the cheap hardware.
> Never underestimate the amount of airbills that can be paid with KISS
> strategy.

Yep, that's true.  Shipping is cheap, it's customs that's expensive and 
time-consuming, and Akamai tends to avoid the kind of places where you 
have to deal with a lot of customs.  

-Bill



trollage (Re: Akamai server reliability)

2005-11-28 Thread Christian Kuhtz



On Nov 28, 2005, at 2:02 PM, Vinny Abello wrote:
> I know the idea is to have very cheap boxes in clusters, but I
> wonder how much they're paying in shipping for replacing the cheap
> hardware.


Never underestimate the amount of airbills that can be paid with KISS  
strategy.


Anything else is trollage on NANOG.



Re: Akamai server reliability

2005-11-28 Thread Vinny Abello


At 01:39 PM 11/28/2005, Roy wrote:


> Hi,
>
> Many moons ago, we got a set of Akamai servers.  Over the years I
> think they replaced every one of them at least once.  Last August we
> got another set of servers due to a move and now two of those
> three servers have failed.
>
> I still have the original server that started garlic.com in
> production after 11+ years so I know servers can last a long
> time.  I don't understand why Akamai failure rates are so high
>
> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?


Out of the total three Akamai servers we have, I think we've had two 
of them replaced in the past three or four years that we've had them. 
One was replaced several times. The replacement servers tend to be 
refurbished and I've seen multiple things wrong with them when they 
arrive. If I recall correctly, one replacement wouldn't even boot 
successfully... Just kept crashing. Reloading the OS from an Akamai 
recovery CD had no effect. Shipping does cause problems whereby the
parts can come loose during transit.


The most common problem we see is failed hard drives and/or SCSI bus 
errors which are likely related to the hard drive failures. I'm 
surprised Akamai doesn't have any hardware RAID with hot swap yet (at 
least not in the boxes we have). It would be much less costly for 
them to ship a new hard drive than a whole new server each time a 
hard drive fails. I know the idea is to have very cheap boxes in 
clusters, but I wonder how much they're paying in shipping for 
replacing the cheap hardware.


As of late, we've had no known problems with our Akamai boxes. That 
one box does occasionally have weird SCSI hangs where the other two 
work nonstop. For the most part it is fine though.




Vinny Abello
Network Engineer
Server Management
[EMAIL PROTECTED]
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0  E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

"Courage is resistance to fear, mastery of fear - not absence of 
fear" -- Mark Twain




Re: Akamai server reliability

2005-11-28 Thread Mike Tancsa


At 01:39 PM 28/11/2005, Roy wrote:

> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?


Nope, just one bad box in many years.

---Mike 



Re: Akamai server reliability

2005-11-28 Thread Chris Owen

On Mon, 28 Nov 2005, Roy wrote:

> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?

We had 3 boxes for 5-6 years without a problem.  Then one of them failed.
We've since replaced that box 5-6 times in the last year.  The replacement
boxes often come with non-spinning CPU fans and other issues so I'm not
that surprised.  The last replacement was a few months ago though so maybe
this one will stick around.

I think whoever is doing their refurbs isn't doing a very good job.  They
never seem very concerned though.

Chris

--
~~~
Chris Owen~ Garden City (620) 275-1900 ~  Lottery (noun):
President ~ Wichita (316) 858-3000 ~A stupidity tax
Hubris Communications Inc ~   www.hubris.net   ~
~~~