Re: trollage (Re: Akamai server reliability)
CO> Date: Mon, 28 Nov 2005 14:57:58 -0600 (CST)
CO> From: Chris Owen

CO> However, I do think Akamai would be better off getting their issues with
CO> their replacement boxes straightened out. I agree that we get value for
CO> having the boxes on our network (and so do they let's not forget).

*shrug*

It's not that expensive to ship boxen back and forth, and I'd hazard a guess they have people who troubleshoot the dead en masse. If a dead box costs $50, the question becomes how much more would prolonging box death cost?

CO> However, it is a bit frustrating to replace the same box 3 times in less

Heh. Never had _that_ bad, personally.

CO> than a month. Hauling a box down to the colo is no big deal but when the

Depends. In Kansas, no. In $big_metro_area during rush hour... well, I've learned why people state "distance" in terms of hours. :-)

CO> box you are taking down there has a dead CPU fan and two dead case fans
CO> it's hard not to think you might be wasting your time.

True. So if the CPU fan is dead, just say the box is plugged in; act surprised when it doesn't ping. ;-)

CO> It isn't just that they are wasting my time. They are also wasting their
CO> own time. It's the overall lack of efficiency that bothers me ;-]

There are enough clue-challenged networks that I wouldn't want arbitrary people playing around with my gear. Shipping can be more efficient.

Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita

DO NOT send mail to the following addresses:
[EMAIL PROTECTED] -*- [EMAIL PROTECTED] -*- [EMAIL PROTECTED]
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.
Re: Akamai server reliability
i see lots of hardware failures. one particular problem i have noted is just kind of a bad case layout. the network card sits in a little pci riser card and is also bolted to the back of the case. it likes to unseat itself over time. eventually the MB dies. i'm not sure if one has anything to do with the other but i suspect it does, because if i pull the screws out of the network card and just let it sit in the pci riser it doesn't unseat itself and the box tends to live a bit longer.

but yes. i have more failures of akamai boxes on my network than anything by a long shot. chalk it up to cheap hardware i guess.

On Mon, 28 Nov 2005, Roy wrote:

> Date: Mon, 28 Nov 2005 10:39:51 -0800
> From: Roy <[EMAIL PROTECTED]>
> To: nanog@merit.edu
> Subject: Akamai server reliability
>
> Hi,
>
> Many moons ago, we got a set of Akamai servers. Over the years I think
> they replaced every one of them at least once. Last August we got
> another set of servers due to a move and now two of those three servers
> have failed.
>
> I still have the original server that started garlic.com in production
> after 11+ years so I know servers can last a long time. I don't
> understand why Akamai failure rates are so high.
>
> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?
>
> Roy

Ryan Dobrynski
Hat-Swapping Gnome
Choice Communications

(\_/)
(O.o)
(> < ) this is Bunny. Copy Bunny into your signature to help him on his way to world domination.
Re: Akamai server reliability
> I still have the original server that started garlic.com in production
> after 11+ years so I know servers can last a long time. I don't
> understand why Akamai failure rates are so high.

Applications which cause the disk to thrash will wear out disk drives much more quickly than non-thrashing applications. When I still ran USENET news servers back before cyclic file systems were used, I remember that their hard drives died frequently, often after less than a year of service, but those drives were thrashing 24 by 7.

You can hear drives thrashing and feel it by touching the case. It is caused by almost completely random access resulting in almost constant head movement. It is cost effective to just thrash cheap drives and replace them when they die.

--Michael Dillon
Re: trollage (Re: Akamai server reliability)
> To quote a science fiction story I'm fond of, "efficiency depends on
> what you want to effish".
>
> --Steven M. Bellovin, http://www.cs.columbia.edu/~smb

Sci-fi injection! (marking another beer owed)

Gadi.
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Pete Templin wrote:

> And hopefully they'll (someday) send servers in my direction - is their
> "minimum criteria" creeping upwards at the same rate as overall Internet
> traffic did in the late 90s?

The impression I got was they originally scattered their machines to everyone who had a network with a growth plan and bought them a beer. Some people even got/get paid to host them.

After the .com crash they started being a bit more careful about who they gave them to and doing a bit more analysis as to whether a new site was worth the trouble.

One way to get a cluster might be to suggest that you will make better use of it than a nearby company with a cluster that is much smaller than you. I have heard of people trying this in Australia, no idea how well it works.

I know people who were doing under 10Mb/s via their clusters, but they are in Aus/NZ so the threshold might be higher elsewhere.

--
Simon J. Lyall | Very Busy | Web: http://www.darkmere.gen.nz/
"To stay awake all night adds a day to your life" - Stilgar | eMT.
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Deepak Jain wrote:

> I'm sorry, isn't that exactly what an airbill *is* paying for -- to get
> the equipment on site?

They also frequently need boxes power cycled. It got to be so frequent that we "gave them" a remote reboot switch for all their gear and told them how to use it. They still kept emailing us for reboots until I finally used a contact at akamai to get the remote reboot info properly placed.

We've had our share of failed boxes, DOA boxes, boxes with components literally falling out of them on arrival, etc. I suspect it's just a sign of the box building having been farmed out to the cheapest available source. When you're building boxes in really large volume, what's a few missing screws here and there? :)

> The man hours (really, we are talking about less than a single hour to
> replace a server including all the mounting and repacking). The one man
> hour that they need (no more than 6 a year by the look of it) should
> offset the value the ISP is getting from not buying bandwidth to get to
> the content and for the improved performance they get.

I wouldn't count on that. With bandwidth prices continually falling, and the ISP business changing (at least for us, dialup/DSL is dying, hosting is taking off, and now, instead of having spare outbound capacity to sell to Akamai, we do more outbound than inbound), the servers really don't save us anything except maybe a bit of latency.

> If that model doesn't work for the ISP in question, they should ask
> Akamai to pull their gear.

Think of the man hours that'd take, ripping them out, boxing them up, etc. :)

--
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_ http://www.lewis.org/~jlewis/pgp for PGP public key_
Re: trollage (Re: Akamai server reliability)
In message <[EMAIL PROTECTED]>, Randy Bush writes:
>
>> It isn't just that they are wasting my time. They are also wasting their
>> own time. It's the overall lack of efficiency that bothers me ;-]
>
>i suspect you have a datapoint on how they're doing financially.
>they ain't stoopid. they'll deal with it when the cost/benefit
>gets high enough on their priority list. isn't the first time
>that good s&m covers some technical gaps, and won't be the last.
>

To quote a science fiction story I'm fond of, "efficiency depends on what you want to effish".

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb
Re: trollage (Re: Akamai server reliability)
Deepak Jain wrote:

> If that model doesn't work for the ISP in question, they should ask
> Akamai to pull their gear.

And hopefully they'll (someday) send servers in my direction - is their "minimum criteria" creeping upwards at the same rate as overall Internet traffic did in the late 90s?

pt
Re: trollage (Re: Akamai server reliability)
> It isn't just that they are wasting my time. They are also wasting their
> own time. It's the overall lack of efficiency that bothers me ;-]

i suspect you have a datapoint on how they're doing financially. they ain't stoopid. they'll deal with it when the cost/benefit gets high enough on their priority list. isn't the first time that good s&m covers some technical gaps, and won't be the last.

randy
Re: trollage (Re: Akamai server reliability)
Chris Owen wrote:

> It isn't just that they are wasting my time. They are also wasting their
> own time. It's the overall lack of efficiency that bothers me ;-]

Don't worry, it won't take long until google parks their datacenter-in-a-container outside at the fiber junction and the content distribution guys will be obsoleted overnight.

Pete
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Deepak Jain wrote:

> >> Never underestimate the amount of airbills that can be paid with KISS
> >> strategy.
> >
> > Especially since Akamai doesn't pay for truck rolls and man hours to get
> > the replacements done onsite.
>
> I'm sorry, isn't that exactly what an airbill *is* paying for -- to get
> the equipment on site?
>
> The man hours (really, we are talking about less than a single hour to
> replace a server including all the mounting and repacking). The one man
> hour that they need (no more than 6 a year by the look of it) should
> offset the value the ISP is getting from not buying bandwidth to get to
> the content and for the improved performance they get.
>
> If that model doesn't work for the ISP in question, they should ask
> Akamai to pull their gear.

I didn't really get the impression that people were really complaining so much (I certainly wasn't) as they were just pointing out there was an issue.

However, I do think Akamai would be better off getting their issues with their replacement boxes straightened out. I agree that we get value for having the boxes on our network (and so do they, let's not forget). However, it is a bit frustrating to replace the same box 3 times in less than a month. Hauling a box down to the colo is no big deal, but when the box you are taking down there has a dead CPU fan and two dead case fans it's hard not to think you might be wasting your time.

It isn't just that they are wasting my time. They are also wasting their own time. It's the overall lack of efficiency that bothers me ;-]

Chris

--
~~~
Chris Owen                ~ Garden City (620) 275-1900 ~ Lottery (noun):
President                 ~ Wichita     (316) 858-3000 ~ A stupidity tax
Hubris Communications Inc ~ www.hubris.net             ~
~~~
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Chris Owen wrote:

> As far as I can tell the only thing that will get a box replaced is if it
> can't be booted/pinged. We've pointed out dead CPU fans before (even on
> the incoming replacement boxes) and they've never seemed to care. If it
> runs it runs. If it doesn't they replace the entire box.

Having built a fair number of machines to live for 5 years or longer in data-centers I will never visit, there's relatively little that you want to triage onsite on a rackmount pc. Drives in hot-plug enclosures and removable power supply modules are about it...

Smart-hands are good for racking and stacking, swapping disks, recabling the oob, swapping media and so forth. It's not really a good use of someone else's time to have them performing experimental surgery on pc's. Much better to simply ship out another one and ship the old one back in the same box.

Decent modern 1u chassis still have sufficient airflow with a couple of fans failed to remain adequately cool. Further, there are now enough sensors in a pc to be able to tell when you're getting into trouble: rpm indicators for all the fans, intake, processor and output temperature, thermal sensors in each of the drives, etc. Our success rate at identifying machines before they fail has gotten substantially better over time.

> Given all their redundancy I suppose that is probably the way to go.
>
> Chris
>
> --
> ~~~
> Chris Owen                ~ Garden City (620) 275-1900 ~ Lottery (noun):
> President                 ~ Wichita     (316) 858-3000 ~ A stupidity tax
> Hubris Communications Inc ~ www.hubris.net             ~
> ~~~

--
Joel Jaeggli Unix Consulting [EMAIL PROTECTED]
GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2
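The sensor-based triage described in the message above (fan rpm, intake/processor/output temperature, per-drive thermal sensors) could be sketched roughly as follows. This is only an illustration: the thresholds, the sensor names, and the `SensorReading` and `classify` names are all assumptions, not anything Akamai or the poster is known to use. It mirrors the point that a box with a couple of failed fans may still be cool enough to keep running, so dead fans alone only put it on a watch list.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for illustration.
MIN_FAN_RPM = 1000
MAX_TEMP_C = {"intake": 35.0, "cpu": 70.0, "output": 50.0, "drive": 55.0}


@dataclass
class SensorReading:
    """One snapshot of a box's sensors (field names are assumptions)."""
    fan_rpm: dict = field(default_factory=dict)  # fan name -> rpm
    temp_c: dict = field(default_factory=dict)   # sensor name -> degrees C


def classify(reading: SensorReading) -> str:
    """Return "replace", "watch", or "ok" for one sensor snapshot."""
    # Fans spinning below the assumed minimum count as failed.
    dead_fans = [name for name, rpm in reading.fan_rpm.items()
                 if rpm < MIN_FAN_RPM]
    # Only sensors with a known limit are checked.
    hot = [name for name, temp in reading.temp_c.items()
           if name in MAX_TEMP_C and temp > MAX_TEMP_C[name]]
    if hot:
        return "replace"   # actually overheating: ship a new box
    if dead_fans:
        return "watch"     # fans failed but airflow still adequate
    return "ok"
```

A box with a dead CPU fan but normal temperatures would come back as "watch" rather than "replace", which matches the "if it runs it runs" behaviour reported earlier in the thread.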
Re: trollage (Re: Akamai server reliability)
>> Never underestimate the amount of airbills that can be paid with KISS
>> strategy.
>
> Especially since Akamai doesn't pay for truck rolls and man hours to get
> the replacements done onsite.

I'm sorry, isn't that exactly what an airbill *is* paying for -- to get the equipment on site?

The man hours (really, we are talking about less than a single hour to replace a server including all the mounting and repacking). The one man hour that they need (no more than 6 a year by the look of it) should offset the value the ISP is getting from not buying bandwidth to get to the content and for the improved performance they get.

If that model doesn't work for the ISP in question, they should ask Akamai to pull their gear.

DJ
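The trade-off argued in the message above, a handful of swap hours per year against the transit an ISP avoids buying, can be written out as simple arithmetic. The sketch below is only that; the function name and every dollar and bandwidth figure are hypothetical, and only the "6 swaps a year, under an hour each" numbers come from the thread.

```python
def hosting_net_value(swaps_per_year: float,
                      hours_per_swap: float,
                      hourly_rate: float,
                      offloaded_mbps: float,
                      transit_price_per_mbps_month: float) -> float:
    """Annual transit savings minus annual labor spent swapping boxes."""
    labor_cost = swaps_per_year * hours_per_swap * hourly_rate
    transit_savings = offloaded_mbps * transit_price_per_mbps_month * 12
    return transit_savings - labor_cost


# 6 swaps a year at 1 hour each (from the thread); the $100/hour rate,
# 10 Mb/s offload and $20 per Mb/s per month are made-up inputs.
example = hosting_net_value(6, 1.0, 100.0, 10.0, 20.0)
```

With these made-up inputs the result is positive, but it shrinks as transit prices fall or as the cluster offloads less traffic, and it goes negative when the cluster offloads nothing.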
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Christian Kuhtz wrote:

> Never underestimate the amount of airbills that can be paid with KISS
> strategy.

Especially since Akamai doesn't pay for truck rolls and man hours to get the replacements done onsite.

--
Mikael Abrahamsson    email: [EMAIL PROTECTED]
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Christopher L. Morrow wrote:

> On Mon, 28 Nov 2005, Bill Woodcock wrote:
>
> > On Mon, 28 Nov 2005, Christian Kuhtz wrote:
> > > > I know the idea is to have very cheap boxes in clusters, but I wonder
> > > > how much they're paying in shipping for replacing the cheap hardware.
> > > Never underestimate the amount of airbills that can be paid with KISS
> > > strategy.
> >
> > Yep, that's true. Shipping is cheap, it's customs that's expensive and
> > time-consuming, and Akamai tends to avoid the kind of places where you
> > have to deal with a lot of customs.
>
> I'd note that the original poster didn't classify 'broken' or 'outage'
> or 'non-functioning'... just the end result: "replacement".
>
> So, is akamai doing some fancy SMART detection and seeing bad fans and
> replacing, seeing a bad cpu fan or disk or memory corruption and
> replacing, or are these hard box outages with no recourse but a complete
> immediate replacement?

As far as I can tell the only thing that will get a box replaced is if it can't be booted/pinged. We've pointed out dead CPU fans before (even on the incoming replacement boxes) and they've never seemed to care. If it runs it runs. If it doesn't they replace the entire box.

Given all their redundancy I suppose that is probably the way to go.

Chris

--
~~~
Chris Owen                ~ Garden City (620) 275-1900 ~ Lottery (noun):
President                 ~ Wichita     (316) 858-3000 ~ A stupidity tax
Hubris Communications Inc ~ www.hubris.net             ~
~~~
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Bill Woodcock wrote:

> On Mon, 28 Nov 2005, Christian Kuhtz wrote:
> > > I know the idea is to have very cheap boxes in clusters, but I wonder
> > > how much they're paying in shipping for replacing the cheap hardware.
> > Never underestimate the amount of airbills that can be paid with KISS
> > strategy.
>
> Yep, that's true. Shipping is cheap, it's customs that's expensive and
> time-consuming, and Akamai tends to avoid the kind of places where you
> have to deal with a lot of customs.

I'd note that the original poster didn't classify 'broken' or 'outage' or 'non-functioning'... just the end result: "replacement".

So, is akamai doing some fancy SMART detection and seeing bad fans and replacing, seeing a bad cpu fan or disk or memory corruption and replacing, or are these hard box outages with no recourse but a complete immediate replacement?

(just curious as they don't let us play with these pieces/parts :) )
Akamai server reliability
Hi,

Many moons ago, we got a set of Akamai servers. Over the years I think they replaced every one of them at least once. Last August we got another set of servers due to a move and now two of those three servers have failed.

I still have the original server that started garlic.com in production after 11+ years so I know servers can last a long time. I don't understand why Akamai failure rates are so high.

Is anyone else seeing high failure rates of Akamai servers at their facilities?

Roy
Re: trollage (Re: Akamai server reliability)
On Mon, 28 Nov 2005, Christian Kuhtz wrote:

> > I know the idea is to have very cheap boxes in clusters, but I wonder
> > how much they're paying in shipping for replacing the cheap hardware.
> Never underestimate the amount of airbills that can be paid with KISS
> strategy.

Yep, that's true. Shipping is cheap, it's customs that's expensive and time-consuming, and Akamai tends to avoid the kind of places where you have to deal with a lot of customs.

-Bill
trollage (Re: Akamai server reliability)
On Nov 28, 2005, at 2:02 PM, Vinny Abello wrote:

> I know the idea is to have very cheap boxes in clusters, but I wonder
> how much they're paying in shipping for replacing the cheap hardware.

Never underestimate the amount of airbills that can be paid with KISS strategy.

Anything else is trollage on NANOG.
Re: Akamai server reliability
At 01:39 PM 11/28/2005, Roy wrote:

> Hi,
>
> Many moons ago, we got a set of Akamai servers. Over the years I think
> they replaced every one of them at least once. Last August we got
> another set of servers due to a move and now two of those three servers
> have failed.
>
> I still have the original server that started garlic.com in production
> after 11+ years so I know servers can last a long time. I don't
> understand why Akamai failure rates are so high.
>
> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?

Out of the total three Akamai servers we have, I think we've had two of them replaced in the past three or four years that we've had them. One was replaced several times. The replacement servers tend to be refurbished and I've seen multiple things wrong with them when they arrive. If I recall correctly, one replacement wouldn't even boot successfully... Just kept crashing. Reloading the OS from an Akamai recovery CD had no effect. Shipping does cause problems whereby the parts can come loose during transit.

The most common problem we see is failed hard drives and/or SCSI bus errors which are likely related to the hard drive failures. I'm surprised Akamai doesn't have any hardware RAID with hot swap yet (at least not in the boxes we have). It would be much less costly for them to ship a new hard drive than a whole new server each time a hard drive fails. I know the idea is to have very cheap boxes in clusters, but I wonder how much they're paying in shipping for replacing the cheap hardware.

As of late, we've had no known problems with our Akamai boxes. That one box does occasionally have weird SCSI hangs where the other two work nonstop. For the most part it is fine though.

Vinny Abello
Network Engineer
Server Management
[EMAIL PROTECTED]
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0 E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

"Courage is resistance to fear, mastery of fear - not absence of fear" -- Mark Twain
Re: Akamai server reliability
At 01:39 PM 28/11/2005, Roy wrote:

> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?

Nope, just one bad box in many years.

---Mike
Re: Akamai server reliability
On Mon, 28 Nov 2005, Roy wrote:

> Is anyone else seeing high failure rates of Akamai servers at their
> facilities?

We had 3 boxes for 5-6 years without a problem. Then one of them failed. We've since replaced that box 5-6 times in the last year. The replacement boxes often come with non-spinning CPU fans and other issues so I'm not that surprised. The last replacement was a few months ago though so maybe this one will stick around.

I think whoever is doing their refurbs isn't doing a very good job. They never seem very concerned though.

Chris

--
~~~
Chris Owen                ~ Garden City (620) 275-1900 ~ Lottery (noun):
President                 ~ Wichita     (316) 858-3000 ~ A stupidity tax
Hubris Communications Inc ~ www.hubris.net             ~
~~~