In article <4a173189$0$18246$da0fe...@news.zen.co.uk>, Andy Yates <andyy1...@gmail.com> writes:
>Hi Hal
>
>It's up to us to specify what we think the SLA should be - the guide is
>"as accurate as possible"!

I think there is an implied "at reasonable cost" in there.

I've never run a data center nor had to hassle with SLAs. If my boss gave me that task, I'd push back real hard. Where is the knee of the benefit curve? Is 100 ms good enough? What fraction of the time? How much more is 10 ms worth?

There are two types of costs. One is hardware, which is easy to see. The other is operations. If you spec things too tight, you will create a lot of work for the operations team.

If you do anything sane, the clocks will be within 10-100 ms most of the time. Is that good enough? What sort of "most" do you need? Do you have legal requirements (as in stock market transactions)? What does your lawyer say?

If you are going to put an SLA for time into a contract, you will have to have a way to verify that you are meeting specs, so you might as well start debugging the monitoring process now. If you are sufficiently paranoid, you will need (at least) 2 monitoring systems in each data center. You will also need a time-wizard to keep track of things.

>> How stable is your temperature? (Both the room and the CPU load.)
>
>Temperature will be very stable, the DC is very well specified and
>scrupulously engineered - no cables blocking air flow etc. Generally
>speaking the CPU is over specified.

Does anybody ever hold the door open for more than a few seconds? Can you be sure they won't do it tomorrow?

That's only half the problem. The other half is the source of heat inside the box. An active system makes a lot more heat than an idle one.

To get numbers, I'd set up a system, turn on lots of logging, leave it idle for a long time (say a day), then look at the drift. (It's in loopstats.) Then start a good load, let it run for several hours, and see how much the drift changed. I'd also look at the offset during the transient.
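The idle-vs-loaded drift comparison above can be sketched in a few lines. This is a hypothetical helper of mine, not part of ntpd; it only assumes the standard loopstats line layout (MJD, seconds past midnight, offset in s, frequency in ppm, jitter, wander, poll).

```python
# Sketch: compare ntpd drift (frequency) between an idle and a loaded run.
# Assumes standard loopstats lines, e.g.:
#   57023 43200.123 0.000012 1.234 0.000010 0.001 6
# (MJD, seconds, offset s, frequency ppm, jitter s, wander ppm, poll)

def read_frequency(path):
    """Return the frequency (drift) samples from a loopstats file, in ppm."""
    freqs = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 4:
                freqs.append(float(fields[3]))  # column 4 is frequency, ppm
    return freqs

def summarize(freqs):
    """(min, mean, max) of the drift samples."""
    return min(freqs), sum(freqs) / len(freqs), max(freqs)
```

Feed `read_frequency` a loopstats file captured while idle and another captured under load, then compare the two summaries; a shift in the mean is the drift change I'd be looking for.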
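For the monitoring process mentioned earlier, a minimal sketch: parse the `ntpq -pn` billboard and flag any peer whose offset exceeds whatever limit goes into the SLA. The function name and threshold are mine; the column layout is ntpq's standard billboard (remote, refid, st, t, when, poll, reach, delay, offset, jitter, with offset in milliseconds).

```python
# Sketch: flag peers in an `ntpq -pn` billboard whose offset exceeds a limit.
# The billboard's first two lines are headers; each peer line begins with a
# one-character tally code (*, +, -, etc.) glued to the remote address.

def offsets_exceeding(billboard, limit_ms):
    """Return (peer, offset_ms) pairs whose |offset| exceeds limit_ms."""
    bad = []
    for line in billboard.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        if len(fields) >= 10:
            peer = fields[0].lstrip("*+-#x.o ")  # drop the tally code
            offset = float(fields[8])            # column 9 is offset, ms
            if abs(offset) > limit_ms:
                bad.append((peer, offset))
    return bad
```

In practice you would run `ntpq -pn` from cron, pass its output through something like this, and alarm (and log, for the SLA paper trail) whenever the list comes back non-empty.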
(PS: If your specs are tight, you will have to repeat that experiment each time you get a new flavor of server box. It's just another item for the checklist.)

>> What is the load on the LAN between the clients and servers?
>> (Delay is not a problem. Variation in delay is a problem.)
>
>The NTP will be on a separate management LAN to the production traffic
>so not subject to the variances that application load has on the network.

That seems like a reasonable assumption. Are you sure? Will it ever get used for an emergency transfer of a large file? (say, recovering from a crashed disk)

There are a handful of things I can think of that will screw up your clocks:
  temperature
  network load
  software bugs
  operational screwups
  driver quirks

Linux has a history of screwing up the timekeeping kernel code.

Operators can be very ingenious at finding ways to screw things up. If your time spec is tight enough, you will have to go over the checklist carefully with time in mind. You'll need to add things like "wait x minutes for the system to warm up" when you swap in a new box for one that died.

Ethernet drivers often try to batch interrupts to reduce CPU overhead. Details matter. Another item for the checklist.

-- 
These are my opinions, not necessarily my employer's. I hate spam.

_______________________________________________
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions