Re: [Beowulf] precise synchronization of system clocks

Lawrence Stewart Mon, 29 Sep 2008 15:14:52 -0700


On Sep 29, 2008, at 4:10 PM, Prentice Bisbal wrote:

In the previous thread I instigated about running services in cluster
nodes, there was some mentioning of precisely synchronizing the system
clocks and this issue is also mentioned in this paper:

"The Case of Missing Supercomputer Performance: Achieving Optimal
Performance on the 8,192 processor ASCI Q" (Petrini, Kerbyson and Pakin)
http://hpc.pnl.gov/people/fabrizio/papers/sc03_noise.pdf

I've also read a few other papers on the topic, and it seems you need to
sync the system clocks to ~1 us. On top of that, I imagine you also need
to sync the activities of each system so they all stop to do the same
system-level tasks at the same time.

The papers I read all mentioned different OSes, or at least specialized
hardware. Can this level of synchronization be achieved in Linux on
commodity hardware?  I imagine NTP doesn't have the resolution needed
for this, and Don Becker has some strong feelings against NTP.

The SiCortex systems I work on are not commodity, but they do run Linux. All the node chips in the machine are frequency locked to the same oscillator, so the core cycle counters (MIPS standard) advance at the same rate, but because the cores are released from reset at different times, they are not initially synchronized. We recently added a global clock synchronization step to booting the system by timestamping messages sent over an out-of-band channel of the interconnect. After some futzing around, we're able to synchronize all the cycle counters to within about 50 nanoseconds. The timer interrupts then happen at the same counter values system wide, which naturally synchronizes most of the daemons that wake up. I don't think we've gone to the trouble of gang scheduling them as well, which would also be a good idea.

We tried reducing the standard 1000 Hz timer interrupts to 100 Hz, but a bunch of stuff in the IP network stack reacted badly, slowing down IP communications. We haven't tracked it all down yet.

As one would expect from the papers you cite, the clock synchronization has had a very dramatic effect on large scale collectives - a 5800 rank 8-byte allreduce is now down to 36 microseconds, where it was something like 170 microseconds before the clock project.

Since clusters built from commodity servers run on independent oscillators, it is much harder to synchronize them - NTP will do a very good job estimating the relative frequencies, but all those oscillators will drift independently with temperature and aging, so you have to run NTP continually.
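
To put a rough number on "continually": a rate difference of 1 ppm between two oscillators accumulates 1 microsecond of skew per second. The sketch below just does that arithmetic. The 10 ppm figure and the function name are illustrative assumptions, not measurements from any particular hardware.

```python
def resync_interval_s(drift_ppm, budget_us):
    """Seconds until two clocks whose rates differ by drift_ppm
    drift apart by budget_us microseconds.

    1 ppm of rate difference is 1 microsecond of skew per second.
    """
    return budget_us / drift_ppm

# Hypothetical numbers: 10 ppm residual rate difference, 1 us skew budget.
print(resync_interval_s(drift_ppm=10, budget_us=1))  # 0.1
```

Under those assumed numbers the clocks would need correcting about ten times a second to stay inside a 1 microsecond budget.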

However, the problem to solve, synchronizing local clocks with each other, is different from the one NTP is intended to solve. You don't really care what the wall clock time is, you only care that all the systems have the same time.
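
As a rough illustration of relative synchronization, here is a minimal sketch of estimating one node's offset from another using a timestamped request/reply exchange. The arithmetic is the standard two-way offset calculation; it is not taken from any of the systems or papers mentioned in this thread, the function names are made up, and it assumes the one-way delays in the two directions are equal.

```python
def estimate_offset(t0, t1, t2, t3):
    """Estimate the peer's clock offset from one request/reply exchange.

    t0: request sent     (local clock)
    t1: request received (peer clock)
    t2: reply sent       (peer clock)
    t3: reply received   (local clock)

    Assumption: the one-way delay is the same in both directions.
    Returns (offset, round_trip_delay); offset is peer minus local.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

def best_offset(exchanges):
    """Use the exchange with the smallest round trip, on the theory
    that it was the one least disturbed by queueing or OS noise."""
    return min((estimate_offset(*e) for e in exchanges), key=lambda r: r[1])[0]

# Made-up timestamps in arbitrary ticks; the peer runs 500 ticks ahead.
exchanges = [
    (2000, 2530, 2535, 2045),  # asymmetric delays, longer round trip
    (1000, 1510, 1515, 1025),  # symmetric delays, shorter round trip
]
print(best_offset(exchanges))  # 500.0
```

No wall clock time appears anywhere: the result is only the difference between the two local counters, which is all that is needed here.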

I've seen some other papers on the subject of using LAN timestamps to provide much more accurate local synchronization. Here's one that cites 10 microsecond results:


High-Precision Relative Clock Synchronization Using Time Stamp Counters
Guo-Song Tian; Yu-Chu Tian; Fidge, C.
Engineering of Complex Computer Systems, 2008. ICECCS 2008. 13th IEEE International Conference on
March 31 2008-April 3 2008, Page(s): 69-78

Incidentally, a good way to measure the effects of OS noise locally is to write a program that reads the core cycle counter in a tight loop, and keeps statistics on the intervals between successive samples. You can find out how often and for how long your OS is going out to lunch.
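
A minimal sketch of such a program, in Python for brevity. It assumes time.perf_counter_ns() as a stand-in for the core cycle counter, so interpreter overhead inflates the baseline interval; a C loop reading the hardware counter directly would resolve much shorter interruptions. The function name and the 50 microsecond threshold are arbitrary choices for illustration.

```python
import time

def measure_gaps(samples=200_000, threshold_ns=50_000):
    """Read the clock in a tight loop and keep interval statistics.

    Assumption: time.perf_counter_ns() stands in for the cycle counter.
    threshold_ns is an arbitrary cutoff for what counts as an interruption.
    """
    clock = time.perf_counter_ns  # local alias keeps the loop tight
    min_gap = None
    max_gap = 0
    total = 0
    long_gaps = []  # (ns since start, gap in ns) for each long interval
    start = prev = clock()
    for _ in range(samples):
        now = clock()
        gap = now - prev
        prev = now
        total += gap
        if min_gap is None or gap < min_gap:
            min_gap = gap
        if gap > max_gap:
            max_gap = gap
        if gap >= threshold_ns:
            long_gaps.append((now - start, gap))
    return {
        "samples": samples,
        "min_ns": min_gap,
        "mean_ns": total / samples,
        "max_ns": max_gap,
        "long_gaps": long_gaps,
    }

if __name__ == "__main__":
    stats = measure_gaps()
    print(f"min {stats['min_ns']} ns, mean {stats['mean_ns']:.1f} ns, "
          f"max {stats['max_ns']} ns")
    print(f"{len(stats['long_gaps'])} gaps at or above threshold")
    for when, gap in stats["long_gaps"][:10]:
        print(f"  at +{when / 1e6:.3f} ms: {gap / 1e3:.1f} us")
```

The minimum gap approximates the cost of one loop iteration; the long gaps show when the process was taken off the core and for how long.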


_larry

