Re: [ntp:questions] Testing throughput in NTP servers
On 2012-09-14 21:44, Terje Mathisen wrote:
> Ulf Samuelsson wrote:
>> On 2012-09-12 21:24, Richard B. Gilbert wrote:
>>> On 9/12/2012 2:34 PM, unruh wrote:
>>>> On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote:
>>>>> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?
>>>>> Best Regards Ulf Samuelsson
>>>> I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this?
>>> Sigh! I'm sure it has happened and will happen again! I'm sure that there are people complaining to the National Bureau of Standards or the Naval Observatory that their time is incorrect! ;-)
>>> If you really want time with better than micro-second accuracy, consider getting a GPS Timing receiver. The one I bought several years ago claimed 50 nanosecond plus or minus of the correct time.
>> The NTP server we will be testing will be connected to a Cesium clock providing a 1pps pulse so that is really not my problem. I want to check if this system can handle DDoS attacks, and bad packets. This will be done in a lab environment, possibly point-to-point from the test machine, to the server, or maybe
>> In order to test DDoS, probably some FPGA H/W is needed to generate good packets, and the S/W stuff is there to generate bad packets and check how the server reacts to those
> Do you really need that? It seems to me that by modifying an ethernet card driver to do ntp processing in kernel mode, you should be able to handle at least the same number of ntp requests as you can do ping replies.

One of the key requirements of the Ethernet card is to do timestamping of the incoming packets. There are FPGA solutions with 10 GbE capable of this today.

Part of the requirement is that the dependencies on the underlying O/S should be minimized. If an FPGA can handle everything, then this is ideal. Currently the FPGA will split incoming packets into streams with one stream per thread.

> Way back when, around 1992, Drew Major managed to get a NetWare 386 server to handle a read request in 300 clock cycles. This was from receipt of the packet and included parsing, access control checks, locating the requested data somewhere in the memory cache, constructing the response packet and handing it back to the NIC.
> Assuming we can get the actual ntp standard request code processing down to the absolute minimum (read the RDTSC counter (or a similar low-latency clock source) and the latest OS tick value/RDTSC count, scale the offset count by a fixed factor, then add to the OS clock value) we should be able to get the entire processing down to ~100 clock cycles or so. I.e. moving packet data in/out of the NIC buffers is going to take comparable time.

The first FPGA H/W has some limitations, in that reading the timestamp counter from the CPU is really not recommended, since you kill the PCIe performance. Instead the ntp code adds a delay to the incoming packet timestamp, and the FPGA H/W sends out the packet at the correct time.

> (Any other kind of request is handled as today, i.e. queued for ntpd processing, unless DDOS level packet rates cause the queue to pass some very low limit in size, at which point we discard the requests.) Any packet which fails some minimum sanity checks can be discarded quickly, this is less overhead than handling it over to the regular user-level ntpd process.

How do you test that this works? Any specific S/W package that you developed?

>> Recording the packets will be done with FPGA H/W as well.
> So a network sniffer won't be fast enough?

The FPGA card is a network sniffer as well, so there is ready-made S/W for this.

> You're talking 10 GigE wire speed, right?

Yes.

> That's more than 100 M requests/second!

Line speed is 10M+ packets/second. I have been told that a single compromised home router can generate about 3000 packets per second on a 100 Mbps network. With 3-4000 such routers you reach 10 GbE line speed.

My local service provider is now offering 1 Gb Internet access at home (if I care for some throughput), so with some decent H/W, there could be more. This solution is supposed to have some lifetime. Some intelligence in front of the NTP server which removes nasty packets would probably be useful as well.

> Taking a pessimistic view (1K clock cycles/request) would give just 3M packets/core/second, so a 32-core (4x8) machine would suffice. Getting closer to my 100-cycle target (for chained processing of a bunch of consecutive request packets) drops the cpu requirements down to a regular quad core single cpu machine, but at this point the bus probably won't be able to keep up with the NIC.
> Terje

BR Ulf Samuelsson
___
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
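The "minimum sanity checks" step quoted in this message can be pictured roughly as follows. This is only a sketch: the thread does not say which checks are meant, so the three-way split and the specific criteria (48-byte minimum length, version 1 to 4, client mode 3, all taken from the standard NTP header layout) are assumptions, and `classify` is a hypothetical name.

```python
NTP_HEADER_LEN = 48  # fixed NTP header size without extensions or MAC

def classify(payload: bytes) -> str:
    """Sort a UDP payload into 'fast', 'queue' or 'drop' (assumed policy)."""
    if len(payload) < NTP_HEADER_LEN:
        return "drop"                    # too short to be an NTP packet
    version = (payload[0] >> 3) & 0x7    # bits 3-5 of the first byte
    mode = payload[0] & 0x7              # bits 0-2 of the first byte
    if not 1 <= version <= 4:
        return "drop"                    # version field outside the assumed range
    if mode == 3:
        return "fast"                    # plain client request: fast path
    return "queue"                       # anything else goes to ntpd as today
```

A plain NTPv4 client request starts with the byte 0x23 (LI=0, VN=4, Mode=3), so `classify(bytes([0x23]) + bytes(47))` takes the fast path.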
Re: [ntp:questions] Testing throughput in NTP servers
Ulf Samuelsson wrote:
> On 2012-09-14 21:44, Terje Mathisen wrote:
>> Do you really need that? It seems to me that by modifying an ethernet card driver to do ntp processing in kernel mode, you should be able to handle at least the same number of ntp requests as you can do ping replies.
> One of the key requirements of the Ethernet card is to do timestamping of the incoming packets. There are FPGA solutions with 10 GbE capable of this today.

OK.

> Part of the requirement, is that the dependencies on the underlying O/S should be minimized. If an FPGA can handle everything, then this is ideal. Currently the FPGA will split incoming packets into streams with one stream per thread.

Why can't your FPGA do the entire NTP processing, for all regular request packets? Grab the current timestamp, fill it into the T1 T2 fields, along with leap indicator etc., then return the packet.

> The first FPGA H/W has some limitations, in that the reading the timestamp counter from the CPU is really not recommended, since you kill the PCIe performance.

If the FPGA can do everything, then it (obviously) requires a board-local clock source as well...

> Instead the ntp code adds a delay to the incoming packet timestamp, and the FPGA H/W sends out the packet at the correct time.

OK, this still means that the host CPU must be involved in every packet.

>> (Any other kind of request is handled as today, i.e. queued for ntpd processing, unless DDOS level packet rates cause the queue to pass some very low limit in size, at which point we discard the requests.) Any packet which fails some minimum sanity checks can be discarded quickly, this is less overhead than handling it over to the regular user-level ntpd process.
> How do you test that this works? Any specific S/W package that you developed?

I haven't done this for ntp (yet), but I have been involved with high-performance network code since around 1986, and I wrote my own file transfer sw that did large frames/sliding window/selective retransmit back in 1984.

>>> Recording the packets will be done with FPGA H/W as well.
>> So a network sniffer won't be fast enough?
> The FPGA card is a network sniffer as well so there is ready made S/W for this.
>> You're talking 10 GigE wire speed, right?
> Yes.
>> That's more than 100 M requests/second!
> Line speed is 10M+ packets/second.

That should be easy. (Famous last words!)

With a 1000 cycles/packet processing budget, 3 cores of a 3GHz quad core cpu would be more or less sufficient.

> I have been told that a single compromised home router can generate about 3000 packets per second on a 100 Mbps network. With 3-4000 such routers you reach 10 GbE linespeed. My local service provider is now offering 1 Gb Internet access at home, (if I care for some throughput), so with some decent H/W, there could be more. This solution is supposed to have some lifetime. Probably some intelligence in front of the NTP server which removes nasty packets would be useful as well.

If that intelligence takes more processing cycles than an actual ntp request/reply response, then I'd be willing to settle for a very simple rate limiter. I.e. if the same (possibly faked!) source address is generating more than a packet per second or so, send a KOD reply, then stop responding.

Of course, handling/filtering 10M packets/second _could_ require a 10M entry hash table of source addresses, at which point FPGA hw will get into memory access problems, right?

I would like to look into locking N-1 of the cores into a busy loop, polling for new packets and processing them as soon as they arrive. Since this avoids the IRQ overhead, it should be possible to at least get very close to the actual bus transfer rate, with a close to fixed time delay from line receipt until the cpu gets access.

For outgoing wire speed packets it is a bit harder, since you must send streams of packets, and the actual delay will depend upon the current buffer/queue level. Estimating the actual outgoing time by checking the queue size should give a pretty good guesstimate: we are talking about sub 1000 bit packets, so each ntp packet takes less than 100 ns.

Terje
--
- Terje.Mathisen at tmsw.no
almost all programming can be viewed as an exercise in caching
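The "very simple rate limiter" described above (more than roughly one packet per second from a source earns one KoD reply, then silence) could look something like the sketch below. The class name, the return values and the choice to re-arm a source once it slows down are all assumptions made for illustration; nothing here comes from ntpd.

```python
class SourceRateLimiter:
    """Per-source limiter: reply normally, send one KoD, then go silent."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds allowed between requests
        self.last_seen = {}               # source address -> time of last request
        self.kod_sent = set()             # sources that already received a KoD

    def decide(self, src, now):
        prev = self.last_seen.get(src)
        self.last_seen[src] = now
        if prev is None or now - prev >= self.min_interval:
            self.kod_sent.discard(src)    # source is behaving: re-arm it
            return "reply"
        if src not in self.kod_sent:
            self.kod_sent.add(src)
            return "kod"                  # first violation: Kiss-o'-Death
        return "drop"                     # further violations: stop responding
```

The `last_seen` dictionary grows by one entry per distinct source address and is never pruned here, which is exactly the table-size concern raised in the message when source addresses can be faked.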
Re: [ntp:questions] Testing throughput in NTP servers
>> Part of the requirement, is that the dependencies on the underlying O/S should be minimized. If an FPGA can handle everything, then this is ideal. Currently the FPGA will split incoming packets into streams with one stream per thread.
> Why can't your FPGA do the entire NTP processing, for all regular request packets? Grab the current timestamp, fill it into the T1 T2 fields, along with leap indicator etc., then return the packet.

Because no one wrote the HDL for it (yet). This is the alternative solution, if the first solution does not work.

>> The first FPGA H/W has some limitations, in that the reading the timestamp counter from the CPU is really not recommended, since you kill the PCIe performance.
> If the FPGA can do everything, then it (obviously) require a board-local clock source as well...
>> Instead the ntp code adds a delay to the incoming packet timestamp, and the FPGA H/W sends out the packet at the correct time.
> OK, this still means that the host CPU must be involved in every packet.

Yes, in the current implementation.

...

>>> That's more than 100 M requests/second!
>> Line speed is 10M+ packets/second.
> That should be easy. (Famous last words!)

SMOP = Small Matter Of Programming.

> With a 1000 cycles/packet processing budget, 3 cores of a 3GHz quad core cpu would be more or less sufficient.

...

> Terje
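The core-count estimates traded in this thread follow from a single division. The sketch below reproduces them; the 3 GHz default clock is taken from the figures quoted above, and the function names are made up for illustration.

```python
import math

def core_load(packets_per_sec, cycles_per_packet, clock_hz=3e9):
    """Number of fully busy cores needed to sustain the given packet rate."""
    return packets_per_sec * cycles_per_packet / clock_hz

def cores_needed(packets_per_sec, cycles_per_packet, clock_hz=3e9):
    """Same figure rounded up to whole cores."""
    return math.ceil(core_load(packets_per_sec, cycles_per_packet, clock_hz))

print(core_load(10e6, 1000))      # about 3.3 cores at 10 M packets/s
print(cores_needed(100e6, 1000))  # 34 cores at the earlier 100 M figure
print(cores_needed(10e6, 100))    # 1 core at the 100-cycle target
```

At 1000 cycles per packet a 3 GHz core handles 3 M packets/second, which is where both the "3 cores" and the earlier "32-core" estimates come from.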
Re: [ntp:questions] Testing throughput in NTP servers
Ulf Samuelsson wrote:
>>> Instead the ntp code adds a delay to the incoming packet timestamp, and the FPGA H/W sends out the packet at the correct time.
>> OK, this still means that the host CPU must be involved in every packet.
> Yes, in the current implementation.
>>> Line speed is 10M+ packets/second.
>> That should be easy. (Famous last words!)
> SMOP = Small Matter Of Programming.

Yep, this is definitely a SMOP.

I don't know exactly how your intended 10GE NIC works, but I'll assume it has some form of bus master interface, since anything else will definitely NOT run at wire speed, right?

I'll also assume that the NIC driver sets up the required input/output buffers, so that user-level processes can access them. In that case my N-1 cores scenario maps nicely, even without kernel access:

In my cache-aligned round robin shm buffer test code the writer thread/core managed to generate 14 M packets/second, and each of the 3 reader threads/cores picks up pretty much all of them as they pass by.

Using fetchadd (LOCK XADD on x86) to atomically grab the next packet works nicely even when 3 cores are spinning in a tight loop doing nothing else. As soon as we add some processing time (to fetch/interpolate the current time stamp), lock contention will drop.

We'll use the same algorithm to drop outgoing packets into the output buffer.

(With proper packet scatter/gather bus master hw, it is probably possible to hand packets to the NIC in the form of a list of pointer/size pairs. If so, we can make the list entries fixed size (64 or 128 bits) and reuse the same buffer for the actual incoming and outgoing data, avoiding packet copying.)

Most packets will be processed within a fraction of a us of being received, and for NTP this is good enough, even without HW timestamping/1588 hw.

Terje
--
- Terje.Mathisen at tmsw.no
almost all programming can be viewed as an exercise in caching
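The fetch-and-add claim scheme can be illustrated in a few lines. This is not the test code mentioned in the message: Python has no LOCK XADD equivalent, so a lock-protected counter stands in for the atomic instruction, the ring is assumed to be already filled, and every name here is hypothetical.

```python
import threading

class FetchAddCounter:
    """Lock-based stand-in for an atomic fetch-and-add instruction."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, n=1):
        with self._lock:
            old = self._value
            self._value += n
            return old            # the caller now owns sequence number `old`

def worker(counter, total, ring_size, claimed):
    """Claim packets one at a time until the sequence numbers run out."""
    while True:
        seq = counter.fetch_add()
        if seq >= total:
            return
        slot = seq % ring_size    # ring slot this packet would occupy
        claimed.append((seq, slot))

def claim_all(total=10_000, workers=3, ring_size=1024):
    counter = FetchAddCounter()
    per_worker = [[] for _ in range(workers)]
    threads = [threading.Thread(target=worker,
                                args=(counter, total, ring_size, per_worker[i]))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return per_worker
```

Because each sequence number is handed out exactly once, no two workers ever process the same packet, and no further coordination between them is required.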
Re: [ntp:questions] Testing throughput in NTP servers
Rick Jones wrote:
> Terje Mathisen terje.mathisen at tmsw.no wrote:
>> You're talking 10 GigE wire speed, right? That's more than 100 M requests/second!
> If I recall correctly, the maximum number of minimum-sized frames per second for 10GbE is something like 14.7M or 14.8M each way.

Ooops, I managed to skip the bits-to-bytes step. :-(

Anyway, the CPU cycles needed for wire speed are obviously easy to get then; the bottlenecks should be in the NIC/bus/hw parts.

> I think Luigi Rizzo has gotten 10+ Mpps with an Intel NIC (just basic networking) using his netmap stuff. Using a two-socket server with E5-2680s, and a non-Intel 10GbE NIC I've seen around 2.89 M pps each way on a single port with aggregate, concurrent, burst-mode netperf TCP_RR tests. If I use non-burst mode and many more concurrent netperf's it is more like 2.5 (from memory). I suspect it becomes more of a context-switching benchmark at that point. Haven't quite gotten around to driving both ports at once.

Right.

Terje
--
- Terje.Mathisen at tmsw.no
almost all programming can be viewed as an exercise in caching
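The corrected figure is easy to re-derive. The sketch below assumes standard Ethernet framing (8 bytes of preamble/SFD and a 12-byte inter-frame gap per frame) and an NTP request over IPv4/UDP with no VLAN tag, extensions or MAC; those framing assumptions are mine, not stated in the thread.

```python
PREAMBLE_AND_GAP = 8 + 12            # bytes on the wire in addition to each frame

def max_frames_per_sec(frame_bytes, link_bps=10e9):
    """Upper bound on frames per second for a given Ethernet frame size."""
    return link_bps / ((frame_bytes + PREAMBLE_AND_GAP) * 8)

MIN_FRAME = 64                       # minimum Ethernet frame, including FCS
NTP_FRAME = 14 + 20 + 8 + 48 + 4     # Ethernet hdr + IPv4 + UDP + NTP + FCS = 94

print(max_frames_per_sec(MIN_FRAME) / 1e6)   # about 14.88 M frames/s
print(max_frames_per_sec(NTP_FRAME) / 1e6)   # about 10.96 M NTP packets/s
```

Both numbers line up with the thread: roughly 14.8 M minimum-sized frames per second, and "10M+" NTP packets per second at about 900 bits each on the wire.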
Re: [ntp:questions] Testing throughput in NTP servers
On 2012-09-12 20:34, unruh wrote:
> On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote:
>> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?
>> Best Regards Ulf Samuelsson
> I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this?

I am part of a small team that is evaluating how to replace some existing stratum 1 servers, so your guess is not far off.

The tests will be done in a lab environment, so you will still be able to record Dr Phil or whatever your NTP needs are :-)

BR Ulf Samuelsson
Re: [ntp:questions] Testing throughput in NTP servers
On 2012-09-12 21:24, Richard B. Gilbert wrote: On 9/12/2012 2:34 PM, unruh wrote: On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote: Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc? Best Regards Ulf Samuelsson I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this? Sigh! I'm sure it has happened and will happen again! I'm sure that there are people complaining to the National Bureau of Standards or the Naval Observatory that their time is incorrect! ;-) If you really want time with better than micro-second accuracy, consider get a GPS Timing receiver. The one I bought several years ago claimed 50 nanosecond plus or minus of the correct time. The NTP server we will be testing will be connected to a Cesium clock providing a 1pps pulse so that is really not my problem. I want to check if this system can handle DDoS attacks, and bad packets. This will be done in a lab environment, possibly point-to-point from the test machine, to the server, or maybe In order to test DDoS, probably some FPGA H/W is needed to generate good packets, and the S/W stuff is there to generate bad packets and check how the server reacts to those Recording the packets will be done with FPGA H/W as well. BR Ulf Samuelsson ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Testing throughput in NTP servers
On 9/14/2012 11:44 AM, Ulf Samuelsson wrote:
> I am part of a small team that is evaluating how to replace some existing stratum 1 servers...

On 9/14/2012 11:57 AM, Ulf Samuelsson wrote:
> The NTP server we will be testing will be connected to a Cesium clock providing a 1pps pulse...
> I want to check if this system can handle DDoS attacks, and bad packets. This will be done in a lab environment, possibly point-to-point from the test machine, to the server...

I think I've seen some data on the University of Wisconsin unintentional ntp DDOS, as well as some other occurrences; IIRC, the upstream pipe was as much or more of an issue than anything else.

--
E-Mail Sent to this address blackl...@anitech-systems.com will be added to the BlackLists.
Re: [ntp:questions] Testing throughput in NTP servers
BlackLists wrote:
> Ulf Samuelsson wrote:
>> I am part of a small team that is evaluating how to replace some existing stratum 1 servers...
>> The NTP server we will be testing will be connected to a Cesium clock providing a 1pps pulse...
>> I want to check if this system can handle DDoS attacks, and bad packets. This will be done in a lab environment, possibly point-to-point from the test machine, to the server...
> I think I've seen some data on the University of Wisconsin unintentional ntp DDOS, as well as some other occurrences; IIRC, the upstream pipe was as much or more of an issue than anything else.

This might be a source of relevant information: http://pages.cs.wisc.edu/~plonka/netgear-sntp/

--
E-Mail Sent to this address blackl...@anitech-systems.com will be added to the BlackLists.
Re: [ntp:questions] Testing throughput in NTP servers
Ulf Samuelsson wrote: On 2012-09-12 21:24, Richard B. Gilbert wrote: On 9/12/2012 2:34 PM, unruh wrote: On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote: Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc? Best Regards Ulf Samuelsson I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this? Sigh! I'm sure it has happened and will happen again! I'm sure that there are people complaining to the National Bureau of Standards or the Naval Observatory that their time is incorrect! ;-) If you really want time with better than micro-second accuracy, consider get a GPS Timing receiver. The one I bought several years ago claimed 50 nanosecond plus or minus of the correct time. The NTP server we will be testing will be connected to a Cesium clock providing a 1pps pulse so that is really not my problem. I want to check if this system can handle DDoS attacks, and bad packets. This will be done in a lab environment, possibly point-to-point from the test machine, to the server, or maybe In order to test DDoS, probably some FPGA H/W is needed to generate good packets, and the S/W stuff is there to generate bad packets and check how the server reacts to those Do you really need that? It seems to me that by modifying an ethernet card driver to do ntp processing in kernel mode, you should be able to handle at least the same number of ntp requests as you can do ping replies. Way back when, around 1992, Drew Major managed to get a NetWare 386 server to handle a read request in 300 clock cycles. This was from receipt of the packet and included parsing, access control checks, locating the requested data somewhere in the memory cache, constructing the response packet and handing it back to the NIC. 
Assuming we can get the actual ntp standard request code processing down to the absolute minimum (read the RDTSC counter (or a similar low-latency clock source) and the latest OS tick value/RDTSC count, scale the offset count by a fixed factor, then add to the OS clock value) we should be able to get the entire processing down to ~100 clock cycles or so. I.e. moving packet data in/out of the NIC buffers is going to take comparable time. (Any other kind of request is handled as today, i.e. queued for ntpd processing, unless DDOS level packet rates cause the queue to pass some very low limit in size, at which point we discard the requests.) Any packet which fails some minimum sanity checks can be discarded quickly, this is less overhead than handling it over to the regular user-level ntpd process. Recording the packets will be done with FPGA H/W as well. So a network sniffer won't be fast enough? You're talking 10 GiGE wire speed, right? That's more than 100 M requests/second! Taking a pessimistic view (1K clock cycles/request) would give just 3M packets/core/second, so a 32-core (4x8) machine would suffice. Getting closer to my 100-cycle target (for chained processing of a bunch of consecutive request packets) drops the cpu requirements down to a regular quad core single cpu machine, but at this point the bus probably won't be able to keep up with the NIC. Terje -- - Terje.Mathisen at tmsw.no almost all programming can be viewed as an exercise in caching ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
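The interpolation step described in this message (latest OS tick value plus a scaled counter delta) can be sketched as below. Python cannot read RDTSC directly, so the counter values are plain integers supplied by the caller; the Q32 fixed-point scale factor and both function names are assumptions chosen for illustration.

```python
NTP_UNIX_OFFSET = 2208988800   # seconds from 1900-01-01 to 1970-01-01

def interpolate_ns(tick_time_ns, tick_counter, counter_now, ns_per_count_q32):
    """OS clock at the last tick plus the counter delta scaled to nanoseconds.

    ns_per_count_q32 is nanoseconds per counter tick as a Q32 fixed-point
    number, so the scaling costs one multiply and one shift.
    """
    delta = counter_now - tick_counter
    return tick_time_ns + ((delta * ns_per_count_q32) >> 32)

def to_ntp_timestamp(unix_ns):
    """Convert Unix-epoch nanoseconds to a 64-bit NTP timestamp (32.32)."""
    secs, frac_ns = divmod(unix_ns, 1_000_000_000)
    ntp_secs = (secs + NTP_UNIX_OFFSET) & 0xFFFFFFFF   # wraps at the NTP era boundary
    ntp_frac = (frac_ns << 32) // 1_000_000_000
    return (ntp_secs << 32) | ntp_frac
```

For a hypothetical 2 GHz counter the scale factor is 0.5 ns per count, i.e. `1 << 31` in Q32, so a delta of 3000 counts adds 1500 ns to the tick time. The per-packet cost is one subtraction, one multiply, one shift and one add, in line with the small cycle budget the message is aiming for.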
Re: [ntp:questions] Testing throughput in NTP servers
Ulf,

Please contact me from a working email address.

Sorry to bother everybody else.
--
Harlan Stenn st...@ntp.org
http://networktimefoundation.org - be a member!
Re: [ntp:questions] Testing throughput in NTP servers
Terje Mathisen terje.mathisen at tmsw.no wrote:
> You're talking 10 GigE wire speed, right? That's more than 100 M requests/second!

If I recall correctly, the maximum number of minimum-sized frames per second for 10GbE is something like 14.7M or 14.8M each way.

I think Luigi Rizzo has gotten 10+ Mpps with an Intel NIC (just basic networking) using his netmap stuff. Using a two-socket server with E5-2680s and a non-Intel 10GbE NIC, I've seen around 2.89 M pps each way on a single port with aggregate, concurrent, burst-mode netperf TCP_RR tests. If I use non-burst mode and many more concurrent netperfs, it is more like 2.5 (from memory). I suspect it becomes more of a context-switching benchmark at that point. Haven't quite gotten around to driving both ports at once.

rick jones
--
It is not a question of half full or empty - the glass has a leak. The real question is Can it be patched?
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Re: [ntp:questions] Testing throughput in NTP servers
On Wed, Sep 12, 2012 at 02:28:24PM +0200, Ulf Samuelsson wrote:
> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?

I've used tcpdump and tcpreplay to measure the maximum packet rate ntpd can handle. IIRC, the ntpd process itself needed only a couple of percent of the CPU; I think the bottleneck is always in the kernel or the NIC.

--
Miroslav Lichvar
[ntp:questions] Testing throughput in NTP servers
Does anyone know whether any Linux-based S/W is available to test the throughput of NTP servers, i.e. packets per second, % of lost packets, etc.?

Best Regards
Ulf Samuelsson
Re: [ntp:questions] Testing throughput in NTP servers
On Sep 12, 2012, at 5:28 AM, Ulf Samuelsson wrote:
> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?

To _measure_ those things, I'd probably use tcpdump, possibly plus some of the NTP traffic graphing scripts that various folks have contributed.

For actually generating a load, one can either script calls via ntpdate or similar, or simply add the server to the NTP pool, especially if your IP gets vended to the Turk Telekom netblocks. :-)

Regards,
--
-Chuck
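As one possible reading of "script calls via ntpdate or similar", a scripted load source for a lab server might look like the sketch below. It is a toy under several assumptions: a single socket, a burst of plain NTPv4 client requests, replies counted only by length, and any loss in the client's own receive buffer indistinguishable from loss at the server. The function names are hypothetical, and it should only ever be pointed at a server you own.

```python
import socket
import time

def build_client_request():
    """Minimal 48-byte NTP request: LI=0, VN=4, Mode=3, everything else zero."""
    return bytes([0x23]) + bytes(47)

def summarize(sent, received, elapsed):
    """Turn raw counters into loss percentage and reply rate."""
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    rate = received / elapsed if elapsed > 0 else 0.0
    return {"sent": sent, "received": received,
            "loss_pct": loss_pct, "replies_per_sec": rate}

def run_load(host, port=123, count=1000, timeout=0.5):
    """Send `count` requests in a burst, then count replies until a timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    request = build_client_request()
    start = last_reply = time.monotonic()
    received = 0
    try:
        for _ in range(count):
            sock.sendto(request, (host, port))
        while received < count:
            data, _addr = sock.recvfrom(512)
            if len(data) >= 48:              # count only NTP-sized replies
                received += 1
                last_reply = time.monotonic()
    except socket.timeout:
        pass                                 # no more replies: treat the rest as lost
    finally:
        sock.close()
    return summarize(count, received, last_reply - start)
```

A call such as `run_load("192.0.2.10", count=5000)` (a documentation address standing in for the lab server) returns the sent and received counts, the loss percentage and the observed reply rate. A single Python process will not get anywhere near the packet rates discussed elsewhere in this thread.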
Re: [ntp:questions] Testing throughput in NTP servers
On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote:
> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?
> Best Regards Ulf Samuelsson

I hope not. I can just see someone deciding to test one of the stratum 1 main servers (e.g. at the USNO). Why in the world would you want this?
Re: [ntp:questions] Testing throughput in NTP servers
unruh wrote:
> Ulf Samuelsson wrote:
>> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?
> I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this?

Corporate BS paperwork, or Education Assignment would be my guess.

--
E-Mail Sent to this address blackl...@anitech-systems.com will be added to the BlackLists.
Re: [ntp:questions] Testing throughput in NTP servers
On 9/12/2012 2:34 PM, unruh wrote:
> On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote:
>> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?
>> Best Regards Ulf Samuelsson
> I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this?

Sigh! I'm sure it has happened and will happen again! I'm sure that there are people complaining to the National Bureau of Standards or the Naval Observatory that their time is incorrect! ;-)

If you really want time with better than microsecond accuracy, consider getting a GPS timing receiver. The one I bought several years ago claimed to be within 50 nanoseconds, plus or minus, of the correct time.
Re: [ntp:questions] Testing throughput in NTP servers
On 2012-09-12, Richard B. Gilbert rgilber...@comcast.net wrote:
> On 9/12/2012 2:34 PM, unruh wrote:
>> On 2012-09-12, Ulf Samuelsson u...@invalid.com wrote:
>>> Anyone knows if there are any available Linux based S/W to test the throughput of NTP servers? I.E: packets per second? % of lost packets etc?
>>> Best Regards Ulf Samuelsson
>> I hope not. I can just see someone deciding to test one of the stratum 1 main servers (eg at the usno) Why in the world would you want this?
> Sigh! I'm sure it has happened and will happen again! I'm sure that there are people complaining to the National Bureau of Standards or the Naval Observatory that their time is incorrect! ;-)
> If you really want time with better than micro-second accuracy, consider getting a GPS Timing receiver. The one I bought several years ago claimed 50 nanosecond plus or minus of the correct time.

Unfortunately that is useless if you are trying to use it to discipline a computer clock. The interrupt service routines on a computer are slow enough and have enough variability (e.g. other interrupts occurring which switch off interrupts briefly, disk reads, etc.) that getting about 2-3 usec standard deviation in the timing is about the best you can do. Also, you have to be very careful about termination of the line from the GPS receiver to the clock, etc.