On 2012-09-14 21:44, Terje Mathisen wrote:
Ulf Samuelsson wrote:
On 2012-09-12 21:24, Richard B. Gilbert wrote:
On 9/12/2012 2:34 PM, unruh wrote:
On 2012-09-12, Ulf Samuelsson <u...@invalid.com> wrote:
Does anyone know of any available Linux-based S/W to test the
throughput of NTP servers?
E.g.:

    packets per second?
    % of lost packets
    etc?
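
As a rough illustration of the kind of measurement asked about here, the following is a minimal sketch in Python, not an existing tool. The function names build_client_request and measure are made up for this example, and the only protocol facts relied on are the 48-byte NTP header and the LI/VN/Mode layout of its first byte.

```python
import socket
import struct
import time

NTP_PACKET_LEN = 48  # size of an NTP header without extension fields


def build_client_request() -> bytes:
    """Build a minimal NTPv4 client request (LI=0, VN=4, Mode=3)."""
    first_byte = (0 << 6) | (4 << 3) | 3
    return struct.pack("!B", first_byte) + bytes(NTP_PACKET_LEN - 1)


def measure(host: str, port: int = 123, count: int = 100, timeout: float = 0.5) -> dict:
    """Send `count` requests one at a time and report reply rate and loss."""
    request = build_client_request()
    received = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        for _ in range(count):
            sock.sendto(request, (host, port))
            try:
                reply, _addr = sock.recvfrom(512)
            except socket.timeout:
                continue  # no reply within the timeout: count it as lost
            if len(reply) >= NTP_PACKET_LEN:
                received += 1
        elapsed = time.perf_counter() - start
    return {
        "sent": count,
        "received": received,
        "loss_percent": 100.0 * (count - received) / count,
        "replies_per_second": received / elapsed if elapsed > 0 else 0.0,
    }
```

A call such as measure("192.0.2.10") would exercise it; that address is a documentation placeholder and should be replaced by a server in your own lab. Because the loop waits for each reply before sending the next request, the rate it reports is bounded by round-trip time and says little about what the server could sustain under load.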

Best Regards
Ulf Samuelsson

I hope not. I can just see someone deciding to test one of the
stratum 1 main servers (e.g. at the USNO). Why in the world would
you want this?


Sigh!  I'm sure it has happened and will happen again!  I'm sure that
there are people complaining to the National Bureau of Standards or
the Naval Observatory that their time is incorrect! ;-)

If you really want time with better than microsecond accuracy, consider
getting a GPS timing receiver. The one I bought several years
ago claimed to be within plus or minus 50 nanoseconds of the correct time.

The NTP server we will be testing will be connected to a Cesium clock
providing a 1pps pulse so that is really not my problem.
I want to check if this system can handle DDoS attacks, and bad packets.
This will be done in a lab environment, possibly point-to-point from
the test machine, to the server, or maybe

In order to test DDoS, some FPGA H/W is probably needed to generate good
packets, and the S/W stuff is there to generate bad packets and
check how the server reacts to those.

Do you really need that?

It seems to me that by modifying an ethernet card driver to do ntp
processing in kernel mode, you should be able to handle at least the
same number of ntp requests as you can do ping replies.


One of the key requirements of the Ethernet card is that it timestamps
the incoming packets. There are 10 GbE FPGA solutions capable
of this today.

Part of the requirement is that the dependencies on the underlying O/S
should be minimized.  If an FPGA can handle everything, then this is ideal.
Currently the FPGA will split incoming packets into streams with one
stream per thread.

Way back when, around 1992, Drew Major managed to get a NetWare 386
server to handle a read request in 300 clock cycles. This was from
receipt of the packet and included parsing, access control checks,
locating the requested data somewhere in the memory cache, constructing
the response packet and handing it back to the NIC.

Assuming we can get the actual ntp standard request code processing down
to the absolute minimum (read the RDTSC counter (or a similar
low-latency clock source) and the latest OS tick value/RDTSC count,
scale the offset count by a fixed factor, then add to the OS clock
value) we should be able to get the entire processing down to ~100 clock
cycles or so. I.e. moving packet data in/out of the NIC buffers is going
to take comparable time.
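
The interpolation step described above (last OS tick value plus a scaled cycle-count offset) can be sketched as follows. This is only an illustration under assumptions: Python cannot read RDTSC, so the cycle counts are passed in as plain integers, and the function names and the 32.32 fixed-point scale factor are choices made for this example.

```python
NTP_UNIX_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01
SCALE_SHIFT = 32  # fixed-point shift for the cycles-to-nanoseconds factor


def make_scale(cpu_hz: int) -> int:
    """Precompute nanoseconds-per-cycle as a 32.32 fixed-point factor."""
    return (1_000_000_000 << SCALE_SHIFT) // cpu_hz


def interpolate_ns(tick_time_ns: int, tick_cycles: int, now_cycles: int, scale: int) -> int:
    """OS clock value at the last tick plus the scaled cycle offset since it."""
    offset_cycles = now_cycles - tick_cycles
    return tick_time_ns + ((offset_cycles * scale) >> SCALE_SHIFT)


def to_ntp_timestamp(unix_ns: int) -> int:
    """Convert Unix nanoseconds to a 64-bit NTP timestamp (32.32 seconds)."""
    seconds, nanos = divmod(unix_ns, 1_000_000_000)
    fraction = (nanos << 32) // 1_000_000_000
    return ((seconds + NTP_UNIX_OFFSET) & 0xFFFFFFFF) << 32 | fraction
```

The per-request work is one subtraction, one multiply, one shift and one add, which is why a budget on the order of 100 cycles looks plausible for this part; the scale factor is computed once, outside the fast path.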


The first FPGA H/W has some limitations, in that reading the timestamp
counter from the CPU is really not recommended, since it kills
the PCIe performance.
Instead the ntp code adds a delay to the incoming packet timestamp,
and the FPGA H/W sends out the packet at the correct time.
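
A small sketch of that scheme, under assumptions: the hardware receive timestamp is taken to be already in 64-bit NTP format, the turnaround delay is a fixed number of nanoseconds, and the function names are invented for this example. It only builds the three timestamp fields of the reply; actually releasing the packet at the transmit time is left to the hardware.

```python
import struct

NTP_MASK = 0xFFFFFFFFFFFFFFFF  # NTP timestamps are 64 bits: 32.32 seconds


def ns_to_ntp_interval(nanoseconds: int) -> int:
    """Convert a duration in nanoseconds to 32.32 fixed-point seconds."""
    return (nanoseconds << 32) // 1_000_000_000


def reply_timestamp_fields(request: bytes, rx_ntp: int, turnaround_ns: int) -> bytes:
    """Build the origin, receive and transmit fields (reply bytes 24..47)."""
    origin = request[40:48]  # the client's transmit timestamp is echoed back
    tx_ntp = (rx_ntp + ns_to_ntp_interval(turnaround_ns)) & NTP_MASK
    return origin + struct.pack("!QQ", rx_ntp, tx_ntp)
```

The point of the design is that the CPU never reads a clock: the transmit timestamp is a promise, and the reply is only correct if the hardware sends the packet at exactly that time.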

(Any other kind of request is handled as today, i.e. queued for ntpd
processing, unless DDOS level packet rates cause the queue to pass some
very low limit in size, at which point we discard the requests.)

Any packet which fails some minimum sanity checks can be discarded
quickly; this is less overhead than handing it over to the regular
user-level ntpd process.
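
The thread does not say which sanity checks are meant, so the following is only a guess at a plausible minimum set for the fast path: packet length, protocol version and client mode. The function name and the accepted version range are assumptions of this sketch.

```python
NTP_MIN_LEN = 48  # NTP header without extension fields
MODE_CLIENT = 3   # only plain client requests take the fast path


def passes_sanity_checks(packet: bytes) -> bool:
    """Return True if the packet looks like a plain NTP client request."""
    if len(packet) < NTP_MIN_LEN:
        return False
    first = packet[0]
    version = (first >> 3) & 0x7  # bits 3..5 of the first byte
    mode = first & 0x7            # low three bits of the first byte
    if mode != MODE_CLIENT:
        return False
    if not 1 <= version <= 4:
        return False
    return True
```

All three tests look only at the length and the first byte, so a bad packet can be rejected before any further parsing is done.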


How do you test that this works?
Any specific S/W package that you developed?

Recording the packets will be done with FPGA H/W as well.

So a network sniffer won't be fast enough?

The FPGA card is a network sniffer as well, so there is ready-made S/W for this.


You're talking 10 GigE wire speed, right?

Yes.


That's more than 100 M requests/second!

Line speed is 10M+ packets/second.

I have been told that a single compromised home router can generate about
3000 packets per second on a 100 Mbps network.
With 3000-4000 such routers you reach 10 GbE line speed.
My local service provider is now offering 1 Gb Internet access at home
(if I care for some throughput),
so with some decent H/W, there could be more.
This solution is supposed to have some lifetime.
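
The line-speed and router figures can be checked with a little arithmetic. The sketch below assumes a 48-byte NTP payload over UDP and IPv4 without options or VLAN tags, and counts the Ethernet preamble and inter-frame gap as wire time; the function names are made up for this example.

```python
import math

LINK_BPS = 10_000_000_000  # 10 GbE
NTP_PAYLOAD = 48           # NTP header without extension fields
UDP_HEADER = 8
IPV4_HEADER = 20           # no IP options
ETH_HEADER_FCS = 18        # 14-byte header plus 4-byte FCS, no VLAN tag
PREAMBLE_IFG = 20          # 8-byte preamble/SFD plus 12-byte inter-frame gap


def ntp_packets_per_second(link_bps: int = LINK_BPS) -> float:
    """Maximum NTP packets per second that fit on the link."""
    wire_bytes = NTP_PAYLOAD + UDP_HEADER + IPV4_HEADER + ETH_HEADER_FCS + PREAMBLE_IFG
    return link_bps / (wire_bytes * 8)


def routers_needed(target_pps: float, pps_per_router: int = 3000) -> int:
    """Number of sources at `pps_per_router` needed to reach `target_pps`."""
    return math.ceil(target_pps / pps_per_router)
```

Under these assumptions each request occupies 114 bytes on the wire, which gives about 10.96 million packets per second, and routers_needed(ntp_packets_per_second()) comes out at 3655, in line with the figures quoted above.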

Probably some intelligence in front of the NTP server
which removes nasty packets would be useful as well.


Taking a pessimistic view (1K clock cycles/request) would give just 3M
packets/core/second, so a 32-core (4x8) machine would suffice.


Getting closer to my 100-cycle target (for chained processing of a bunch
of consecutive request packets) drops the cpu requirements down to a
regular quad core single cpu machine, but at this point the bus probably
won't be able to keep up with the NIC.
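
The core counts above follow from simple division. The sketch below assumes a 3 GHz clock, which is not stated in the post but is implied by 1K cycles per request giving 3M packets per core per second; the function name is made up for this example.

```python
import math


def cores_needed(requests_per_second: int, cycles_per_request: int,
                 cpu_hz: int = 3_000_000_000) -> int:
    """Cores required if every request costs a fixed number of cycles."""
    per_core = cpu_hz / cycles_per_request  # requests one core can serve per second
    return math.ceil(requests_per_second / per_core)
```

With the 100 M requests/second figure, cores_needed(100_000_000, 1000) gives 34, close to the 32-core machine mentioned, and cores_needed(100_000_000, 100) gives 4, the quad core case. At the roughly 11 M packets/second line rate, 1K cycles per request would need 4 cores.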

Terje

BR
Ulf Samuelsson

_______________________________________________
questions mailing list
questions@lists.ntp.org
http://lists.ntp.org/listinfo/questions
