I added a few lines of code to the main receive routine. They were copied from old-old code and restore the bumping of sys_newversion and sys_oldversion. There is also a bail case that I don't fully understand. Would you please check it? It seems to be working, but there might be something interesting in the bail path.
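For reference, here is the general shape of a version check that bumps those counters. It is a minimal sketch under stated assumptions: NTP_VERSION is 4, versions 1 to 3 count as old, and anything else bails. count_version() and sys_badversion are hypothetical names. This is not the code copied from the old sources, so it says nothing about what the real bail case does.

```c
#include <stdint.h>

/* Assumed values: v4 is current; v1..v3 are counted as "old". */
#define NTP_VERSION     4
#define NTP_OLDVERSION  1

/* sys_newversion and sys_oldversion are the counters named in the post;
 * sys_badversion is a hypothetical counter for the bail case. */
static uint64_t sys_newversion;
static uint64_t sys_oldversion;
static uint64_t sys_badversion;

/* Hypothetical helper: count the packet's version and return 1 to keep
 * processing it, or 0 to bail out of the receive path. */
static int count_version(int version)
{
    if (version == NTP_VERSION) {
        sys_newversion++;
        return 1;
    }
    if (version >= NTP_OLDVERSION && version < NTP_VERSION) {
        sys_oldversion++;
        return 1;
    }
    sys_badversion++;   /* version 0 or newer than we know: drop it */
    return 0;
}
```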
I'm working on adding counters to the mrulist code, so I'm thinking about counters. We should review the existing counters for the main packet flow. They get logged hourly via sysstats, and ntpq can also show them via its sysstats command. The sysstats counters get reset every hour, so there is no way to see the totals directly; we can only reconstruct them from the log files.

When things get complicated, I think the key step for understanding what the counters mean is an overview of how the code works: a high-level flow chart where we can associate each counter with a box. Do we have technology for making flow charts? Are they better than text? For the MRU stuff, I think I can do a decent job with text. (It's not that complicated.) Should the text go in the code or in a separate doc that the ntpq documentation can refer to?

...

--------

There is some "interesting" code at the bottom of ntpd/ntp_monitor.c. Go to the bottom of the file, then back up a screen or so to the "Got one, initialize it" comment. 10 lines above that is a call to ntp_random. That code either gives up and doesn't return a slot, or it preempts the oldest slot when that slot isn't old enough for the normal recycle-oldest-slot path to work.

The only description of that idea I can find is in docs/includes/access-commands.txt under discard monitor. The description says "probability", so I would expect the argument to be in the range 0 to 1. But the code seems to be working with the slot age in seconds, so either the code or the documentation needs fixing. It's probably minor in either case, but I haven't figured out a simple description of what the code is actually doing. It may be just a scale factor.

It seems unlikely that this code path is ever used, at least on a well-tuned system. It may have been useful back before the MRU list had a hash table, when large tables weren't possible. It might be interesting today for a memory-constrained system supporting a lot of traffic.
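The slot decision described above (take a free slot, recycle the oldest slot, or give up and return nothing) is easier to reason about written out. Below is a minimal sketch, not the code in ntpd/ntp_monitor.c: it deliberately swaps the ntp_random() test for a deterministic minimum-age threshold, and struct slot, struct mru_params, minage, and choose_slot() are hypothetical names. Only maxage corresponds to an existing parameter (mru_maxage).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified MRU entry; the real one holds much more. */
struct slot {
    uint32_t last_seen;   /* time of most recent packet, in seconds */
};

/* Hypothetical knobs.  maxage mirrors mru_maxage; minage is an assumed
 * new parameter: "old enough to recycle when the table is full". */
struct mru_params {
    size_t   maxdepth;    /* table size limit, in slots */
    uint32_t maxage;      /* not full: recycle oldest if older than this */
    uint32_t minage;      /* full: recycle oldest only if at least this old */
};

enum slot_choice { USE_FREE, RECYCLE_OLDEST, GIVE_UP };

/* 'oldest' is the tail of the MRU list; 'entries' is the list length. */
static enum slot_choice choose_slot(const struct mru_params *p,
                                    size_t entries,
                                    const struct slot *oldest,
                                    uint32_t now)
{
    /* Empty list: the new entry comes from the free list.  Past this
     * point the list has a tail, so no oldest != NULL tests are needed. */
    if (entries == 0)
        return USE_FREE;

    uint32_t age = now - oldest->last_seen;   /* unsigned math wraps safely */

    if (entries < p->maxdepth) {
        /* Table not full, but the oldest entry is stale: reuse it
         * instead of growing the list with uninteresting entries. */
        return age > p->maxage ? RECYCLE_OLDEST : USE_FREE;
    }

    /* Table full: reuse the oldest entry only if it is old enough;
     * otherwise give up rather than evict a recently active client. */
    return age >= p->minage ? RECYCLE_OLDEST : GIVE_UP;
}
```

Each return value would be a natural place to hang one of the new MRU counters.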
I think the idea behind that code is to make sure that abusive clients don't get lost in the noise on a busy server. Each slot is roughly 100 bytes, so 100 megabytes is a million slots. (I think that fits even on a Raspberry Pi.) At 1000 packets per second, the oldest slot would be 1000 seconds old even if the noise was only 1 hit per slot. An abusive client has to be sending faster than that in order to be abusive.

Ahh. Maybe I just figured it out. There is a design oversight in the MRU recycle/preempt logic. There are 2 simple cases where you want to reuse the oldest slot (the last one in the list). The first is when the table isn't full, but the oldest slot is older than mru_maxage. This code exists. It lets you avoid cluttering up the list with entries that are too old to be interesting. (We could add a background call to move old slots to the free list, but currently they just sit there, so you can get really old slots.)

The other case doesn't exist yet and/or is tangled up with that "random". If the table is full, you want to use the oldest slot, but maybe not if it isn't old enough. I think we need another parameter and a few more lines of code to implement this. My "seems unlikely" comment above was assuming the second case was handled sanely.

More cruft: all the tests for oldest != NULL can be removed. If the list is totally empty, the slot would have come from the free list.

-- 
These are my opinions. I hate spam.

_______________________________________________
devel mailing list
http://lists.ntpsec.org/mailman/listinfo/devel
