Re: Is python2 dead?
Hal Murray via devel:
> Do you have any data on Go GC times?

Yes. They're pretty minuscule. Most Go GC is performed concurrently with normal program execution, except for one stop-the-world phase that typically runs on the order of 1ms for real production programs.

https://medium.com/servicetitan-engineering/go-vs-c-part-2-garbage-collection-9384677f86f1

"Nearly all STW pauses in Go are really sub-millisecond ones. If you look more real-life test case (see e.g. this file), you’ll notice that 16GB static set on a ~ regular 16-core server implies your longest pause = 50ms (vs 5s for .NET), and 99.99% of pauses are shorter than 7ms (92ms for .NET)!"

--
Eric S. Raymond, http://www.catb.org/~esr/
___
devel mailing list
devel@ntpsec.org
https://lists.ntpsec.org/mailman/listinfo/devel
Re: Is python2 dead?
Hal Murray:
> Maybe it's time to switch to Go?

I've thought about it. It shouldn't be all that difficult. I wrote a tool that would help: https://gitlab.com/esr/pytogo

> How long would it take us to rewrite, from scratch, everything in ntpclients?

Around three man-months is my estimate.

> I occasionally poke around in ntpq. I find it very hard to work with. I think the others are much simpler.

Yes, that's so. Most of the complexity is in ntpq.

> Is the basic structure right? If we were starting from scratch, what would pylib look like?

I've learned by hard experience not to try to do a language translation and a rewrite at the same time, so this is a question I wouldn't want to broach while doing a lift. That said, I think the structure of pylib is basically sound.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Is python2 dead?
Gary E. Miller via devel:
> > Maybe it's time to switch to Go?
>
> Please, no. Go is a garbage collected language. Just what NTPsec does not need, random, unpredictable delays.

We're only talking client-side as yet.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Is python2 dead?
Hal Murray via devel:
> Really really dead? Or maybe just hiding in some dark corner?

Python 2 was end-of-lifed on 1 Jan 2020. It looks pretty dead from where I'm sitting, but I'm aware that people who run RHEL have a different opinion.

> Should we drop support for python2 as part of the next release? Or announce in the next release that we will drop it as part of the following release?

The policy I have for my projects these days is that I leave the poly machinery in place until I want to do something Python 3 specific, then I drop it.

I would be OK with your second alternative.

--
Eric S. Raymond, http://www.catb.org/~esr/
Test #2
Checking to make sure I've resubscribed.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Python - hard to re-aquire
Hal Murray:
> > I'm pretty sure this is a problem with the ntpq code, not with Python - Python in general has a reputation for being *easy* to read six months later, which I think is deserved. It's one of the first things I noticed when I started coding in Python back in 1998 or so.
>
> The hard to-go-back-to comment came from friends that I trust as much as I trust you. (I worked closely with them for ~10 years.)

The story of the blind men and the elephant comes to mind.

> I don't maintain a lot of python code. I have a handful of python hacks. The small ones are easy to maintain. I find ntpq hard to work with. As you suggest, that could be because it is crappy code.

I'm not sure that it's crappy code; some things have to be as complicated as they are. What it certainly is is *difficult* code.

> It could also be because my head depends on type checking and python is full of automatic conversions that conflict with strong type checking.

I think you will be happier working in Go than in Python. It's strongly typed, and the compiler's error messages are so helpful that they come as rather a shock after years of GCC.

> >> There is a bug in our ntpq, or was not-that-long-ago and I'm pretty sure it
>
> > I'm aware. Some time back I spent a day hunting for that bug. I couldn't find it. That's a nasty thicket of code in there.
>
> The bug is in the interface between the code that collects packets and checks the sequence numbers to make sure it got all the packets in the clump and the code up a few layers that decides how big the clump should be. Tangled in there is that the collect-it code should return a partial clump but doesn't.
>
> I'll track it down if you want to look again.

It may be in the issue already. If you find the issue could use more detail, please add it. Then I'm willing to take another swing at fixing the problem. It has been niggling at me ever since.

> Another thing to consider...
>
> You are planning to convert everything to Go without changing the structure, then go back and clean things up.
>
> Why didn't ntpq get cleaned up after it was moved to Python?

It did. The code ended up as pretty good Python, not just C awkwardly recoded as Python. A way to tell that is, for example, that it makes proper use of first-class maps rather than replicating the many ugly and unfortunate ways that C programmers try to approximate them.

Often that kind of cleanup exposes and destroys bugs, but it's certainly not guaranteed to fix *every* pre-existing bug. I agree with your analysis - I think there is some dataflow error, that it was in the C code as well, and that it got faithfully replicated by the translation.

Of course, the same risk of bug-for-bug compatibility exists in doing a stupid literal translation of ntpd. But in both cases that risk has to be compared to a different one, that trying to translate and refactor or rewrite at the same time will lead to a complexity explosion.

I have seen such explosions before. They tend to ruin porting efforts and leave a large blast crater. I have learned a healthy fear of trying to do too much during a port.

> [Go being easy to read.]
> > BTW, I will not strongly assert that this is an advantage over Rust, because I never grokked Rust well enough to make a really fair comparison. But Rust requires a lot of ceremony in the form of lifetime declarations that neither Python nor Go does; a priori this probably does put it at a readability disadvantage.
>
> This should be an interesting experiment. The same info that is a complication also tells you what is going on.

Maybe. Or it could *obscure* what's going on in a surfeit of syntax. I've used and read languages that have that tendency.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Using Go for NTPsec
e to work harder to find a code defect that will pass a zero pointer than you would in C. I've only ever gotten there by accidentally dereferencing a pointer-valued variable that had been declared but not yet initialized.

There is a stricter sense of "memory-safe" that means you can never generate an invalid pointer at all. Rust is almost there, but not quite. It has static null-pointer safety (you normally have to initialize a pointer to a valid target location at declaration time) but not dynamic safety. That would require a runtime check, and Rust makes a point of not having a runtime.

> I put near-optimal performance high on the list. My attempt at some simple Rust timing code got surprisingly high numbers for reading the clock. It may have been loop overhead. Manually replacing an iterator style loop (the clean way) with an ugly by-hand counter got something reasonable. I haven't tried to figure out what is/was going on. (yet)

That's a surprising result. One of Rust's most touted advantages is low overhead for close-to-the-metal code. I wouldn't have expected it to compile fat code from an iterator.

> > I gave up the experiment when I discovered that Rust had no equivalent of poll(2)/select(2).
>
> I poked around a bit and didn't find anything that looked good. I assume you wanted to wait for network data. Did you consider a thread per socket?

I did, but I was trying to write a simple proof of concept. Not a good time to introduce threading.

> My experience is that my experiment of similar trivial size was a success. Was that because I picked a problem area that was easier in some measure, or maybe I just got lucky and all the library support I needed was already there.

My money would be on the latter.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Using Go for NTPsec
n Go's in the sense of foreclosing more errors. I'd say the other is what they call "zero-overhead abstractions". Rust may have a more expressive type system than Go, too - there are some tricky issues around evaluating that.

Unfortunately, I found the price for these virtues too high. I wish it had been otherwise.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: GC timing
Hal Murray:
> If you pass in a buffer, there is no reason to allocate anything in the case of a server processing a request so this whole discussion is a wild goose chase.

It's a little more complicated than that, because I was describing the lowest-level recvfrom() in the socket library. Even if we use that for the dumb phase 1 translation, it might be right in phase 2 to move to the higher-level interfaces in the ipv4 and ipv6 libraries. Some of those dynamically allocate buffers.

In thinking this through, I want to be reasonably sure that not just the phase 1 but also the phase 2 implementation isn't going to take a sync accuracy hit. Thus, I'm trying to make worst-plausible-case assumptions.

> Two orders of magnitude is 10K packets/second. That's in the interesting range. Our current code can do 400K non-auth packets/second. That's only 40% of wire speed on a gigabit link. NTS is 90K packets/sec, 25% of wire speed.
>
> Memory speeds are ballpark of 20 GB/sec. So if you have a GB of headroom, it will take at least 50 ms just to scan that.
>
> NIST servers are running 10E10 packets/day. That's averaging 115K packets/second.
>
> https://tf.nist.gov/general/pdf/2818.pdf

A NIST-grade server can afford to dedicate a lot of RAM and then set the GOGC knob to a high value that trades having a bigger working set for a longer expected interval between GCs.

You can, actually, pick the minimum GC interval you think is tolerable and get it by pushing GOGC high enough. Of course if you don't have enough physical RAM, memory-retrieval times will blow up due to swapping, but these days putting a bunch of terabytes of RAM in a datacenter-grade server isn't even unusual any more.

115K 90-byte packets/second is about 10 MB/second, which will fill a GB in about 100 seconds. Drop N terabytes of RAM on the problem and now you're looking at memory-full intervals of about N * 100,000 seconds. It's not going to take a large N to make latency spikes a rare and minor blip in the traffic.

--
Eric S. Raymond, http://www.catb.org/~esr/
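The fill-time estimate and the GOGC knob discussed in this message can be sketched in a few lines of Go. This is a rough illustration only: `fillSeconds` and `setGCTarget` are hypothetical helper names, 800 is an arbitrary value, and the calculation assumes nothing is freed while the headroom fills. For the figures quoted (115K packets/second at 90 bytes each) it comes out to a little under 100 seconds per GB. `debug.SetGCPercent` is the in-program counterpart of the GOGC environment variable.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// fillSeconds estimates how long incoming packets take to consume the
// given headroom, assuming nothing is freed in the meantime.
func fillSeconds(headroomBytes, packetsPerSec, bytesPerPacket float64) float64 {
	return headroomBytes / (packetsPerSec * bytesPerPacket)
}

// setGCTarget sets the GC target percentage (what GOGC controls) and
// returns the previous setting.
func setGCTarget(percent int) int {
	return debug.SetGCPercent(percent)
}

func main() {
	fmt.Printf("1 GB fills in about %.0f s\n", fillSeconds(1e9, 115e3, 90))

	old := setGCTarget(800) // 800 is an arbitrary illustrative value
	fmt.Printf("GC target raised from %d to 800\n", old)
}
```

How high the target can usefully go depends on how much physical RAM is available, so any real value would have to come from measurement on the target machine.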
Re: Using Go for NTPsec
Hal Murray:
> That sounds like the right ballpark. Again, if I were working in this area I would be writing hack code to generate numbers. It's got to have a buffer for each item waiting in the channel. Does it do an alloc/free on each item or does it avoid that by saving the buffer and reusing it? We have 3 sizes of packets: no-auth (48 bytes), shared-key (68), and NTS (232). Can we tell it how much to copy or will it copy worst-case?
>
> Processing a packet takes a few uSec for no-auth and ~10 usec for NTS. A single 250ns is not important relative to 10 usec but will be significant relative to a few usec if you do it several times per packet.

Premature optimization. We should do *nothing* other than writing the simplest, most idiomatic code possible unless our *measurements* of GC latency spikes show that they're reaching an uncomfortable frequency.

This measurement is pretty trivial to do. See

https://golang.org/pkg/runtime/debug/#ReadGCStats

> I still haven't seen how you plan to get the data from the client side to the server side without a lock.

See sample code above.

> The server side does:
>   loop:
>     recv_request
>     get_info
>     fill in reply
>     send_reply
>
> I suppose you could do it if channels have a peek option. That will take a lot more code. You will need a main channel and a sub-channel per thread and another thread to copy over. And then you have to handle the case where a server thread is idle so it's not checking its new-data channel...

Ugh. Yeah, there's got to be a simpler way than that. But it's just borrowing trouble to try to specify the way this far in advance.

The Phase 1 goal has to be a dumb, literal, unthreaded translation that comes as close as possible to just transcribing the existing C into Go and exhibits sane sync behavior.

If we never get further than that it will still be a win because no more buffer-overrun exploits. But we will.

Dunno about you, but I expect phase 2 (stupid Go to properly idiomatic Go with fully exploited concurrency) to be a helluva lot of fun.

--
Eric S. Raymond, http://www.catb.org/~esr/
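As a rough idea of what the ReadGCStats measurement mentioned above might look like, here is a minimal sketch. The helper names `gcSnapshot` and `countAfterForcedGC` are made up for illustration, and the forced `runtime.GC()` call is only there so the demo has something to report. A real test would run the server under load and sample periodically.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// gcSnapshot returns the number of completed collections and the longest
// pause in the runtime's recent pause history.
func gcSnapshot() (numGC int64, maxPause time.Duration) {
	var stats debug.GCStats
	debug.ReadGCStats(&stats)
	for _, p := range stats.Pause {
		if p > maxPause {
			maxPause = p
		}
	}
	return stats.NumGC, maxPause
}

// countAfterForcedGC reports the collection count before and after one
// explicitly forced collection.
func countAfterForcedGC() (before, after int64) {
	before, _ = gcSnapshot()
	runtime.GC() // blocks until the collection has finished
	after, _ = gcSnapshot()
	return before, after
}

func main() {
	before, after := countAfterForcedGC()
	_, worst := gcSnapshot()
	fmt.Printf("collections: %d -> %d, longest recent pause: %v\n", before, after, worst)
}
```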
Re: GC timing
Hal Murray:
> >> That doesn't make sense. Where does your "one second apart" come from? Why is "currently has 2 threads" interesting?
>
> > When do we poll at a less than one-second interval? Most allocations will be associated with making a packet frame for the send, then dealing with a response that comes back less than 100ms later.
>
> I was thinking of the server side. Pool servers can easily get 100s of packets/second. I assume that means we have to write the server side so that it doesn't do any allocations.

At just 100s of packets per second I don't think there's any real problem with dynamic allocation. Taking 96 bytes as a typical query length (that includes auth and digest) let's round up to 1024 to account for allocation overhead and to make the calculation convenient.

A memory gigabyte is 2^20 of these packets. At one packet per second that's 1048576 seconds of traffic, or 291 hours, or 12 days and change; at 100 packets per second it's still about 10,000 seconds, or just under three hours. Even if your system has just a GiB of headroom, you're only going to trigger a GC due to queries about that often.

I don't think we get to numbers that even approach being worrying until two magnitudes up from here.

> What is the API for recvfrom()? Do you pass in a buffer, like in C, or does it return a newly allocated buffer?

You pass in a buffer. In theory we could maintain a buffer ring. I'd want to see actual benchmarks showing frequent GCs before I'd believe it was necessary, though.

--
Eric S. Raymond, http://www.catb.org/~esr/
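For reference, the buffer-ring idea could look something like the following. This is a minimal sketch, not a proposal: `bufRing` and its methods are hypothetical names, and the ring size and buffer size are arbitrary. The point is only that the buffers are allocated once up front and then handed out round-robin, so the receive path itself allocates nothing.

```go
package main

import "fmt"

// bufRing is a hypothetical fixed ring of preallocated packet buffers.
type bufRing struct {
	bufs [][]byte
	next int
}

// newBufRing allocates count buffers of size bytes each, once.
func newBufRing(count, size int) *bufRing {
	r := &bufRing{bufs: make([][]byte, count)}
	for i := range r.bufs {
		r.bufs[i] = make([]byte, size)
	}
	return r
}

// get returns the next buffer in the ring. The caller must be finished
// with a buffer before the ring wraps around to it again.
func (r *bufRing) get() []byte {
	b := r.bufs[r.next]
	r.next = (r.next + 1) % len(r.bufs)
	return b
}

func main() {
	ring := newBufRing(4, 1024)
	for i := 0; i < 6; i++ {
		buf := ring.get()
		// A real server would hand buf to a read call such as
		// net.PacketConn.ReadFrom(buf) at this point.
		fmt.Printf("packet %d uses a %d-byte buffer\n", i, len(buf))
	}
}
```

The obvious cost is the reuse discipline: a buffer that is still referenced when the ring wraps will be silently overwritten, which is one reason to wait for benchmarks before adopting it.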
Re: Using Go for NTPsec
Hal Murray:
> > You're thinking in C. This is a problem, because mutex-and-mailbox architectures don't scale up well. Human brains get overwhelmed at about 4 mutexes; beyond that the proliferation of corner cases gets to be more than the meat can handle.
>
> I'm missing something. What is the problem with lots of mutexes? I'm guessing you are thinking of deadlocks. They don't happen if code running with the lock held doesn't call any code that needs another lock -- a leaf node in the tree. I think my brain can handle lots of leaf nodes.

How do you know something is a leaf node? Lock claims can be arbitrarily far down a call chain, after all.

Humans are simply very, very bad at noticing unplanned mutex interactions. This is why finding more tractable concurrency primitives is a major theme in recent language designs. Nobody knows what the overall best set of primitives is for defect reduction (and "best" may not be singular or even well-defined), but pretty much anything is better than mutex-and-mailbox.

> Does Go have any utilities to scan code and build the lock tree?

No. Naked locks are available in Go but they're considered an option of last resort and used seldom.

> How do you avoid analogous deadlocks with channels?

By being careful. :-)

There's no way to automate your way out of this, and that is the major known problem with CSP (communicating sequential processes) as a concurrency-primitive set - indeed, early notice of this problem is the reason there was very little experimentation with CSP between occam (1983) and Go (2009).

Go is actually something of a surprise; according to the conventional wisdom in this area CSP should not have been as successful as the implementation in Go has turned out to be. That is, deadlocks *should* be a bigger problem in the real world than they are.

I don't know why this is. The Go designers did something subtle that I don't understand, and nobody else seems to either. I have researched this because I find it puzzling and interesting, and not found any plausible theory.

Other primitive sets have different problems. In general, the more different things you can do with a concurrency-primitive set, the more it challenges human cognitive limitations and blows up defect rates.

So in the history of language design you see a lot of attempts to invent more limited and tractable primitives - Ada rendezvous, for example, or generators in Icon - followed by dissatisfaction when they turn out not to be enough of an improvement over mutex-and-mailbox to sweep the field.

Since the collapse of Dennard scaling the pace of invention in this area has increased. Which is why Rust has library support for several different primitive sets. Don't try mixing them if you value your sanity.

> What is the cost of a lock vs a channel pass? Given that the channel needs to schedule another thread, I'll guess that costs significantly more.

You're right, it does. There are occasional complaints about this - here's a representative one.

https://dparrish.com/2016/03/go-channels-slow-for-bulk-data

On the other hand, there isn't enough visible bitching about channel overhead to suggest that it's a pervasive problem. I found a good dive into benchmarking here

https://syslog.ravelin.com/so-just-how-fast-are-channels-anyway-4c156a407e45

that may explain why. Part of his takeaway is

* When it makes sense to use channels don’t worry overmuch about their performance: they’re pretty good! Normally the work you will do per item will greatly exceed the 90–250 ns it takes to move the item through the channel, so it’s just not worth worrying about.

> The lock I'm actually interested in is the one that protects the data that the server thread needs to fill in the packet. We need that even if we aren't messing with the no-GC flag.
>
> There is another problem with your proposal. You skipped step 3.5, do the authentication. We need several CPUs working on that step in order to keep up with a gigabit link.

Interesting. OK, then the critical-region goroutine needs to manage some subthreads of its own.

--
Eric S. Raymond, http://www.catb.org/~esr/
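One way the "subthreads" for the authentication step might be arranged is a fan-out over a shared input channel. What follows is a sketch under stated assumptions, not a design: `authenticate` is a hypothetical function, HMAC-SHA256 is only a stand-in for whatever digest or NTS crypto the real code would use, and the worker count and key are arbitrary. Note that with several workers the output order is not guaranteed to match the input order.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"sync"
)

// authenticate fans packets out to the given number of worker goroutines.
// Each worker appends an HMAC-SHA256 tag (a stand-in for the real crypto)
// and forwards the tagged packet. The returned channel is closed once the
// input channel is closed and every packet has been processed.
func authenticate(in <-chan []byte, key []byte, workers int) <-chan []byte {
	out := make(chan []byte)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pkt := range in {
				mac := hmac.New(sha256.New, key)
				mac.Write(pkt)
				out <- mac.Sum(pkt) // packet bytes followed by the tag
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	in := make(chan []byte)
	out := authenticate(in, []byte("hypothetical-key"), 4)

	go func() {
		for i := 0; i < 8; i++ {
			in <- []byte{byte(i)}
		}
		close(in)
	}()

	n := 0
	for range out {
		n++
	}
	fmt.Println("tagged packets:", n)
}
```

No mutex appears in this code; the channels do the serialization. Whether the per-packet channel cost is acceptable at gigabit rates is exactly the kind of question the benchmarks linked above are meant to answer.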
Re: GC timing
Hal Murray:
> > I don't know all those numbers yet. But: given that NTPsec only currently has 2 threads and our allocations are typically occurring one second apart or less per upstream or downstream, I can't even plausibly *imagine* a Raft implementation having lower memory churn than we do.
>
> That doesn't make sense. Where does your "one second apart" come from? Why is "currently has 2 threads" interesting?

When do we poll at a less than one-second interval? Most allocations will be associated with making a packet frame for the send, then dealing with a response that comes back less than 100ms later.

> One area to keep in mind is the MRU list.

Good point, that is a source of allocations I hadn't thought of.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Interleaved Mode (Was: Re: Using Go for NTPsec)
Richard Laager via devel:
> On 7/5/21 8:38 AM, Eric S. Raymond via devel wrote:
> > > There is a close-to-RFC to handle this area. "Interleave" is the buzzword. I haven't studied it. The idea is to grab a transmit time stamp, then tweak the protocol a bit so you can send that on the next packet.
>
> > Daniel discovered it was broken and removed it from the protocol machine.
>
> Broken implementation or broken design? If the latter, is the current IETF proposal (wherever that is) still broken?

You'll have to ask Daniel that. I've forgotten the details and never saw the IETF proposal you speak of.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Using Go for NTPsec
test suite, *then* I changed it to exploit concurrency.

The positive takeaway is that this worked. I got the performance gains I was looking for, and used them to make lifting the history of GCC practical.

Even so, I had enough trouble with the second phase to convince me that the port would have foundered if I had *combined* it with trying to rethink the architecture. I'm good, but I'm not *that* good - I'd be surprised if anybody is.

This is why I have the same intention for NTPsec. "First make it work. Then make it right. Then make it fast."

> [Split out stuff so we can write the time-critical parts in C or Rust.]
>
> > I want to stay away from mixing languages if at all possible. The joints between them are always *serious* defect attractors and major sources of maintenance complexity.
>
> I envision the APIs being text strings over stdin/stdout. I think they will be simple enough so that the joints won't be a serious problem. Put it on your list of options in case you decide you have to do something about GC.

Noted. That does reduce my reluctance somewhat.

> > Picking up new languages *is* one of my strong points, yet I found Rust rebarbative in the extreme. This did nothing to make me optimistic about finding developers to work in it.
>
> I think that is the root of our "discussion". Your version of good/clean focuses on the language/environment. You are willing to (try to) dance around run time quirks. Mine focuses on the runtime. I'm willing to struggle with the language/environment. Or at least struggle some more until I learn something critical.

No. You think I didn't struggle with Go when I was doing the reposurgeon port? Before that I had written exactly one Go program, loccount. 2.1KLOC - near trivial. Reposurgeon is extremely algorithmically dense - porting it was *hard*, and took me a year of work.

I'm willing enough to struggle with the language/environment. Given that you describe it as "not one of your strong points", I'm probably more willing than you are. That would be unsurprising, as in my past experience I have been more willing to handle that kind of novelty than almost anybody around me.

No, I'm pushing Rust away - and determined to exit from C - because of reasons in the larger context. We need to get to a memory-safe language, we need decadal stability, and we need one with a reasonably low barrier to entry for new devs.

Rust fails two of those tests. Go passes all of them. If that means we need to do some acrobatics to deal with GC-induced latency spikes, that's a cost I'm willing to incur.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Using Go for NTPsec
Hal Murray:
> You have a new toy. The only tool needed is a simple lock.

Oh? What about the concurrent DNS thread we already have?

At this point I have two years of heavy experience in Go, so the toy is no longer new. If a better upgrade from C existed, I would know about it - Rust comes closest but, alas, doesn't make the cut yet, though it might with five more years of maturity.

> There are N server threads. They are each in a loop that does a recvfrom(), fills in the reply, then a sendto().
>
> Filling in the reply needs to access some data that is in globals in the current code. That needs a simple lock. If the packet is authenticated, accessing the key needs another lock.

You're thinking in C. This is a problem, because mutex-and-mailbox architectures don't scale up well. Human brains get overwhelmed at about 4 mutexes; beyond that the proliferation of corner cases gets to be more than the meat can handle.

In obedience to "never write code that is as clever as you can imagine, because you have to be more clever than that to debug it", we'd really want to hold NTPsec down to 3 mutexes or fewer.

You may think you can keep a fully concurrentized NTPsec down to 3 mutexes, but I doubt it. Actually we're already at 4 in the design you're talking about, counting DNS and logging mutexes.

This is the universe's way of telling us we're going to need more tractable concurrency primitives going forward. Languages in which "more tractable" has actually been implemented in a production-quality toolchain that can meet our porting requirements are few and far between.

In fact, as far as I can tell, that set is a singleton.

> >> You can't put it on an output queue. Handing it to the kernel is part of the time critical region.
>
> > I meant output *channel*. With a goroutine spinning on the end of it.
>
> Either I wasn't clear or you were really focused on using your new toy.
>
> The case we are trying to protect from the GC is the transmit side. The critical chunk of code starts with putting a time stamp into the packet, optionally adds the authentication, then sends it. At that point, the packet is gone. There is nothing to put on a queue or into a channel.

Yes, we're apparently talking past each other. Let's try to fix that.

In Go (or any other language that supports Communicating Sequential Processes and lightweight threads, which includes Rust), it's bad style to use mutexes. When you have a critical region, good style is to make a service thread that owns that region, spinning forever reading requests from a channel and shipping results to another channel.

Channels are thread-safe queues; they provide serialization, replacing the role mutexes play in C. Each Go program that uses concurrency is a flock of channel-service threads flying in formation, tossing state around via channels.

It is much easier to describe and enforce invariants for this kind of design than for a mutex-and-mailbox architecture, because each service thread is a serial state machine. The major remaining challenge is avoiding deadlocks.

What you're telling me (and I recognize it is correct) is that one of our service threads has to have the sequence:

1. Wait on the input channel for a packet to send.
2. Turn off GC.
3. Timestamp the packet.
4. Ship the packet.
5. Turn on GC.
6. Loop back to 1.

Look, ma, no mutex!

> > Do you know what our actual bottlenecks are? Have you profiled them?
>
> Have I used Eric's favorite profiling tool? Probably not.
>
> But I think I have a pretty good idea of where the CPU cycles are going. Who do you think wrote all the timing hacks in the attic?

OK, one of the most useful things you could do, then, is write a whitepaper describing where the bottlenecks are and why you think so. That knowledge needs to get out of your head to where other people can use it.

> I have a multi-threaded echo server and enough other hacks to drive it and collect data. It has a knob to inject spin between the recvfrom and sendto. The idea is to simulate the NTP computations and/or crypto. Do you want graphs?

Yes, but it's not just me that needs them. Anyone working on this code needs to know where the bottlenecks are.

--
Eric S. Raymond, http://www.catb.org/~esr/
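The six-step service-thread sequence in this message can be sketched as a single goroutine. This is an illustration under assumptions, not the planned implementation: `packet` and `sender` are hypothetical names, the `ship` callback stands in for the real send to the kernel, and `debug.SetGCPercent(-1)` is one way to switch collection off (it is a process-wide setting, and whether toggling it per packet costs more than it saves would need measuring).

```go
package main

import (
	"fmt"
	"runtime/debug"
	"time"
)

// packet is a hypothetical stand-in for an outgoing NTP reply.
type packet struct {
	payload  []byte
	transmit time.Time
}

// sender owns the time-critical region. It is the only goroutine that
// timestamps and ships packets, so no mutex is needed around that code.
func sender(in <-chan packet, ship func(packet)) {
	for pkt := range in { // 1. wait on the input channel
		old := debug.SetGCPercent(-1) // 2. turn off GC
		pkt.transmit = time.Now()     // 3. timestamp the packet
		ship(pkt)                     // 4. ship it (stand-in for the real send)
		debug.SetGCPercent(old)       // 5. turn GC back on
	} // 6. loop back to 1 until the channel is closed
}

func main() {
	in := make(chan packet)
	done := make(chan struct{})

	go func() {
		sender(in, func(p packet) {
			fmt.Printf("sent %d bytes stamped %v\n", len(p.payload), p.transmit)
		})
		close(done)
	}()

	in <- packet{payload: []byte("reply")}
	close(in)
	<-done
}
```

Any number of other goroutines can write to `in`; the channel serializes them, which is the role a mutex would play in the C design.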
Re: Using Go for NTPsec
Hal Murray:
> > We don't have a multithreaded server yet. Worst case we have two threads, and only one can ever reach the critical region in question. Don't borrow trouble! :-)
>
> I'm interested in building a server that will keep a gigabit link running at full speed. We can do that with multiple threads.

Right. But the way to get there is certainly *not* to try to design ahead while you're still thinking in a language like C where concurrent programming is difficult and error-prone.

Once you get used to being able to program in Go's implementation of communicating sequential processes you'll have much better tools for doing concurrency.

> > If we go to using threads more heavily, the idiomatic Go way to handle this problem would be to have a queue that does only the code in window 1. When you write to the request queue, it would stop GC, do time-critical magic, restart GC, and ship the result on the output queue.
>
> That's adding a bottleneck that we need multiple threads to avoid.

Do you know what our actual bottlenecks are? Have you profiled them?

> You can't put it on an output queue. Handing it to the kernel is part of the time critical region.

I meant output *channel*. With a goroutine spinning on the end of it.

> > I'm not worried about a DDoS. I don't think the protocol machine gives a hostile client or server any way to force hitting that window.
>
> I'll work up some numbers if you think that will be interesting. (It will take a while. I have to put a system back together.)

Any possibility of a DDoS being feasible is interesting.

--
Eric S. Raymond, http://www.catb.org/~esr/
Re: Using Go for NTPsec
Hal Murray:
> >> 1. packet tx happening right after tx timestamp for server response
>
> > A) Mitigate window 1 by turning off GC before it and back on after.
>
> Things get complicated. Consider a multi threaded server. If you have several busy server threads, can they keep the GC off 100% of the time? (Consider a DDoS attack.)

We don't have a multithreaded server yet. Worst case we have two threads, and only one can ever reach the critical region in question. Don't borrow trouble! :-)

If we go to using threads more heavily, the idiomatic Go way to handle this problem would be to have a queue that does only the code in window 1. When you write to the request queue, it would stop GC, do time-critical magic, restart GC, and ship the result on the output queue.

I'm not worried about a DDoS. I don't think the protocol machine gives a hostile client or server any way to force hitting that window.

> If you are going to claim the GC doesn't take long and doesn't happen often, maybe you should just put a big comment in the code and not do anything. Better would be a design note collecting all the info.

I've said several times that I think the most likely outcome is that no special action is needed!

But you've asked me to justify the position that Go GC would not be a performance problem, which is a *completely reasonable* thing to ask; I made the same demand of myself when I started thinking about a Go port. Thus the detailed discussion of mitigation strategy.

> There is a close-to-RFC to handle this area. "Interleave" is the buzzword. I haven't studied it. The idea is to grab a transmit time stamp, then tweak the protocol a bit so you can send that on the next packet.

Daniel discovered it was broken and removed it from the protocol machine.

> >> 2. serial NMEA data timestamps
>
> > B) Mitigate window 2 by taking timestamps before and after sample read, asking the Go runtime if a GC has occurred in that interval, and throwing out the sample if it has. This tactic might slow convergence times minutely but should not affect overall sync accuracy.
>
> There isn't a convenient "before". You want to do something like readline, and that's going to wait a while. You don't care if it does a GC while you are waiting. So you will have to do something like read the first character, grab the before stamp, read the second character, grab the after time stamp, grab the rest of the line. Then work out the begin/end of line timing by counting characters and doing the arithmetic with the baud rate.
>
> But is that worth the effort? Are there any serial devices with timing good enough to notice?

Good question. Certainly a Macx-1 gets nowhere near that granularity, not with .0125s of inherent jitter due to USB poll interval; that swamps 600usec by more than an order of magnitude.

--
Eric S. Raymond, http://www.catb.org/~esr/
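The "count characters and do the arithmetic with the baud rate" step quoted above is simple enough to sketch. The assumptions here are loud ones: `lineStartOffset` is a hypothetical helper, and the 10 bits per character figure assumes 8N1 framing (one start bit, eight data bits, one stop bit), which a real refclock driver would have to confirm for the device in question.

```go
package main

import (
	"fmt"
	"time"
)

// bitsPerChar assumes 8N1 framing: one start bit, eight data bits,
// one stop bit.
const bitsPerChar = 10

// lineStartOffset returns how long before a timestamp the line began,
// given how many characters had fully arrived when the stamp was taken.
func lineStartOffset(charsRead, baud int) time.Duration {
	charTime := bitsPerChar * time.Second / time.Duration(baud)
	return time.Duration(charsRead) * charTime
}

func main() {
	stamp := time.Now() // taken right after the second character arrived
	offset := lineStartOffset(2, 4800)
	fmt.Println("line began about", offset, "before the stamp, at", stamp.Add(-offset))
}
```

At the low baud rates typical of serial NMEA devices a single character takes on the order of a millisecond or more, which is consistent with the doubt expressed above about whether the effort is worthwhile.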
Re: Using Go for NTPsec
Dan Drown via devel:

> Let's talk a bit about what time critical sections are currently in the
> code. I think that will help drive the decision about the impact of garbage
> collection.
>
> I haven't looked at ntpsec's codebase lately, so some of this might be out
> of date. Please feel free to correct any mistakes or omissions.
>
> Time critical code:
>
> 1. packet tx happening right after tx timestamp for server response
> 2. serial NMEA data timestamps
>
> Non time critical code:
>
> 1. packet rx timestamp (assumption: SO_TIMESTAMPNS or alike is being used)
> 2. packet tx happening right after tx timestamp for client request
>    (assumption: SO_TIMESTAMPNS or alike is being used)
> 3. receiving SHM data
> 4. receiving PPS data
> 5. calculating/updating local clock offset/frequency
>
> The time critical code can tolerate some level of delay (~hundreds of
> microseconds), as things like packet tx can be delayed for a multitude of
> kernel and hardware reasons. The good news is both of the time critical
> code paths are somewhat predictable and if we can manually schedule GC, we
> can avoid scheduling it during those times.
>
> The non timing critical code can be delayed tens of milliseconds without an
> impact to timing quality.

This matches my analysis almost exactly. My current plan is:

A) Mitigate window 1 by turning off GC before it and back on after.

B) Mitigate window 2 by taking timestamps before and after sample read, asking the Go runtime if a GC has occurred in that interval, and throwing out the sample if it has. This tactic might slow convergence times minutely but should not affect overall sync accuracy.

In all other circumstances, treat GC-induced latency spikes as though they're just another kind of network weather.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Using Go for NTPsec
it to new language* is an invitation to disaster. The only strategy that works is to do a stupid, literal, unidiomatic port first, verify it, then clean it up and make it idiomatic.

This means that changes of the kind you're proposing need to be on hold until we have a more or less literal translation of the present C code working.

> I know how to split out the server side of ntpd.
>
> Suppose we come up with an API for refclocks. Would that, or something
> similar also work for network servers?
>
> I think the timing critical code would be small enough that we could write it
> in C and inspect it carefully. That may not be valid if we include the crypto
> stuff.
>
> Converting that sort of code to Rust seems reasonable.

I want to stay away from mixing languages if at all possible. The joints between them are always *serious* defect attractors and major sources of maintenance complexity.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Using Go for NTPsec
Achim Gratz via devel:

> Eric S. Raymond via devel writes:
> > Talk to me about what you think the effect of very occasional
> > stop-the-world pauses of 600 microseconds or less would be on sync
> > accuracy. By "very occasionally" let's say once every ten minutes or
> > so, that being what I think is a *very* pessimistic estimate of GC
> > frequency for a program with NTP's memory-usage pattern.
>
> It's hard to say what that will look like without having actual
> statistics. The trouble with stalls is that they introduce bias since
> they always shift in the same direction. Once every ten minutes would
> likely not make much of a difference for most systems even if you could
> not filter these events out.

That was my estimation. It certainly is *not* the case that occasional 600-microsecond stalls would cause a 600-microsecond degradation in typical accuracy; rather, that would be a highly unlikely upper bound on the loss.

> > What I want to understand - and have others understand - is whether
> > pauses of that size and frequency would mess with sync accuracy
> > enough that heroic measures are required to avoid them. What kind
> > of distortion would they introduce in comparison with other
> > components of the error budget?
>
> NTP already filters out values that fall out of the ordinary statistic.

Good point. It's quite possible that samples distorted by stop-the-world pauses would usually be thrown out by normal filtering! Ironically this becomes less likely as GC times drop.

> For the control loop I tend to think that eventually it would make sense
> to drop the assumption of a regular interval at which the control
> interventions happen.

One of the things that's likely to happen if we move to Go is that control-event scheduling becomes a forever-loop shipping wakeup notifications to a channel. Each channel read would then start a worker thread to do whatever the intervention requires. In C this would be brain-melting; in Go it's a few lines of simple code.

Once we have this kind of organization, dispensing with the regularity assumption won't even be difficult.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Using Go for NTPsec
Achim Gratz via devel:

> Matthew Selsky via devel writes:
> > On Tue, Jun 29, 2021 at 04:41:30PM -0400, Eric S. Raymond via devel wrote:
> >
> >> Well, first, the historical target for accuracy of WAN time service is
> >> more than an order of magnitude higher than 1ms. The worst-case jitter
> >> that could add would be barely above the measurement-noise floor at worst,
> >> and more probably below it.
> >
> > Our target is < 1 us, even for WAN time service. We would want to
> > keep/improve this accuracy target.
>
> Assuming you really talk about accuracy of the time transfer (i.e. the
> maximum time difference between any two systems) that is impossible
> given the principle that NTP uses.

That's what I thought, but I don't have your level of expertise about the error budget of NTP sync so I couldn't quantify it.

Talk to me about what you think the effect of very occasional stop-the-world pauses of 600 microseconds or less would be on sync accuracy. By "very occasionally" let's say once every ten minutes or so, that being what I think is a *very* pessimistic estimate of GC frequency for a program with NTP's memory-usage pattern.

What I want to understand - and have others understand - is whether pauses of that size and frequency would mess with sync accuracy enough that heroic measures are required to avoid them. What kind of distortion would they introduce in comparison with other components of the error budget?

Mind you, heroic measures are available. The simplest would be to run with GC off by default and schedule times to perform a GC when it can reasonably be expected not to collide with the next polling action. But before I plan something like that I want to be sure it is actually necessary.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Using Go for NTPsec
Richard Laager via devel:

> Not particularly. Presumably it's just because of GPS PPS + good network?

Having a good local clock can explain it, yes.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Support for i386
Hal Murray via devel:

> There is the 32bit time_t problem. We've got a few more years.

I've been thinking forward about that. One of my objectives in the big cleanup was to make the time width easier to change, and internally it now is. The remaining blocker is that the NTP packet format would need to be redesigned.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: ntpq, refclocks
Hal Murray:

> Maybe ntpd should turn into a mega program that parses the config file and
> runs a bunch of other programs and/or threads.

That's an extremely natural way to partition in Go. Much, *much* easier than trying to pull off the equivalent in C would be.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Go vs GC
Hal Murray:

> > Well, first, the historical target for accuracy of WAN time service is more
> > than an order of magnitude higher than 1ms.
>
> Time marches on. We need to do better today, much better.
>
> NTP is used on LANs.

Then we'll need to go to watching for GC pauses and skipping samples that might have been distorted by them.

> > turning GC off
>
> Is that lightweight or heavyweight?
>
> How does that interact with threads?

It's a fast operation, if that's what you mean. The way Go GC works requires that there is only one GC-enable flag, not one per thread. The flag tells the Go runtime whether or not to GC when the normal memory-usage threshold is reached.

> What happens if there are lots of threads and they are all turning
> it off/on very frequently and probably overlapping?

That flag has to be protected by a mutex, and you have whatever value happened to be set last regardless of how many threads are running. If we think contention for that lock is going to be an issue, there's a pretty standard and simple way of dealing with it using an auxiliary semaphore.

> I'm assuming the mainline server path won't require any allocations
> or frees. Total CPU time to process a simple request is under 10
> microseconds.

The main source of memory churn is going to be allocations for incoming packets, and deallocations when they're no longer referenced and get GCed. Allocations are fast. GC is slow, but isn't performed very often.

> Is there a subset of Go that doesn't use GC? Or something like that.

Not really. If you want to not use GC, you turn GC off. Then everything works as it normally does but your memory usage grows without bound until you re-enable GC, which could trigger an immediate GC sweep.

I analyzed this years ago and discovered two kinds of code span where unexpected latency spikes could mess things up.

One is right around where the adjtimex call or equivalent is done. That's a very narrow code section that's going to run in near constant time and not do any allocations; we can guard it just by turning GC off at the start of the span and on at the end so that any other thread that *is* doing allocations cannot induce a latency spike during the critical section.

The other is during sample collection from local refclocks. That's a little trickier because the read from device is a blocking operation that can and will do memory allocation. I think what we have to do in that case is take a timestamp before the read, then after it check to see if there was a GC between that timestamp and now, and if so discard the sample.

Outside those places the code is not really stall-sensitive because all the data flying around has enough timestamping.

With these mitigation measures I think performance can be expected to be C-like, except that once in a great while a GC stop will be detected to have occurred during refclock sampling and cause that sample to get tossed out.

I say "once in a great while" because a program with ntpd's memory usage pattern is not going to trigger GCs very often. Most of the passes through critical regions won't collide with a GC latency spike. We can log these exceptions to check, of course.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: ntpq, refclocks
Hal Murray:

> [context is ntpq via shared memory]
>
> > Any reason not to use Unix-domain sockets and just reuse the current protocol
> > handling, except it's not accessible netwide? That might be simpler.
>
> I hadn't figured it out when I was typing in my previous message, but using
> shared memory forces some build time checking that is missing from our current
> approach.
>
> Maybe we should split the discussion into two parts. One is how to get build
> time checking. The second is how to connect ntpq and ntpd. I'm happy with
> Unix-domain sockets for the latter.

That would probably be simpler than building another shared-memory interface.

> Note that the current ntpq doesn't have any build time checking and we don't
> have any version numbers on the ntpq data. My straw man for cleaning this up
> is a text file with a line per counter. It will need 2 preprocessors, one for
> ntpd/mode 6 and the other for ntpq. Is there a better way? (It's slightly
> more complicated than that since there are several tables, for example one per
> server, and one per refclock.)

JSON is often used for this sort of thing. It might be overkill here, but it's worth considering.

> --
>
> [splitting out refclocks]
>
> > How you handle configuration for separate refclockd and ntpnetd turns out to
> > be a nasty tangle. Do they both replicate the entire config parser and parse
> > the same config file, ignoring the bits they don't need? Or do you split
> > the config and suffer a flag day?
>
> I was assuming each refclock would be a separate program. It wouldn't need a
> config file, just some command line stuff.

That's a big jump. Backward config compatibility would go out the window.

> Even if we don't split refclocks out into a separate program, we should run
> them as a separate thread so we still need an API between a refclock and ntpd.

Yes, I agree with that.

--
Eric S. Raymond <http://www.catb.org/~esr/>
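To make the JSON suggestion above concrete, here is a small Go sketch of a versioned counter report. The `counterReport` type, its field names, and the `rx_packets` counter are all hypothetical; nothing like this exists in ntpq today, and the real counter tables (per server, per refclock) would need more structure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// counterReport is a hypothetical wire format: a version number plus a
// name-to-value table, so new counters can be added without a new layout.
type counterReport struct {
	Version  int               `json:"version"`
	Counters map[string]uint64 `json:"counters"`
}

// encode serializes a report to JSON.
func encode(r counterReport) ([]byte, error) {
	return json.Marshal(r)
}

// decode parses a JSON report; unknown counters simply show up in the map.
func decode(b []byte) (counterReport, error) {
	var r counterReport
	err := json.Unmarshal(b, &r)
	return r, err
}

func main() {
	b, err := encode(counterReport{Version: 1, Counters: map[string]uint64{"rx_packets": 42}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	r, err := decode(b)
	fmt.Println(r.Version, r.Counters["rx_packets"], err)
}
```

The trade-off against the one-line-per-counter text file is that JSON gives up build-time checking of counter names in exchange for not needing the two preprocessors.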
Re: Using Go for NTPsec
Richard Laager via devel:

> On 6/29/21 3:41 PM, Eric S. Raymond via devel wrote:
> > Well, first, the historical target for accuracy of WAN time service is
> > more than an order of magnitude higher than 1ms.
>
> My two NTP servers are +- 0.1 ms and +- 0.2 ms as measured by the NTP pool
> monitoring system across the Internet.

That's quite exceptionally good. It's normally hard to get to within an order of magnitude of that on a LAN, let alone a WAN.

Do you have any theory of why your deviation is that low?

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Using Go for NTPsec
Matthew Selsky:

> Our target is < 1 us, even for WAN time service. We would want to
> keep/improve this accuracy target.

One *microsecond*? Has any version of NTP achieved that kind of accuracy? I don't think we're there.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: ntpq, refclocks
Hal Murray via devel:

>
> Does anybody have any good ideas on a modern way to handle ntpq/mode 6?
>
> Background...
>
> We could split the server side out into a separate process. That leaves a
> very tiny attack surface from the network. I think I could do that now except
> for mode 6.
>
> Does anybody have any good ideas on how to replace the current ntpq
> functionality?
>
> Handwave. My straw man would be to put all the counters into shared memory.
> Then a local-only version of ntpq seems reasonable. If SSH isn't good enough,
> we could add a TCP or TLS wrapper.

This is probably achievable. But...

Any reason not to use Unix-domain sockets and just reuse the current protocol handling, except it's not accessible netwide? That might be simpler.

> I also want to be able to quickly add more counters. That gets into a version
> control tangle. Would it work to have 2 version numbers? (maybe call them
> version and sub-version) One for the base/stable counters and another for the
> new/experimental ones? The idea is that the new ones get folded into the base
> at release time.

I'd have no objection to that.

> Similarly, it would be nice to split the refclocks out into a separate
> process.

I looked at that prospect pretty closely years ago. Here's why it didn't happen:

How you handle configuration for separate refclockd and ntpnetd turns out to be a nasty tangle. Do they both replicate the entire config parser and parse the same config file, ignoring the bits they don't need? Or do you split the config and suffer a flag day?

It's hard to know what the right thing is here. Going with a unitary parser preserves backward compatibility, but one of the project goals is to reduce attack surface to the dead minimum possible and we certainly would not be accomplishing *that* with a unitary parser.

Another problem is that throwing out the obsolete drivers has drastically reduced the expected gain from splitting the daemon. When the drivers were a huge mass of code that dwarfed the networking part, splitting them out and adding a thin refclockd wrapper around them made obvious sense. Now they're only 24KLOC total and the wrapper around them would be a significant fraction of that as a LOC *increase*.

I reluctantly concluded that the effort wasn't justified. I'm open to argument on that.

> We need something better than the current shared memory approach.
> Read only SHM would be OK for data, but we need a clean way to wake up the
> receiving side so it can process the data promptly.

I'd like to hear more about this. It sounds like a separate issue from the daemon split.

> More background...
>
> I'd like to move the current kernel mode PLL to user space. I think modern
> CPUs are fast enough for that to make sense. I haven't done any
> experimentation.

Can't really respond to this as I don't understand the kernel PLL.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Using Go for NTPsec
Sanjeev Gupta via devel:

> This is a follow on to Eric's email a few hours ago, I am keeping that
> thread clean.
>
> (The last 3GL I programmed in was Fortran, and not the 77 version. I can
> read bash scripts and C pseudo-code)
>
> The literature I can find speaks of Go GC being improved in 1.5, such that
> the STW phase (the "sweep") is now less than 1ms. This is impressive, but
> for NTP, this places a lower bound on our jitter.
>
> What am I missing?

Well, first, the historical target for accuracy of WAN time service is more than an order of magnitude higher than 1ms. The worst-case jitter that could add would be barely above the measurement-noise floor at worst, and more probably below it.

Second, Go 1.5 was a long time ago. STW pauses are much shorter now. The graph at https://github.com/lni/dragonboat indicates that even under heavy load STW in Go 1.12 never went above 600 microseconds and is usually somewhat below 400. We can expect this figure to decrease rather than increase in the future - reducing latency spikes is high on the Go development team's objectives.

Third, most of the code isn't stall-sensitive at all. There are a couple of critical regions that need to be guarded, which I think we can accomplish by either (a) turning GC off before we enter them and turning it on again after, or (b) some Lampson-like tricks for detecting when the interval was interrupted by GC and discarding any resulting sample.

I don't think we'll ever need to go to that third level, but we can deal with it if we need to.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Work plan proposal for the next year
James Browning via devel:

> I do not consider myself an expert C developer.

You're good enough to contribute materially to *this* project, which is by no means the easiest C code to grok. That makes you expert in my book.

> I think the proposed schedule is overly serial. ntpkeygen and keygone
> for example have no dependency on pylib/ IIRC. Also none of the other
> ntpclients/ depend on ntpq. This would (in theory) pull CLIENTS up to
> month 5 (late November/December)

I was deliberately vague about the subproject dependencies and what can be done in parallel, because the point of this document is a scope-of-work estimate that might get us some money for things like a hardware test lab. Also it would be nice if Ian and I could pay our rent and grocery bills while we're doing the Go port.

If this work plan looks like it'll fly I'll do a more detailed dependency chart.

> I think the open 'issue' count would be lower if people actually
> tended to close issues.

Agreed. I occasionally do a triage pass to catch these. I guess we're due for one.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Work plan proposal for the next year
Developers, please weigh in on this so we can finalize it. The final version will become part of a grant proposal which may get us money for a hardware test lab and code bounties.

= NTPsec work plan

This is a rough-draft work plan for the NTPsec project over the period July 1st 2021 to July 1st 2022.

== Major objective

Our major objective for this year will be to move the NTPsec codebase from C and Python to a single memory-safe language.

=== Rationale

NTPsec is a security-focused project. As with other large, mature C programs, effectively all of its security issues are consequences of the fact that C is memory-unsafe: it is very easy to accidentally write code with wild-pointer bugs that create exploitable vulnerabilities, and it is very difficult to detect such bugs.

Historically the mitigation strategy for this problem has been a combination of tight code discipline with application of code analyzers designed to detect vulnerabilities. This approach is known to be leaky and inadequate, but has long been accepted for lack of a better alternative.

There is now a better alternative: the Go language. Go is sufficiently like C and Python to make the code move feasible, but does pointer bounds checking, eliminating pointer-overrun bugs and thus preventing the creation of exploitable security bugs through these overruns.

Go does not make the related problem of denial-of-service attacks through null-pointer errors outright impossible, but static type checking and Go's own validation tools will make such bugs much easier to prevent.

It is expected that this code move would reduce NTPsec's vulnerability to exploits by a large factor, an order of magnitude or more.

=== Personnel

The NTPsec technical lead (Eric Raymond) and his apprentice (Ian Bruene) are expert Go programmers. Other team members (notably Hal Murray, Gary Miller, James Browning, and Richard Laager) are expert C programmers who can be confidently expected to come up to speed in Go very rapidly.

=== Key performance indicators for this effort

An entire port will not be achievable in 12 months. Finishing it is probably an 18-month to 2-year project for the personnel on hand. Nor, due to the Brooks's Law effect, can adding more people be expected to shorten the project.

However, we can define milestones that should be achievable within a year and demonstrate the achievability of the entire effort.

Milestone PYPACKET: Port and unit-test the NTP packet handling from the client code (pylib/packet.py and pylib/util.py). Estimate: 1 month.

Milestone NTPQ: Port ntpq, the principal client, from Python to Go. Test interoperability with ntpd. Estimate: 3 months.

Milestone CLIENTS: Port the remaining clients (ntpdig, ntpkeygen, ntpmon, ntpsweep, and ntpwait) from Python to Go. Estimate: 4 months.

At completion of milestone CLIENTS (8 months out) we will have a working packet layer and client suite in Go that not only interoperates with ntpd but can also be tested for conformance with other NTP implementations.

Milestone CONFIG: Configuration parsing for ntpd. Build and test a workalike parser in Go for NTP configuration files. Estimate: 2 months.

Milestone FAKED: Build a demonstration fake ntpd that does everything but the actual time-sync and clock driver code, collecting clock samples from upstream NTP servers. Estimate: 4 months.

Milestone SYNC: Port the time-synchronization and clock setting code. Estimate: 3 months.

Milestone NTPSHM: This is the most important clock driver for production use. Estimate: 1 month.

Milestone LEGACY: Port the legacy clock drivers to Go. This one is big and messy and difficult to scope, as the driver code is old and crufty and difficult to test. It is probably not achievable in year one and may require budgeting for and building a hardware test lab. Tentative estimate: 5 months, with an unfortunately high probability of being blocked on the availability of test hardware.

== Minor goals

* Resolve all CVEs rapidly and completely
* Reduce outstanding issue count from 38 to less than 20.
* Improve unit-test coverage
* Maintain a regular point-release schedule

--
Eric S. Raymond <http://www.catb.org/~esr/>

"Today, we need a nation of Minutemen, citizens who are not only prepared to take arms, but citizens who regard the preservation of freedom as the basic purpose of their daily life and who are willing to consciously work and sacrifice for that freedom."
-- John F. Kennedy
Re: Objectives for the next year
MLewis:

> Is it worthwhile improving the current C code to a 'hardened' programming
> standard?
>
> Example
> - Joint Strike Fighter standards
>   [1]https://www.stroustrup.com/JSF-AV-rules.pdf
> - NASA JPL standards
>   [2]https://andrewbanks.com/wp-content/uploads/2019/07/JPL_Coding_Standard_C.pdf
> - MISRA
>   [3]https://misra.org.uk/LinkClick.aspx?fileticket=vfArSqzP1d0%3d&tabid=57
>
> What effort would be required for 'hardening'?
>
> If not a full 'hardening', is it worthwhile to use the
> hardening/vulnerability/guideline-fail reporting tools to identify and fix
> the worst vulnerabilities or to grab the low-hanging fruit?
>
> Anyone with experience with 'hardening' C code? (I don't)
>
> But first, what's the problem with the ntpsec C code. Is there an issue
> with vulnerabilities in the current C code, uncertainty with possible
> unknown vulnerabilities in the current code, or is the concern one of
> introducing vulnerabilities in the future as the C code is maintained or
> new functionality added? Or is the answer to that "yes". Is 'hardening' a
> solution or just an improvement? I assume you're still vulnerable where
> the hardening guidelines failed or weren't ideally followed? Is moving to
> a new language the better solution?
>
> If moving to another language is inevitable, if that move is selected as a
> goal for the next year, is 'hardening' the ntpsec C code still worthwhile?
> - Could 'hardening' be done and in place before the move to another
>   language is complete. For what benefit.
> - Or would the 'hardened' C code be replaced weeks later by code in a new
>   language. Or would new language code be in place in the same or similar
>   time (sooner?), if 'hardening' efforts were instead put on moving.
> - If a full 'hardening' isn't worthwhile, is some 'hardening' effort
>   worthwhile.
>
> Regards,
>
> Michael
>
> References
>
> Visible links
> 1. https://www.stroustrup.com/JSF-AV-rules.pdf
> 2. https://andrewbanks.com/wp-content/uploads/2019/07/JPL_Coding_Standard_C.pdf
> 3. https://misra.org.uk/LinkClick.aspx?fileticket=vfArSqzP1d0%3d&tabid=57

Here is my judgment:

> Is there an issue
> with vulnerabilities in the current C code, uncertainty with possible
> unknown vulnerabilities in the current code, or is the concern one of
> introducing vulnerabilities in the future as the C code is maintained or
> new functionality added?

All of the above.

I *do* have experience with hardening C code - a substantial amount of it has been on this project, and there was GPSD before that.

In truth, we've already done most of the best practices. More effort would have rapidly diminishing returns.

But the real problem is at a deeper level. C is fundamentally unsafe; hardening - either as we've achieved it or as a hypothetical ideal - can only be an improvement, not a solution.

I think our time would be better spent moving to a memory-safe language than trying to harden the C code further.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Objectives for the next year
James Browning via devel:

> Are there any C to golang or rust transpilers that work
> reasonably well? The last time I checked the best rust
> transpiler generated rs files that were just shallow glosses
> and the golang transpiler was somewhat inadequate and
> verbose.

This is still the state of play, alas.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Objectives for the next year
Hal Murray:

> What was the name for your attempt to get a GPSD style replay of old data?
> Did we ever figure out why that didn't work?

I did. There's a blog post about it:

https://blog.ntpsec.org/2017/02/22/testframe-the-epic-failure.html

--
Eric S. Raymond <http://www.catb.org/~esr/>
Re: Objectives for the next year
Hal Murray:

> > I'll start the ball rolling with this big one: It's time to move out of C.
>
> I want to threadify things, and taking advantage of that, I want to run at
> full wire speed on a gigabit link with a modest server class CPU.
>
> I have test code running. I'm pretty sure it will work. But my test code is
> in C.
>
> Are other people happy with threads?

I am now. Working with Go for the last couple of years has brought me up to speed on that kind of programming.

> I think threadifying things will allow us to clean up the code a lot. I think
> that may be as important for writing safe/clean code as moving out of C.

I think you are right. However, I'm pretty sure we should change languages before we threadify. Both Rust and (especially) Go have concurrency primitives that are *far* more tractable than their rough C equivalents.

> > My choice for a language to move to would be Go.
>
> The only other choice I've seen seriously mentioned is Rust. How do they
> compare for thread support and for CPU cycles? Do they have full access to
> all the options for network I/O? (for example recv time stamps)
>
> My first reaction is that I don't want reference counted stuff involved with
> network I/O. But maybe that is bogus. It's roughly 5 microseconds for a
> recv/send pair in C. We should write some test code so we can get some real
> timing numbers.

There are two major issues with Rust:

1. We have at least two people who are expert Go programmers - Ian and myself. We have nobody, AFAIK, who is up to speed on Rust. Moving the code will be a large amount of work - I don't think any good purpose is served by adding "learn to be fluent in an entire new language" on top of that.

2. I don't think Rust is as yet stable enough for our purposes. The language and core libraries are still in some flux - we can't yet count on a feature we're relying on not disappearing over the next decade. This is a very stark contrast with Go's ironclad forward-compatibility guarantee.

For me, issue #2 is the real dealbreaker.

As for your questions:

Both languages have reasonable threading support, far better than C's.

Rust would be more abstemious of processor cycles than Go is, but I don't believe the difference is significant for our deployment.

Full access to all options for network I/O: I believe pure Go can be beaten into this, but that corner of the API is very poorly documented so I don't know exactly how yet. We have the option of using cgo and writing small C extensions to get at things the Go API doesn't support.

I don't know of any direct support for things like recv time stamps in Rust. But Rust also has a facility to call C extensions.

--
Eric S. Raymond <http://www.catb.org/~esr/>
Objectives for the next year
Developers, please weigh in on what you think the NTPsec project's goals for the next year ought to be. These goals can be coding projects ("Move the Python code to Go"), process goals ("Halve the size of the issue list") or project infrastructure goals ("Build a hardware lab so we can live-test supported clocks.") Or anything else you can think of. These goals can be large or small. They can be old issues we haven't addressed well enough. For purposes of the document I'm trying to put together, clear definition of a goal matters more than how ambitious it is. My only request is that you not argue priorities when you see other people's suggestions in the replies - not yet. The initial phase of this discussion should be brainstorming. At some point I'll summarize the suggestions that seem viable (clearly enough defined, possible to complete within a year) and we can start rank ordering. I'll start the ball rolling with this big one: It's time to move out of C. C is a terrible implementation language for any project that wants to be secure and reliable, due to wild-pointer bugs. Our client code, being in Python, is memory-safe; the daemon code, the most crucial part, is not. Let's fix that. My choice for a language to move to would be Go. Possibly one of you can argue for a different choice, though if you agree that Go is a suitable target I would find that information interesting. -- >>esr>>
Re: visiting the todo list
Hal Murray via devel : > I'm still planning to threadify the server side. I'm stalled half on other > things and half waiting for Eric to get free so we can cleanup the ntpq stuff. Still on my to-do list. I'm still focused on my conversion job, though, making me actual money. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: What is the purpose of devel/ifdex-ignores?
Hal Murray : > > e...@thyrsus.com said: > > No, just boring history. I think those are old conditional macros we no > > longer use; likely they have been renamed to something else. > > The current code tests for SO_TIMESTAMPNS > > Should I just delete the old/unused stuff? Yes. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: What is the purpose of devel/ifdex-ignores?
Hal Murray via devel : > From ./devel/ifdex-ignores > USE_SCM_BINTIME # to grab timestamp for recv packet > USE_SCM_TIMESTAMP # " > USE_SCM_TIMESTAMPNS # " > > None of those symbols are used by our code. > > Should I just delete them? > > What is the idea for USE_xxx? Is there some interesting history I've > forgotten? No, just boring history. I think those are old conditional macros we no longer use; likely they have been renamed to something else. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: What I have been doing 2021 post-January
James Browning via devel : > I submitted a couple of patches to gpsd and one to microjson resolving > issues. One where an empty string validated correctly as an object was > already posted to microjson. The other allowed pretty much any string the > same length or shorter to pass a t_check. > > Three of my merge requests made it into the tree. The first resolved a > couple of issues with readvar in ntpq. The second addressed some nits with > the docs and the third resolved ntpdig handling address resolving errors > poorly in Python3. > > I have also five yet unmerged requests submitted. > > 1203. I rewrote the part of the peers output generation and will add a new > mode with the refid, tally code, and peer name/address on the right side of > the graph. > > 1204. I partially address Hal's mode 6 wishlist by adding new protocol > fields for the entirety of the current processes running. Also, I added a > new duration helper and dual column stats output. > > The following are the new requests. > > 1207. I changed the name of the is_vn_mode_acceptable function while > dropping NTPv1 support and requiring at least 12 octets (not 1). The tree > version of the function checked for specific modes which draft as published > NTPv1 did not have. > > 1208. I stripped out all handling of the netlink socket and fixed around > the breaks I found. This would reduce NTPsec w/ NTS and IPv4/6 to 5 > sockets. They are UDP4, UDP6, TCP4, TCP6, and netlink which only spuriously > trigger DNS retries. > > I also have a branch[1] that also sweeps away the asynchronous update > updaters and the netlink socket. It is not part of 1208. > > 1213. I tackled another bit of untamed in ntp_control. I took three > *_varlist blocks and reshaped them into a trio of wrapper calls which call > another new function. I reworked many ctl_put* functions to use a > higher-level function call saving a few lines each. Also, new macros were > added and used saving a few lines per invocation.
> > I intend to merge 1207 and 1213 Tuesday. Also 1207 and 1213 the following > Saturday. > > Are there any obvious (or not so) reasons why I should not go ahead? > > [1] https://gitlab.com/jamesb_fe80/ntpsec/-/tree/21A31-twinsock No objection from here. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: discrete units
Hal Murray via devel : > > Gary said: > > I think he is referring to recent proposals to split ntpd up into multiple > > daemons. Daemons for the core, NTS, clients, etc. Each doing a small job. > > Rather than the one big daemon we have now. > > That sort of split looks good on paper, but I'm not sure how well it would > work out in practice. I share Hal's skepticism. Back at the beginning of this project I had a plan to split ntpd in two. Isolate all the network plumbing in one half, all the clock management in another, have them communicate over some kind of local channel. Why I eventually gave it up is instructive. Any time you split up a program like ntpd you add to the total volume of code, if only because you now need two network wrapper layers around the core logic rather than just one. This can be worth doing, but only if the payoff in reduction of global complexity due to decoupling the core pieces is high enough to pay for the extra code. You cannot take for granted that this will be so, and I eventually concluded that it was not - we were collecting much bigger simplifications simply by removing obsolete drivers. Another headache is service configuration. Would the two pieces have shared the same config file or not? Complications would have arisen either way. If you split the file it's a flag day for all users (no small matter when the userbase is as conservative and risk-averse as ntpd's). OTOH, if you don't split the file you lose some of the simplification you might have collected, as both pieces have to carry the same parser and generate errors when they're fed a piece of configuration that's not theirs to handle. Thus, wearing my architect hat, I'd have to see a very strong and specific case for splitting up ntpd before I'd agree. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: GitLab | Projects forced to "Private" (#294196)
James Browning via devel : > The ntpsec forks belonging to rlaager, selsky, and ianbreune are still > detached. A quick check shows that there are no forks. The page I looked at > claimed that such detached repositories cannot be reattached. TLDR there is > only one MR that still might be mergeable. Annoying, but recoverable. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: GitLab | Projects forced to "Private" (#294196)
Sanjeev Gupta : > As of 20 minutes ago, I can now pull from the repository unauthenticated. Yes, and the visibility is now "Public" in the settings. Looks like the problem is solved. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Blizard of mail from GitLab-Abuse-Automation
Sanjeev Gupta via devel : > Ah, so not my fault. > > I tried updating my fork about 11 hours ago, and was asked to authenticate to > pull from the NTPsec git repo. I tried with another repo, it worked, so I > assumed one of us was modifying the security settings of the repo. Something either very specific or very random is going on. All of the dozen or so of my personal projects I've had time to check are fine - not taken private and it looks like the config button for public/private would still work. Mark Atwood has been briefed. I think he knows a phone number at GitLab. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Unity warnings
Hal Murray : > Please do and/or please fix our local copy. I'm focused on the > restrict/unrestrict tangle. Bug fixed, but I can't find any way to submit issues on their bugtracker. Yes, I have a validated account on the site. Still looking. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Unity warnings
Hal Murray via devel : > > I assume a fix for this should be pushed upstream. > > ../../tests/unity/unity.c: In function ‘UnityFail’: > ../../tests/unity/unity.c:1757:6: warning: function might be candidate for > attribute ‘noreturn’ [-Wsuggest-attribute=noreturn] > 1757 | void UnityFail(const char* msg, const UNITY_LINE_TYPE line) > | ^ > ../../tests/unity/unity.c: In function ‘UnityIgnore’: > ../../tests/unity/unity.c:1794:6: warning: function might be candidate for > attribute ‘noreturn’ [-Wsuggest-attribute=noreturn] > 1794 | void UnityIgnore(const char* msg, const UNITY_LINE_TYPE line) > | ^~~ > Yes. I'll do it if you have not already identified upstream. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: NEWS for release
Hal Murray : > > > Looking at the changelog, the only one that jumps out at me is James's new > > UI > > options for ntpq. > > Should we have a policy of mentioning all UI changes? I think so. I've been trying to do that whenever I write NEWS entries. > How about config file changes? Well, sure. Not that I'm expecting any in the foreseeable future. > > After we ship I'll get off my butt about the recvbuff cleanup. > > What do you have in mind? > > I think I have fixed all the get/free stupidities that used to be in the main > line server code so it isn't wasting CPU cycles. There is still a free list. > > I forget how it is used. I think some/most refclocks use it, maybe only the > ones that read serial data. It would be good to totally nuke the free list. Totally nuking the free list was what I had in mind. > We should make a pass through devel/TODO.adoc and devel/TODO-NTS Agreed. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: I'm giving up on seccomp
Hal Murray : > Is this an opportunity to clean up that area? I don't think so. It's pretty clean now, functionally speaking - I put a lot of work into rationalizing the configurator structures and the way they talk to the protocol machine. I suppose it might be if we were willing to make compatibility-breaking changes to the configuration grammar, but I have not been presented with any good reason to do that. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: I'm giving up on seccomp
Hal Murray : > > >> If you want to claim your Go program has no buffer overruns, > >> you can't call out to big complicated libraries written in c. You > >> would have to rewrite them in Go. > > > Fair point. That changes my to-do list. > > Could you please say more? What did you add or drop? Means seccomp in Go has to be on it. Though not at high priority. The current blocker on the port is that Flex and Bison can't yet emit Go code. I'm probably going to have to fix that myself. There are roughly equivalent tools such as goyacc that are fine for new projects, but guaranteeing that you get the same accepted grammar from specifications in two different generators is so difficult that I'd rather retarget Flex and Bison than deal with that problem. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: NEWS for release
Hal Murray via devel : > Are any of the recent changes interesting enough to mention in NEWS? Looking at the changelog, the only one that jumps out at me is James's new UI options for ntpq. Otherwise the recent stuff is bug triage and validator cleanups (LGTM looks like a big win). I wouldn't want to try doing anything deeper this close to a release. None of it's user-visible. After we ship I'll get off my butt about the recvbuff cleanup. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Gary E. Miller via devel : > As previously mentioned here: RHEL 6. Which is about to EOL. > Supporting Python 2 is trivial. Why the hate? Because it's not in fact trivial. It's *doable*; Peter Donis and I are the experts on how to do it. But it's not trivial. It proliferates code and test paths, and therefore increases attack surface. Therefore we should drop Python 2 support as soon as the benefit of keeping it drops below the cost. I think that happens the moment python3 becomes a reliable thing to put in shebangs on all our primary platforms. There's a strong case this has already occurred, and that case will be closed when RHEL 6 goes EOL. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Gary E. Miller via devel : > Yo Eric! > > On Thu, 3 Sep 2020 14:13:05 -0400 > "Eric S. Raymond" wrote: > > > Gary E. Miller via devel : > > > Say what? This has zero to do with libraries. > > > > Sure it does. Use different versions of Python, require different > > library paths. > > No more than any other versioned programs use their versioned backend > libraries. But that is all a long solved problem. I never, ever > heard of anyone having problems with Python internal libs, except > cross-compiling. > > Or do you mean PYTHONPATHS? Which is a different thing. That's what I meant, yes. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Gary E. Miller via devel : > Say what? This has zero to do with libraries. Sure it does. Use different versions of Python, require different library paths. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Gary E. Miller via devel : > > I don't want to go down this road. I have ugly memories associated > > with a similar hack in gpsd, long ago. > > But what about how it works now? All the maintainers like it. Oh dear Goddess. We are *still* mutating shebangs in GPSD? I must have blotted that from my memory. One reason to stop doing this kind of thing is library-path confusion. Yes, I know we have a lash-up that barely works around that in GPSD but our Debian package maintainer damn near has an aneurism any time he thinks about it and I can't blame him a bit. It will come time to end Python 2 support in GPSD soon. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Richard Laager via devel : > > You realize that the POSIX doc says a space after "#!", but so many do it > > wrong they accept that variant. > In NTPsec, there are 122 the wrong way and 81 the right way. As you say, > either works. I don't particularly care about the space personally, but > we can use the space if you want to be pedantically correct. I just fixed all of those. This will probably never matter, but part of our project style is supposed to be standards conformance so tight it squeaks. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Richard Laager via devel : > RHEL 6 support (measured in terms of security updates) ends in November > of this year. So by the time a version of NTPsec releases without Python > 2 support, we'd be looking at RHEL 7. On top of that, it has been Red Hat's official position for some time that RHEL 6 should *not* block transition to Python 3 only. There is some easy, officially endorsed workaround that I don't remember. Unfortunately I can't find RH's statement about this; I'd recognize it if I saw it but web searches aren't turning it up. There may be other reasons to keep Python 2 support, but as Richard says RHEL 6 will stop being one of them before our next point release after this one. This is not a judgment I am making casually. Peter Donis and I wrote this: http://www.catb.org/esr/faqs/practical-python-porting/ It's still the best guide on how to write Python that runs under both 2 and 3. Peter and I have been tracking the transition very closely and one of the questions I'm keeping in my mind is when I amend that document to say "there is no longer any point in this for new code". That moment is nearly upon us, and I'm pretty certain it will arrive before 2020 ends. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Pre-release cleanup
Matthew Selsky : > I also previously set up Codacy in order to see what other SAST systems could > do. See https://app.codacy.com/gl/NTPsec/ntpsec/dashboard > > Let me know what you think. If either are useful, I'll integrate them more > tightly in our workflows. I'm already a fan of LGTM - it picks up Python issues none of the rest of our validators do and seems at near parity on the C stuff. I'll take a look at Codacy. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: I'm giving up on seccomp
Hal Murray : > > e...@thyrsus.com said: > >> I think you have jumped to an unreasonable conclusion by assuming that Go > >> makes seccomp uninteresting. Are you going to rewrite OpenSSL in Go? > > No. There's an openssl binding: ... > > That's the whole point of my comment. OpenSSL is written in c. If there is > a > typical buffer overrun bug in OpenSSL, seccomp would be as helpful for a Go > version of ntpd as it is for the current version. > > If you want to claim your Go program has no buffer overruns, you can't call > out to big complicated libraries written in c. You would have to rewrite > them > in Go. Fair point. That changes my to-do list. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python support policy
Richard Laager via devel : > Let me start over now that I've reviewed the specifics of the NTPsec > scripts and build system again: > > We are currently using "#!/usr/bin/env python" in all the scripts, and > waf uses the same. The minimum to do to drop Python 2 is: > > 1. Change waf's shebang: > -#!/usr/bin/env python > +#!/usr/bin/env python3 > >That way, on systems where python is Python 2, waf will get Python 3. >While there is a lot of baggage around whether "python" is Python 2, >Python 3, or neither (i.e. doesn't exist), python3 existing is pretty >universal. [Discussion 1] > > 2. Change the other scripts the same way. > > 3. Update the docs and CI accordingly. I'm in favor of this simple, brute-force approach. If nobody comes up with a good argument for retaining Python 2 support, I will ask Mark to include in the release notes that this is our *last* release with Python 2 support. Then we'll rip out the Python 2 code paths before the next one. > An additional improvement (which we could do separately from the Python > 2 vs Python 3 discussion) would be to allow the user to customize the > shebang with a build flag. It turns out we already have `./waf configure > --python` which defaults to sys.executable. This is stock waf behavior. > We are already running the Python scripts through subst. We just aren't > doing the last piece of using @PYTHON@ as the shebang. So we could either: > > A. Change the Python shebangs to: >#!@PYTHON@ > I just tested that on one and it works as expected. > B. Leave them as "/usr/bin/env python3" and write a custom subst > function to replace that (and do the usual subst). > > Option A is trivial. I could have that patch done in 10 minutes. > > Option B is a little bit more work, but keeps the scripts directly > executable from the source tree, which could be helpful for development. > (The other substitutions aren't typically critical, as they are things > like @NTPSEC_VERSION_EXTENDED@.) Is this something people care about? 
I don't want to go down this road. I have ugly memories associated with a similar hack in gpsd, long ago. >python3 exists on Debian and derivatives as well as RedHat and >derivatives. Ubuntu 20.04 optionally allows python to point to >python3, but always still has python3. I use Debian, Ubuntu, and >RedHat, so I have personal knowledge here. Yes. To the best of my knowledge every current Unix-like thing does right with python3 in the shebang line. That makes hacking those shebangs as unnecessary as it would be hazardous. -- Eric S. Raymond <http://www.catb.org/~esr/>
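The "simple, brute-force approach" from steps 1 and 2 above is a one-time edit of the source tree, not the build-time shebang substitution of options A and B. A minimal sketch of that edit, assuming only the `#!/usr/bin/env python` form quoted in the thread; the helper name `to_python3_shebang` is hypothetical and this is not code from the NTPsec tree:

```python
import re

# Matches an env-style Python shebang with or without a space after "#!",
# and only when "python" carries no version suffix.
_SHEBANG = re.compile(r"^(#!\s*/usr/bin/env\s+python)\s*$")


def to_python3_shebang(text):
    """Return text with a bare env-python shebang pointed at python3."""
    first, sep, rest = text.partition("\n")
    match = _SHEBANG.match(first)
    if match:
        # Keep the existing prefix (including any space after "#!")
        # and append the version suffix.
        first = match.group(1) + "3"
    return first + sep + rest
```

Scripts that already name python3, or that name any other interpreter, pass through unchanged, so the function is safe to run over every file in the tree.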
Re: I'm giving up on seccomp
Gary E. Miller via devel : > Buffer overruns are just one way a program might make unexpected system > calls. Even if you can guarantee that a Go program could never be > maliciously corrupted externally, you can never guarantee that the > Go program can not be trojaned. Everything is cost gradients. Yes, a Go program could be Trojaned, but (a) that is far less likely than a buffer overrun is in C, and (b) there are reasonably efficient auditing methods to detect Trojanning, good enough that even static analyzers like Coverity and LGTM can usually catch them by looking for shellouts. Syscall blocking is not really the best-fit tool for defense against this kind of attack. Daniel knows more about this sort of thing than I do and might correct me, but it's my impression that syscall blocking is *specifically* a best-fit defence against object-code weird machines produced by buffer-overrun and stack-corruption attacks, and its utility drops off sharply for other kinds of attacks that are better foiled in different ways. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: I'm giving up on seccomp
Hal Murray : > > e...@thyrsus.com said: > > I think you misunderstand. I don't believe seccomp is objectively very > > important in itself, and never have. My problem with dropping it is that if > > we do that, we could be seen to have abandoned part of a security defense in > > depth because it's too much work. That's not a good look for a project with > > our mission statement. > > "too much work" is an interesting phrase. How does that compare with "not an > efficient use of our resources"? Sometimes you have to make efforts in the direction of being seen to do the right thing that can't be considered strictly efficient. Support for the BSD Unixes is in this category, too. > I didn't mean to suggest that we should drop it totally, just that I was > giving up on tightening things down such that we only allowed the syscalls > needed by a particular distro/version/hardware/??? Oh. I'm fine with that path. I thought you wanted to heave seccomp overboard in its entirety. > I think you have jumped to an unreasonable conclusion by assuming that Go > makes seccomp uninteresting. Are you going to rewrite OpenSSL in Go? No. There's an openssl binding: https://godoc.org/github.com/spacemonkeygo/openssl > Even without that, are you sure there are no bugs in Go? No, I'm not. But neither do I think seccomp is actually *possible* in Go at this point, which tends to relieve us of having to support it. > Maybe we should think harder about splitting NTS-KE out from ntpd. I lean against that - it seems like that split would make deployment and configuration more complicated. But I could be persuaded. > My comment about early-droproot wasn't clear. There will be a few more > syscalls needed by the code between early and normal droproot. Since we > aren't processing packets during initialization there is low risk of bad guys > getting in. But if they do get in post-initialization, they have a few more > syscalls they can use. Got it. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: I'm giving up on seccomp
Gary E. Miller via devel : > Lost me. seccomp applies to Go as much as it applies to C. Why do you think so? My understanding is that the reason you want to block unexpected system calls is because C buffer overruns can be used to make weird machines. You can't do that in Go, because there's no pointer arithmetic and array accesses are all bounds-checked. Thus the utility of blocking unexpected system calls pretty much vanishes. Is there something wrong with this reasoning? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Pre-release cleanup
Sanjeev Gupta : > They support *any* git repository. Huh. Their docs are out of date, then. > Please see: https://lgtm.com/projects/g/ntpsec/ntpsec/?mode=list That's excellent. I'll chew through some of these today. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Pre-release cleanup
folkert : > > > > I've resolved all the Coverity warnings except a new one, "risky > > > > function" related to random(3). The suppression cookie for that one > > > > is not suppressing; I've sent a request to Synopsys about this. > > > > > > Is ntpsec also checked by 'lgtm.com'? They also do all kinds of > > > verifications on source-code. > > > > First I've heard of it. Can you point me at a tutorial on how to use it? > > I only found it recently as well :-) > > They interface with github and bitbucket. > > https://lgtm.com/help/lgtm/getting-started Unfortunately, that's a blocker. We're on gitlab. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Pre-release cleanup
folkert : > > I've resolved all the Coverity warnings except a new one, "risky > > function" related to random(3). The suppression cookie for that one > > is not suppressing; I've sent a request to Synopsys about this. > > Is ntpsec also checked by 'lgtm.com'? They also do all kinds of > verifications on source-code. First I've heard of it. Can you point me at a tutorial on how to use it? -- Eric S. Raymond <http://www.catb.org/~esr/>
Python support policy
Python 2 was end-of-lifed at the beginning of January this year. All our primary target platforms fully support Python 3. Retaining support for Python 2 proliferates test paths and complicates the fix for at least one outstanding bug. Philosophically, I'm a fan of dropping legacy support when it advances our security mission by decreasing attack surface and improving auditability/maintainability. Proposed: We should drop support for Python 2 and use a python3 shebang in all our scripts. Discuss. -- Eric S. Raymond <http://www.catb.org/~esr/> "Say what you like about my bloody murderous government," I says, "but don't insult me poor bleedin' country." -- Edward Abbey
Pre-release cleanup
I've resolved all the Coverity warnings except a new one, "risky function" related to random(3). The suppression cookie for that one is not suppressing; I've sent a request to Synopsys about this. I've closed two stale issues. I think there are a few more that can be retired. I'm continuing to work on this. As usual, I don't see anything really serious on the issue list. Good work, everyone! -- Eric S. Raymond <http://www.catb.org/~esr/> If a thousand men were not to pay their tax-bills this year, that would ... [be] the definition of a peaceable revolution, if any such is possible. -- Henry David Thoreau
Re: I'm giving up on seccomp
Hal Murray : > You keep saying seccomp is important. What does it buy us? ntpd is a big > complicated program. It reads and writes files. It opens network > connections. What else would a bad guy need to do? I think you misunderstand. I don't believe seccomp is objectively very important in itself, and never have. My problem with dropping it is that if we do that, we could be seen to have abandoned part of a security defense in depth because it's too much work. That's not a good look for a project with our mission statement. On the other hand, I can't blame you a bit for being tired of this rathole, because it is indeed a huge pain in the ass for marginal gain. I don't think any of your analysis is even a little wrong about that. The solution is simple and obvious, if really annoying for me. You should assign seccomp-related bugs to me and I will deal with them. Think of this as incentive for me to get serious about moving the daemon to Go :-). (In Go, the equivalent of seccomp is neither possible nor necessary. What makes it unnecessary is that while you can crash a Go program with a bad reference, you can't weird-machine it. (Actually technically you can, but it takes a specific evasion of the language's safety rules through the unsafe package)). > [Is early-droproot a bug?] At the current state of my knowledge I don't think so. But you just put auditing the stretch of code between the two droproots high on my priority list. It's an open question whether early droproot is worth its complexity cost. This is not a case like seccomp where the set of exploits closed off is effectively unbounded - in this case the security cost of proliferating code and test paths may not be worth the earlier privilege drop. > We use 2 large libraries that do lots of syscalls: libc and libssl. Most of > libc is a thin wrapper over the obvious. Sometimes it is a little less thin > when translating an old version into a newer syscall.
> > The complicated part of libc that we use is DNS lookups. That's a pain to > debug. DNS disables most signals so you can't bail out without letting it > cleanup. So it crashes rather than letting our trap handler print an error > message. > > I have work-in-progress that lets you setup a list of syscalls actually used > by your environment. It does roughly the following: > Splits the list of syscalls to be allowed out to a separate file that gets > #includ-ed. > A handful of scripts that process strace output to make lists of needed > calls. > It needs some waf work to specify the filename. I'm just patching a link > to > point to the right file. > > To collect the data, you have to run ntpd under strace. While it is running, > you have to tickle all the uncommon code paths. Things like switching to a > new ntpd.log after log rotate or reloading the cert file after it has been > updated. I have a list. I don't know how complete it is. > > One of the scripts includes a few syscalls that are hard to tickle. That > would need double checking. > > Basically, making a file is enough of a pain that I don't think it's > practical. You have to be a semi-wizard in order to run the recipe. A new > libc or libssl may break things. > > If somebody else wants to pick up this work I'll be glad to hand over what I > have. Otherwise, I'll drop it. (It's not hard to recreate from scratch if > you understand the above description.) I'm going to say drop it, and here's why. We've already seen the frequency of seccomp bugs drop over time, and that's to be expected. There should be fewer in the future than there have been in the past, lowering the value of building those specialized tools. Me, I'd rather you spent your effort on devising better test protocols as you have been doing. -- Eric S. Raymond <http://www.catb.org/~esr/>
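For anyone who does want to recreate the "scripts that process strace output to make lists of needed calls" step described above, the core of it might look like the following. This is a minimal sketch and not Hal's work-in-progress: the function name `syscalls_used` is hypothetical, and the line shapes it recognizes (an optional `[pid N]` or bare pid prefix, plus `<... name resumed>` continuation lines under `-f`) are assumptions about typical strace output that should be checked against the strace version in use.

```python
import re

# A syscall line starts with an optional pid prefix ("1234 " or "[pid 1234] ")
# followed by the syscall name and an opening parenthesis.
_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")
# Interrupted calls are completed on lines like "<... read resumed>".
_RESUMED = re.compile(r"<\.\.\. ([a-z_][a-z0-9_]*) resumed>")


def syscalls_used(strace_lines):
    """Return the sorted, de-duplicated syscall names seen in strace output."""
    names = set()
    for line in strace_lines:
        match = _CALL.match(line) or _RESUMED.search(line)
        if match:
            names.add(match.group(1))
        # Signal ("--- SIG... ---") and exit ("+++ ... +++") lines match
        # neither pattern and are skipped.
    return sorted(names)
```

Feeding it the lines of a trace file (for example `syscalls_used(open(path))`, where `path` names an strace log) yields the candidate allowlist. It only covers the code paths that were actually exercised during the trace, which is exactly the "tickle all the uncommon code paths" problem described above.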
Re: How about a release soon?
Hal Murray : > > I'll do a bug-triage pass. > > I've seen a couple of changes go by. Thanks. > > Please let me/us know when you are finished so I can test things. Will do. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: How about a release soon?
Hal Murray via devel : > > Has there been enough user visible improvement to warrant a release of > > 1.1.10? > > I think so. > > 1.1.9 doesn't know about the new port number for NTS-KE. > > There is also a bug fix for a missing error message. Without that message, > it's really tough to debug an obscure case of certificates not working. I'll do a bug-triage pass. Also, we need to decide if we want to switch to hosting the tarballs on GitLab itself using Ian's igor tool this release. If so, that will have a few consequences for the website. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Has anybody seen a system without STA_NANO?
Daniel Franke : > clock_gettime is. Adjtimex isn't in any standard except for an obscure RFC > that nobody follows. > > On Tue, Aug 25, 2020, 20:47 Eric S. Raymond via devel > wrote: > > > Hal Murray via devel : > > > > > > When was clock_gettime and struct timespec introduced? > > > > > > We can cleanup some cruft if we assume it exists. > > > > Assume it. These are in the Single Unix Specification. Hal, were you asking about adjtimex? Because struct timespec isn't what adjtimex works with - it's associated with clock_gettime(). -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Has anybody seen a system without STA_NANO?
Hal Murray via devel : > > When was clock_gettime and struct timespec introduced? > > We can cleanup some cruft if we assume it exists. Assume it. These are in the Single Unix Specification. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Seccomp tangle
Hal Murray : > > e...@thyrsus.com said: > > Aaarrgghhh. It's a huge pain in the ass and I wish it weren't interesting. > > But given our mission statement, it has to be. > > Just to make sure we are on the same wavelength... > > My question/proposal was not to drop seccomp if we didn't do what I sketched > out. It was to allow a slightly tighter/cleaner list of syscalls if you were > willing to put in the work to collect the data. The old merger of all > syscalls ever seen on any system approach would still be the default if you > enabled seccomp and didn't specify your own list. Understood. Now I'm torn between devel/ and contrib/. Use your judgment. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Seccomp tangle
Hal Murray : > The first quirk is that ntpd isn't on the #include search path. > (My hack was to put a link from include to ntpd/seccomp) > What's the right way to handle this? (Maybe I just fatfingered things.) Generating the required stuff into include, where all the headers live, ought to work. Unless I'm missing something. > Where should the scripts and directions live? devel/, I think. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Seccomp tangle
Hal Murray via devel : > > I've been experimenting with some code to allow custom seccomp lists. > > The idea is to replace the --enable-seccomp configure option with > --enable-seccomp=foo > and ntp_sandbox would include syscomp/foo.c which would be a list of syscalls > used by this system. > > I assume we would maintain a list for each OS/distro/version/hardware > combination that we are interested in. I have a few scripts that turn strace > output into a list. ... > > Is this interesting? If not, I'll drop it. > > If yes, I'll need some help to work out the details. Aaarrgghhh. It's a huge pain in the ass and I wish it weren't interesting. But given our mission statement, it has to be. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: BSD-4-Clause-UC license usage
Matthew Selsky via devel : > Hi Hal and team, > > Much of our NTS code uses BSD-4-Clause-UC instead of BSD-2-Clause (our > preferred license for new code). > > Was this license selection intentional? No. It's a historical accident. > Is BSD-4-Clause-UC intended for code owned by the University of California, > or does it make sense for others to use this license as well? BSD-4-Clause-UC is the original version of the license. It was propagated by the BSD family of Unix variants and came into wide use because of that. Over time, the Board of Regents has gradually removed clauses and now recommends BSD-2-Clause. It's not considered controversial to move from BSD-4-Clause-UC to one of the more recent variants approved by the Board of Regents. Mark, as our external-relations specialist, should sign off on this, but I'm willing to do it. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: The Epic of MRUlist.
Ian Bruene via devel : > * What happens when a packet in the middle of the sequence is dropped? Who > knows! If it is seen as a timeout then the client will adjust packet size > and try again... forever. Or maybe it silently doesn't notice? > > * What happens when the final packet is dropped? Same as before, except that > never seeing the "now" field means that silent failure will result in an > infinite loop. I think. > > * If the server data changes enough while the request sequence is running > the system can just fail for no good reason because the error handling for > that doesn't exist. Arguably that is when things are going well; I can > imagine some subtle and wacky hijinks when dumb luck causes it to not fail > properly. One of these probably explains the mystery bug Hal has reported on WiFi links. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: NTS dropping TLS 1.2
Hal Murray : > We can do several things: > 1) clean out the ifdefs that make things work with older versions of > OpenSSL. > That is drop support for systems that haven't upgraded their OpenSSL to a > supported version. > 2) leave things alone, ignore the RFC. > Or maybe add some nasty warning messages > How long? > 3) make a configure option to disable NTS so that NTPsec builds on older > OSes but doesn't support NTS. > > I propose option 1. Simple and clean. I don't think we will drop many > systems. I concur. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: droproot, seccomp
Hal Murray : > I don't think it's worth the effort to maintain 2 lists. We can revisit that > if you think it's appropriate. No, I agree with you. > There are 46 syscalls in each list and 55 in the merged list. Brings up a question. Is the list of all syscalls used by everybody large relative to any one distro+platform-specific list? Because if not, I could get behind having *one* list and just whitelisting syscalls until we stop needing to. 46 to 55. If just 9 syscalls are the difference, the very slightly reduced assurance starts to look like a reasonable trade to make the whole problem go away. Which, mind you, I wouldn't say if I didn't think we had done a quite effective job of hardening the rest of the code. But I *do* think that - which makes this worth consideration. -- Eric S. Raymond <http://www.catb.org/~esr/>
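The 46/46/55 figures fix the overlap by inclusion-exclusion: 46 + 46 - 55 = 37 syscalls are shared, and each list contributes 9 that the other lacks. A minimal sketch of that arithmetic follows. The syscall names are synthetic placeholders, since the actual lists are not shown in the thread, and `merge_stats` is a made-up helper name.

```python
def merge_stats(list_a, list_b):
    """Summarise how two syscall allowlists overlap."""
    a, b = set(list_a), set(list_b)
    return {
        "merged": len(a | b),   # size of the single combined list
        "shared": len(a & b),   # allowed on both platforms
        "only_a": len(a - b),   # extra exposure added for platform B
        "only_b": len(b - a),   # extra exposure added for platform A
    }

# Synthetic stand-ins sized to match the numbers quoted above.
shared = {"sys%d" % i for i in range(37)}
list_a = shared | {"a_only%d" % i for i in range(9)}
list_b = shared | {"b_only%d" % i for i in range(9)}

print(merge_stats(list_a, list_b))
```

Read this way, a merged list leaves each platform with 9 allowed syscalls it does not itself need, which is the "very slightly reduced assurance" being traded away.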
Re: droproot, seccomp
James Browning via devel : > Is there anything preventing the possibility of an early looser > seccomp setup and then tightening it later possibly with a knob > to generate terse or verbose warnings instead of dying. That is a very interesting idea that I think deserves further examination. Do you have an implementation strategy in mind? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: seccomp mess, continued, status update
Hal Murray via devel : > > Fedora fixed their problem. seccomp now builds and works on both Fedora and > Arch. > > But now it won't build on Alpine. It looks like the same problem that Fedora > had. The problem is a bug in a header file. Copying the ppoll bits from a > Fedora header file fixes the problem. > > The CI checker has an Alpine step with seccomp. It now fails. > > Can somebody please disable the seccomp option or step until Alpine fixes > things? Wouldn't it be simpler to use a base image in the CI that isn't buggy? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: seccomp tangle
Hal Murray via devel : > Should we drop seccomp? It's a pain to maintain. We're a security-focused product. I don't think it would be good optics to drop a layer of defense just because it's a pain to maintain. > How many people use it? Richard: do you turn it on for the Debian builds? I have no idea how many people use it. > How does seccomp compare to a jail? Why don't we have a good web page on how > to setup and use a jail? Does systemd have a jail option? Does anybody run > in a jail? ... We don't have a good page on jails because I'm not experienced at setting them up and mostly other people don't initiate documenting things. > Testing the version of the seccomp header file is probably cleaner than > testing for Arch. Agreed. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Is there a clean way for waf to test for the distro?
Hal Murray via devel : > > Context is the seccomp tangle. Issue #633 > > Should I just add a helper that looks in /etc/os-release? lsb_release -a might be useful here. -- Eric S. Raymond <http://www.catb.org/~esr/>
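For comparison with shelling out to lsb_release, a helper that reads /etc/os-release directly could be sketched as below. This is not an existing waf API. `parse_os_release` and `distro_id` are hypothetical names, and the parser handles only simple KEY=value lines with optional surrounding quotes, not the full shell-style escaping that os-release(5) permits.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value text into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip one layer of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        info[key] = value
    return info

def distro_id(path="/etc/os-release"):
    """Return the ID field (e.g. "arch", "fedora"), or None if unavailable."""
    try:
        with open(path) as f:
            return parse_os_release(f.read()).get("ID")
    except OSError:
        return None

# Illustrative content, not copied from any particular system.
SAMPLE = 'NAME="Arch Linux"\nID=arch\n# a comment\nPRETTY_NAME="Arch Linux"\n'
print(parse_os_release(SAMPLE)["ID"])
```

Returning None when the file is missing lets the caller fall back to another probe, such as the header-version test suggested elsewhere in this thread, instead of failing the configure step.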
Re: New warning from NetBSD 9.0
Hal Murray via devel : > > NetBSD just released version 9.0. It now generates this warning: > > ../../ntpd/ntp_control.c:1476:34: warning: '%s' directive output may be > truncated writing up to 255 bytes into a region of size between 0 and 255 > [-Wformat-truncation=] > > char str[256]; > > snprintf(str, sizeof(str), "%s/%s", utsnamebuf.sysname, > utsnamebuf.release); > > Has anybody seen this before and/or know how to fix it? I've seen it before. Rarely. If your format string had only a single %s in it, you could fix it by adding a precision specifier to the string (not a length specifier, a *precision* specifier) which bounds the number of bytes that snprintf can write into the buffer. I guess you could give both %s-cookies a precision specifier of 128. -- Eric S. Raymond <http://www.catb.org/~esr/>
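Python's printf-style `%` operator follows the same precision-versus-width convention as C's printf family for `%s`, so the distinction drawn above can be seen without a compiler. The sketch below is an illustration only, not the NTPsec fix. It uses 127 rather than 128 because two 128-byte fields plus the "/" and the terminating NUL would need 258 bytes, while 127 + 1 + 127 + 1 is exactly 256. The oversized strings are deliberately artificial.

```python
# Deliberately longer than any real uname field, to force truncation.
sysname = "S" * 300
release = "R" * 300

# "%.127s" is a precision: at most 127 characters of the argument are emitted.
bounded = "%.127s/%.127s" % (sysname, release)
print(len(bounded))     # 127 + 1 + 127

# "%127s" is a minimum field width: it pads short strings but never truncates.
unbounded = "%127s/%127s" % (sysname, release)
print(len(unbounded))   # 300 + 1 + 300
```

In the C call the equivalent format string would be "%.127s/%.127s". Whether that silences -Wformat-truncation on a given compiler version is something to confirm by building.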
Re: Possible cruft cleanup: clock_gettime vs getitimer
Hal Murray via devel : > devel/hacking.adoc says: > You *may* use clock_gettime(2) and clock_settime(2) calls, and > the related getitimer(2)/setitimer(2), from POSIX-1.2008. > > My man pages say both are in POSIX.1-2001. > > Is there any reason we don't pick one and discard the crufty ifdefs? > > No big deal, just another minor cleanup that makes the code slightly easier > to > read and maintain. I believe those #ifdefs are a port hack for a minor platform that's not completely standards-compliant, most likely some version of Mac OS X. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: --enable-doc waf config option removed
Richard Laager via devel : > On 2/2/20 3:44 PM, Jason Azze via devel wrote: > > It looks like the --enable-doc waf configuration option was removed in the > > commit "Add support for other asciidoc processors". Was there any > > discussion about this change? > > Yes. See the mailing list archive and MR !1037. That MR conflated at least two changes that should have been made separately. And I don't see any rationale for the questionable part, which is changing the configuration default. This is partly my fault. I've been concentrating pretty hard on the GCC conversion for months and haven't exercised the oversight I should have. Well, that epic is over; I'm back. OK, I reached the relevant devs on #ntpsec. Will pursue there. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Python, testing
Hal Murray via devel : > A year or 2 ago, I put together a script to test as many build time options > as > I thought reasonable. It's in ./tests/option-tester.sh > > Does anybody other than me use it? I've run it once or twice, but it's not easy to see how to integrate it into our regular test process. > It's a bit of a CPU hog -- too much to run routinely. Can we set things up > to > run it on the gitlab OS collection weekly or manually when we get close to a > release? I have to defer to the CI experts on that one. It sounds like something that should be possible. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ublox refclock
Gary E. Miller via devel : > > So there is nothing you recommend be merged at this time? > > I sorta wish NTPsec had a staging area like the Linux kernel does. > > There is value to a small and clean u-blox driver fully integrated > into NTPsec. But without KPPS it is inferior at timekeeping. > > If the guy that wrote it wanted to work on it under the NTPsec umbrella > that would be good. A little guidance and the guy that wrote that could > be very useful to NTPsec. > > Without that guy, or someone interested in being that guy. I'd pass on > that driver. I'm not personally interested in upgrading that software to > have the smarts that gpsd already does about the u-blox and KPPS. I left an issue on his tracker about merging to upstream. If he responds I will try to bring him into the fold. If he does not, from what you say there is not much loss here. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Do we require clock_gettime()?
Hal Murray : > > From devel/hacking: > > Only POSIX-1.2001/SUSv3 library functions should be used (a few > specific exceptions are noted below). If a library > function not in that standard is required, then a wrapper function for > backward > compatibility must be provided. One notable case is clock_gettime() > which is used, when available, for increased accuracy, and has a > fallback implementation using native time calls. > > > > I haven't found any hints of a fallback mechanism in the current code. You are quite right. That documentation is stale; it predates the last removal of a non-POSIX time call. IIRC that was a flaky part of an old version of Mac OS X that we learned doesn't even work reliably - depends on which rev of that major version of the OS. You can remove it. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Clock fuzzing bugs
Mark Atwood via devel : > On Sun, Nov 24, 2019, at 19:32, Hal Murray via devel wrote: > > > > e...@thyrsus.com said: > > > If we don't see any evidence of beat-induced quantization, I'm willing to > > > say > > > we drop this code. > > > > How about adding a --disable-fuzz configure option so we can experiment > > without breaking the default case. > > > > Or maybe a runtime configure option. > > I like that idea. Can be done. Not even difficult. About three hours of work, I'd say. For reasons Mark knows, however, I'm booked up to my eyebrows until 16 Dec and Ian maybe longer than that. Anyone else willing to step in? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Clock fuzzing bugs
Gary E. Miller via devel : > Is there an existing patch to remove the fuzzing? There is not. See my reply to Mark. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ublox refclock
Gary E. Miller via devel : > I just took a quick look at refclock_ubx.c > > An interesting start, but followup messages today on the list are > assuming this driver does things that it does not do. > > 1) It does not, ever, config the u-blox. It does not, ever, write to > the u-blox to query it. > > Configuration is up to the user. > > 2) It decodes UBX-TIM-TM2 (Current time) and UBX-TIM-TIMELS (for the > leap second). Then does some limited sanity checking. > > It will fail to catch known u-blox time failure modes. > > 3) It does some interesting things with TIO that the comments claim > improves the time stability. But it does not use KPPS which would > just work better and simpler. > > Anything that uses KPPS will work much better. > > 4) It does not look at qErr, which combined with KPPS, might eventually, > theoretically, lead to better time. When CPU time quantization gets better. > > In summary, not an improvement on current u-blox best practice. Maybe, > eventually, an improvement, with some work (configuration, KPPS, etc.). So there is nothing you recommend be merged at this time? -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ublox refclock
Udo van den Heuvel : > On 24-11-2019 15:01, Eric S. Raymond wrote: > > Udo van den Heuvel : > >> I have an M8N on order, would that be compatible enough to this driver? > >> If so: I could help test etc. > > > > That can't hurt - they speak the same protocol - but the big deal with > > the T variant is a stationary mode you don't have. > > Ah, OK. > The M8N was cheap so perhaps a fake; I can try to identify this when it > comes in. Not necessarily a fake; the 8N is the normal variant (without stationary mode) and less expensive. I suspect it's the same hardware with a different firmware load - they price-discriminate because they think the customers for stationary mode will pay more. > As it appears I need a (real) M8T: > What M8T board, cabling etc would I need to buy to interface to RS232? > Would e.g. https://www.gnss.store/12-gnss-gps-modules be reputable enough? Alas, I don't know. In the past the 8T was expensive and difficult to find; I haven't worked with one. I guess you get to be forward scout on this one. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ublox refclock
Udo van den Heuvel : > I have an M8N on order, would that be compatible enough to this driver? > If so: I could help test etc. That can't hurt - they speak the same protocol - but the big deal with the T variant is a stationary mode you don't have. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: Clock fuzzing bugs
Hal Murray via devel : > I'm tempted to rip out that stuff. I haven't quite convinced myself that it > isn't doing something important. Eric? The clock fuzzing? It's an interesting question. I've thought about it. I'm doubtful myself. The obvious motivation would be to avoid beat effects from the resolution of the system clock. You and Gary have a better feel for signal analysis and our error budget than I do, but for whatever my opinion is worth this seems to me like a mostly theoretical problem. We have pretty good visualization tools these days. Gary, you know best what normal performance looks like under ntpviz. Would you be willing to patch-disable fuzzing and see if that induces any suspicious-looking sawtooth patterns in the graphs? If we don't see any evidence of beat-induced quantization, I'm willing to say we drop this code. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ublox refclock
Udo van den Heuvel via devel : > I came across this ublox ntpsec refclock: > https://gitlab.com/trv-n/ntpsec-ublox > Would it be usable for incorporation in the ntpsec tree? > (AFAIK this is a 'straight' refclock; no extra lines needed besides > rx/tx and pps) Thank you very much for bringing this to our attention. The M8T is an interesting chip in a product line a couple of our principals like and use. Not only is it a strong candidate for inclusion as a supported driver, I'd be hard-put to think of a stronger one. I have filed an issue on its tracker titled "Work should be merged to upstream". In it I encourage the developer to introduce himself on this list so we can discuss what would be required to integrate. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ntpq mrulist bug
Hal Murray : > Part of the problem is that there is a lot of cruft in that area. For > example, grep for CERR_ > There is a clump of signals defined as part of a ControlSession, none are > ever > raised, a few are caught. Looks like somebody decided to rename things to > SERR and never got around to finishing the cleanup. > > There is another case where stuff is returned a couple of layers, but then > never used. I have unfortunate news for you. I think both those "features" were in the C version (I'm sure about the CERR/SERR duplication). The C was so grotty that I did not dare attempt anything but the most literal sort of translation of the lower layers. I got the distinct impression that the C was halfway through someone else's rewrite that never got finished. Looks to me like somebody was moving towards having a C client layer for Mode 6 that could be detached from the ntpq upper level. I completed that part, basically by cutting along the right dotted lines and disturbing the ugly code as little as I could get away with. I did do some cleanup after the literal translation, but not in the parts I was afraid to touch (the packet-reassembly code in particular). You can be pretty sure I didn't introduce any complexity that wasn't there before. > [for, else] > > That's a Pythonism. An else clause attached to a for executes only if the > > for ran to completion without a break. In this case, the code checks for a > > hole in the fragment sequence and sets the response field if there is no > > hole. > > Thanks. > > You have a tendency to use legal but uncommon constructs. Is that a bug or > feature? On the feature side, it makes the code more compact and maybe some > of us learn something. On the bug side, it makes the code harder to > understand for those of us who don't mentally collect features. > > Is there a collection of obscure features and what they do? I'd like to scan > (and bookmark) something like that. 
*blink* How would I know what matches your map of "uncommon"? How would I know what features not to use? I like you, Hal, but that doesn't mean I can read your freakin' mind. Ask me to solve something *easy*, like the Halting Problem. -- Eric S. Raymond <http://www.catb.org/~esr/>
Re: ntpq mrulist bug
Hal Murray via devel : > I know what's going wrong, but I'm not enough of a python geek to see a clean > fix. > > The basic idea is that the client sends a request and the server sends back a > clump of packets. The client specifies the max clump size. What's happening > is that at least one packet of the clump is getting lost. The code is > supposed to reduce the clump size when that happens, but that's not working. > > Here is the code outline: > mrulist calls doquery to get a clump of data > there is an except phrase to reduce the clump size > doquery calls sendrequest then getresponse > getresponse returns the answer in self.response > it also returns None (and maybe falls off the end with no return?) > > I think getresponse needs to return two things. > one is the data > the second is a flag: none, some, all > > There are lots of raises and excepts in there that I haven't sorted out. > > I think the code should return partial data. It doesn't. Or it doesn't get > processed. > > What's the right way to structure this? Should we fix the current code, or > make a drastic change? I don't think I know yet. I lean towards an incremental fix along the lines you describe, but it's also possible that there's a serious design flaw that merits a rewrite. Please file an issue and assign it to Ian Bruene; he's the maintainer for the Python parts. I'll step in if he needs help. > -- > > Where is the if for this else? Can else go with something other than if? > > The code past the else is the normal case. It gets run if the break doesn't > happen. > > This chunk of code is from near the end of getresponse in pylib/packet.py
>
>     # If we've seen the last fragment, look for holes in the sequence.
>     # If there aren't any, we're done.
>     if seenlastfrag and fragments[0].offset == 0:
>         for f in range(1, len(fragments)):
>             if fragments[f-1].end() != fragments[f].offset:
>                 warndbg("Hole in fragment sequence, %d of %d"
>                         % (f, len(fragments)), 1)
>                 break
>         else:     <=== this one
>             tempfraglist = [ntp.poly.polystr(f.extension) \
>                             for f in fragments]
>             self.response = ntp.poly.polybytes("".join(tempfraglist))
>             warndbg("Fragment collection ends. %d bytes "
>                     " in %d fragments"
>                     % (len(self.response), len(fragments)), 1)

That's a Pythonism. An else clause attached to a for executes only if the for ran to completion without a break. In this case, the code checks for a hole in the fragment sequence and sets the response field if there is no hole. -- Eric S. Raymond <http://www.catb.org/~esr/>
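The for/else behaviour described above can be tried in isolation. The sketch below is a stripped-down stand-in, not the pylib/packet.py code. Fragments are modelled as hypothetical (offset, length) tuples, and `first_hole` is a made-up name.

```python
def first_hole(fragments):
    """Return the index of the first fragment that does not start where the
    previous one ended, or None if the sequence is contiguous.

    fragments: list of (offset, length) tuples, already sorted by offset.
    """
    for i in range(1, len(fragments)):
        prev_offset, prev_length = fragments[i - 1]
        if prev_offset + prev_length != fragments[i][0]:
            hole = i
            break
    else:
        # Runs only when the loop was never exited via break, including
        # the case where the loop body ran zero times.
        hole = None
    return hole

print(first_hole([(0, 10), (10, 5), (15, 5)]))   # contiguous
print(first_hole([(0, 10), (12, 5), (17, 5)]))   # gap before index 1
```

The else branch also runs when the range is empty, so a single-fragment response counts as having no hole, which matches the "normal case" reading in the quoted question.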