Re: Is python2 dead?

2023-09-15 Thread Eric S. Raymond via devel
Hal Murray via devel :
> Do you have any data on Go GC times?

Yes. They're pretty minuscule. Most Go GC is performed concurrently
with normal program execution, except for one stop-the-world phase
that typically runs on the order of 1ms for real production
programs.

https://medium.com/servicetitan-engineering/go-vs-c-part-2-garbage-collection-9384677f86f1

"Nearly all STW pauses in Go are really sub-millisecond ones. If you
look more real-life test case (see e.g. this file), you’ll notice that
16GB static set on a ~ regular 16-core server implies your longest
pause = 50ms (vs 5s for .NET), and 99.99% of pauses are shorter than
7ms (92ms for .NET)!"
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


___
devel mailing list
devel@ntpsec.org
https://lists.ntpsec.org/mailman/listinfo/devel


Re: Is python2 dead?

2023-09-15 Thread Eric S. Raymond via devel
Hal Murray :
> Maybe it's time to switch to Go?

I've thought about it.  It shouldn't be all that difficult.  I wrote a tool that
would help:

https://gitlab.com/esr/pytogo

> How long would it take us to rewrite, from scratch, everything in ntpclients?

Around three man-months is my estimate.

> I occasionally poke around in ntpq.  I find it very hard to work with.  I 
> think the others are much simpler.

Yes, that's so.  Most of the complexity is in ntpq.

> Is the basic structure right?  If we were starting from scratch, what would 
> pylib look like?

I've learned by hard experience not to try to do a language translation and
a rewrite at the same time, so this is a question I wouldn't want to broach
while doing a lift.

That said, I think the structure of pylib is basically sound.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Is python2 dead?

2023-09-15 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> > Maybe it's time to switch to Go?
> 
> Please, no.  Go is a garbage collected language.  Just what NTPsec does
> not need, random, unpredictable delays.

We're only talking client-side as yet.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Is python2 dead?

2023-09-11 Thread Eric S. Raymond via devel
Hal Murray via devel :
> Really really dead?  Or maybe just hiding in some dark corner?

Python 2 was end-of-lifed on 1 Jan 2020.  It looks pretty dead from where I'm
sitting, but I'm aware that people who run RHEL have a different opinion.

> Should we drop support for python2 as part of the next release?
> Or announce in the next release that we will drop it as part of the
> following release?

The policy I have for my projects these days is that I leave the polyglot
machinery in place until I want to do something Python 3 specific, then
I drop it.

I would be OK with your second alternative.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Test #2

2022-03-24 Thread Eric S. Raymond via devel
Checking to make sure I've resubscribed.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python - hard to re-acquire

2021-07-09 Thread Eric S. Raymond via devel
Hal Murray :
> 
> > I'm pretty sure this is a problem with the ntpq code, not with Python -
> > Python in general has a reputation for being *easy* to read six months
> > later, which I think is deserved.  It's one of the first things I noticed
> > when I started coding in Python back in 1998 or so.
> 
> The hard to-go-back-to comment came from friends that I trust as much as I 
> trust you. (I worked closely with them for ~10 years.)

The story of the blind men and the elephant comes to mind.

> I don't maintain a lot of python code.  I have a handful of python hacks.
> The small ones are easy to maintain.  I find ntpq hard to work with.  As you
> suggest, that could be because it is crappy code.

I'm not sure that it's crappy code; some things have to be as complicated as
they are.  What it certainly is is *difficult* code.

>   It could also be because my head depends on type checking and python is
> full of automatic conversions that conflict with strong type checking.

I think you will be happier working in Go than in Python.  It's strongly typed,
and the compiler's error messages are so helpful that they come as rather a
shock after years of GCC.

> >> There is a bug in our ntpq, or was not-that-long-ago and I'm pretty sure it
> 
> > I'm aware.  Some time back I spent a day hunting for that bug.  I couldn't
> > find it.  That's a nasty thicket of code in there. 
> 
> The bug is in the interface between the code that collects packets and checks 
> the sequence numbers to make sure it got all the packets in the clump and the 
> code up a few layers that decides how big the clump should be.  Tangled in 
> there is that the collect-it code should return a partial clump but doesn't.
> 
> I'll track it down if you want to look again.  It may be in the issue already.

If you find the issue could use more detail, please add it.  Then I'm
willing to take another swing at fixing the problem.  It has been niggling
at me ever since.

> Another thing to consider...
> 
> You are planning to convert everything to Go without changing the structure, 
> then go back and clean things up.
> 
> Why didn't ntpq get cleaned up after it was moved to Python?

It did. The code ended up as pretty good Python, not just C awkwardly
recoded as Python.  A way to tell that is, for example, that it makes
proper use of first-class maps rather than replicating the many ugly
and unfortunate ways that C programmers try to approximate them.

Often that kind of cleanup exposes and destroys bugs, but it's
certainly not guaranteed to fix *every* pre-existing bug.  I agree
with your analysis - I think there is some dataflow error, that
it was in the C code as well, and it got faithfully replicated by
the faithful translation.

Of course, the same risk of bug-for-bug compatibility exists in doing
a stupid literal translation of ntpd.  But in both cases that risk has
to be compared to a different one, that trying to translate and refactor
or rewrite at the same time will lead to a complexity explosion.

I have seen such explosions before.  They tend to ruin porting efforts
and leave a large blast crater.  I have learned a healthy fear of
trying to do too much during a port.

> [Go being easy to read.]
> > BTW, I will not strongly assert that this is an advantage over Rust, because
> > I never grokked Rust well enough to make a really fair comparison.  But Rust
> > requires a lot of ceremony in the form of lifetime declarations that neither
> > Python nor Go does; a priori this probably does put it at a readability
> > disadvantage.
> 
> This should be an interesting experiment.  The same info that is a 
> complication also tells you what is going on.

Maybe.  Or it could *obscure* what's going on in a surfeit of syntax.  I've
used and read languages that have that tendency.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-09 Thread Eric S. Raymond via devel
e to work harder to find a code defect
that will pass a zero pointer than you would in C.  I've only ever
gotten there by accidentally dereferencing a pointer-valued variable
that had been declared but not yet initialized.

There is a stricter sense of "memory-safe" that means you can never
generate an invalid pointer at all.  Rust is almost there, but not
quite.  It has static null-pointer safety (you normally have to
initialize a pointer to a valid target location at declaration time)
but not dynamic safety. That would require a runtime check, and
Rust makes a point of not having a runtime.

> I put near-optimal performance high on the list.  My attempt at some
> simple Rust timing code got surprisingly high numbers for reading
> the clock.  It may have been loop overhead.  Manually replacing an
> iterator style loop (the clean way) with an ugly by-hand counter got
> something reasonable.  I haven't tried to figure out what is/was
> going on.  (yet)

That's a surprising result.  One of Rust's most touted advantages is
low overhead for close-to-the-metal code.  I wouldn't have expected it
to compile fat code from an iterator.

> > I gave up the experiment when I discovered that Rust had no equivalent of
> > poll(2)/select(2). 
> 
> I poked around a bit and didn't find anything that looked good.  I
> assume you wanted to wait for network data.  Did you consider a
> thread per socket?

I did, but I was trying to write a simple proof of concept.  Not a good
time to introduce threading.

> My experience is that my experiment of similar trivial size was a
> success.  Was that because I picked a problem area that was easier
> in some measure, or maybe I just got lucky and all the library
> support I needed was already there.

My money would be on the latter.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-08 Thread Eric S. Raymond via devel
n
Go's in the sense of foreclosing more errors.  I'd say the other is what they
call "zero-overhead abstractions". Rust may have a more expressive type system
than Go, too - there are some tricky issues around evaluating that.

Unfortunately, I found the price for these virtues too high.  I wish it had
been otherwise.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: GC timing

2021-07-07 Thread Eric S. Raymond via devel
Hal Murray :
> If you pass in a buffer, there is no reason to allocate anything in the case 
> of a server processing a request so this whole discussion is a wild goose 
> chase.

It's a little more complicated than that, because I was describing the
lowest-level recvfrom() in the socket library.

Even if we use that for the dumb phase 1 translation, it might be
right in phase 2 to move to the higher-level interfaces in the ipv4
and ipv6 libraries. Some of those dynamically allocate buffers.

In thinking this through, I want to be reasonably sure that not just
the phase 1 but also the phase 2 implementation isn't going to take a sync
accuracy hit. Thus, I'm trying to make worst-plausible-case
assumptions.

> Two orders of magnitude is 10K packets/second.  That's in the interesting
> range.  Our current code can do 400K non-auth packets/second.  That's only
> 40% of wire speed on a gigabit link.  NTS is 90K packets/sec, 25% of wire
> speed.
> 
> Memory speeds are ballpark of 20 GB/sec.  So if you have a GB of headroom, it 
> will take at least 50 ms just to scan that.
> 
> NIST servers are running 10^10 packets/day.  That's averaging 115K
> packets/second.
> 
> https://tf.nist.gov/general/pdf/2818.pdf

A NIST-grade server can afford to dedicate a lot of RAM and then set
the GOGC knob to a high value that trades having a bigger working set
for a longer expected interval between GCs.

You can, actually, pick the minimum GC interval you think is tolerable
and get it by pushing GOGC high enough.  Of course if you don't have
enough physical RAM, memory-retrieval times will blow up due to
swapping, but these days putting a bunch of terabytes of RAM in a
datacenter-grade server isn't even unusual any more.

115K 90-byte packets/second is about 10 MB/second, which will fill a GB
in roughly a hundred seconds.  Drop N terabytes of RAM on the problem and
now you're looking at memory-full intervals of about N * 100,000 seconds,
more than a day per terabyte.  It's not going to take a large N to make
latency spikes a rare and minor blip in the traffic.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-07 Thread Eric S. Raymond via devel
Hal Murray :
> That sounds like the right ballpark.  Again, if I were working in this area I
> would be writing hack code to generate numbers.  It's got to have a buffer
> for each item waiting in the channel.  Does it do an alloc/free on each item
> or does it avoid that by saving the buffer and reusing it?  We have 3 sizes
> of packets: no-auth (48 bytes), shared-key (68), and NTS (232).  Can we tell
> it how much to copy or will it copy worst-case?
> 
> Processing a packet takes a few uSec for no-auth and ~10 usec for NTS.  A 
> single 250ns is not important relative to 10 usec but will be significant 
> relative to a few usec if you do it several times per packet.

Premature optimization.

We should do *nothing* other than writing the simplest, most idiomatic
code possible unless our *measurements* of GC latency spikes show that
they're reaching an uncomfortable frequency.

This measurement is pretty trivial to do.  See

https://golang.org/pkg/runtime/debug/#ReadGCStats

> I still haven't seen how you plan to get the data from the client side to the 
> server side without a lock.  See sample code above.
> 
> The server side does:
>   loop:
>     recv_request
>     get_info
>     fill in reply
>     send_reply
> 
> I suppose you could do it if channels have a peek option.  That will take a
> lot more code.  You will need a main channel and a sub-channel per thread and
> another thread to copy over.  And then you have to handle the case where a
> server thread is idle so it's not checking its new-data channel...  Ugh.

Yeah, there's got to be a simpler way than that.  But it's just
borrowing trouble to try to specify the way this far in advance.  The
Phase 1 goal has to be a dumb, literal, unthreaded translation that
comes as close as possible to just transcribing the existing C into
Go and exhibits sane sync behavior.

If we never get further than that it will still be a win because no more
buffer-overrun exploits.  But we will.

Dunno about you, but I expect phase 2 (stupid Go to properly idiomatic
Go with fully exploited concurrency) to be a helluva lot of fun.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: GC timing

2021-07-07 Thread Eric S. Raymond via devel
Hal Murray :
> 
> >> That doesn't make sense.  Where does your "one second apart" come from?
> >> Why is "currently has 2 threads" interesting?
> > When do we poll at less than a one-second interval? Most allocations will
> > be associated with making a packet frame for the send, then dealing with a
> > response that comes back less than 100ms later.
> 
> I was thinking of the server side.  Pool servers can easily get 100s of 
> packets/second.  I assume that means we have to write the server side so that 
> it doesn't do any allocations.

At just 100s of packets per second I don't think there's any real problem
with dynamic allocation.  Taking 96 bytes as a typical query length (that
includes auth and digest) let's round up to 1024 to account for allocation
overhead and to make the calculation convenient. 

A memory gigabyte is 2^20 of these packets.  At one query per second that's
1048576 seconds of traffic, or 291 hours, or 12 days and change; at 100
queries per second it's still nearly three hours.  Even if your system has
just a GiB of headroom, you're only going to trigger a GC due to queries
about that often.  I don't think we get to numbers that even approach being
worrying until two orders of magnitude up from here.

> What is the API for recvfrom()?  Do you pass in a buffer, like in C, or does 
> it return a newly allocated buffer?

You pass in a buffer.  In theory we could maintain a buffer ring.  I'd want to
see actual benchmarks showing frequent GCs before I'd believe it was necessary,
though.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-07 Thread Eric S. Raymond via devel
Hal Murray :
> > You're thinking in C. This is a problem, because mutex-and-mailbox
> > architectures don't scale up well.  Human brains get overwhelmed at about 4
> > mutexes; beyond that the proliferation of corner cases get to be more than
> > the meat can handle.
> 
> I'm missing something.  What is the problem with lots of mutexes?  I'm 
> guessing you are thinking of deadlocks.  They don't happen if code running 
> with the lock held doesn't call any code that needs another lock -- a leaf 
> node in the tree.  I think my brain can handle lots of leaf nodes.

How do you know something is a leaf node?  Lock claims can fall
arbitrarily far down a call chain, after all.

Humans are simply very, very bad at noticing unplanned mutex
interactions.  This is why finding more tractable concurrency
primitives is a major theme in recent language designs.  Nobody knows
what the overall best set of primitives is for defect reduction (and
"best" may not be singular or even well-defined), but pretty much
anything is better than mutex-and-mailbox.

> Does Go have any utilities to scan code and build the lock tree?

No.  Naked locks are available in Go but they're considered an
option of last resort and used seldom.

> How do you avoid analogous deadlocks with channels?

By being careful. :-)

There's no way to automate your way out of this, and that is the major
known problem with CSP (communicating sequential processes) as a
concurrency-primitive set - indeed, early notice of this problem is
the reason there was very little experimentation with CSP between
occam (1983) and Go (2009).

Go is actually something of a surprise; according to the conventional
wisdom in this area CSP should not have been as successful as the CSP
implementation in Go has turned out to be.  That is, deadlocks *should*
be a bigger problem in the real world than they are.

I don't know why this is.  The Go designers did something subtle that
I don't understand, and nobody else seems to either.  I have
researched this because I find it puzzling and interesting, and not
found any plausible theory.

Other primitive sets have different problems. In general, the more
different things you can do with a concurrency-primitive set, the more
it challenges human cognitive limitations and blows up defect rates.

So in the history of language design you see a lot of attempts to
invent more limited and tractable primitives - Ada rendezvous, for
example, or generators in Icon - followed by dissatisfaction when they
turn out not to be enough of an improvement over mutex-and-mailbox to
sweep the field.

Since the collapse of Dennard scaling the pace of invention in this
area has increased.  Which is why Rust has library support for several
different primitive sets.  Don't try mixing them if you value your
sanity.

> What is the cost of a lock vs a channel pass?  Given that the channel needs
> to schedule another thread, I'll guess that costs significantly more.

You're right, it does.

There are occasional complaints about this - here's a representative one.

https://dparrish.com/2016/03/go-channels-slow-for-bulk-data

On the other hand, there isn't enough visible bitching about channel overhead to
suggest that it's a pervasive problem.  I found a good dive into benchmarking
here

https://syslog.ravelin.com/so-just-how-fast-are-channels-anyway-4c156a407e45

that may explain why.  Part of his takeaway is

* When it makes sense to use channels don’t worry overmuch about their
  performance: they’re pretty good! Normally the work you will do per
  item will greatly exceed the 90–250 ns it takes to move the item
  through the channel, so it’s just not worth worrying about.

> The lock I'm actually interested in is the one that protects the data that
> the server thread needs to fill in the packet.  We need that even if we
> aren't messing with the no-GC flag.
> 
> There is another problem with your proposal.  You skipped step 3.5, do the
> authentication.  We need several CPUs working on that step in order to keep
> up with a gigabit link.

Interesting.  OK, then the critical-region goroutine needs to manage
some subthreads of its own.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: GC timing

2021-07-07 Thread Eric S. Raymond via devel
Hal Murray :
> 
> > I don't know all those numbers yet.  But: given that NTPsec only currently
> > has 2 threads and our allocations are typically occurring one second apart or
> > less per upstream or downstream, I can't even plausibly *imagine* a Raft
> > implementation having lower memory churn than we do. 
> 
> That doesn't make sense.  Where does your "one second apart" come from?
> Why is "currently has 2 threads" interesting?

When do we poll at less than a one-second interval? Most allocations
will be associated with making a packet frame for the send, then
dealing with a response that comes back less than 100ms later.

> One area to keep in mind is the MRU list.

Good point, that is a source of allocations I hadn't thought of.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Interleaved Mode (Was: Re: Using Go for NTPsec)

2021-07-06 Thread Eric S. Raymond via devel
Richard Laager via devel :
> On 7/5/21 8:38 AM, Eric S. Raymond via devel wrote:
> > > There is a close-to-RFC to handle this area.  "Interleave" is the
> > > buzzword.  I haven't studied it.  The idea is to grab a transmit time
> > > stamp, then tweak the protocol a bit so you can send that on the next
> > > packet.
> 
> > Daniel discovered it was broken and removed it from the protocol machine.
> 
> Broken implementation or broken design? If the latter, is the current IETF
> proposal (wherever that is) still broken?

You'll have to ask Daniel that.  I've forgotten the details and never saw
the IETF proposal you speak of.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-06 Thread Eric S. Raymond via devel
 test suite, *then* I
changed it to exploit concurrency.

The positive takeaway is that this worked. I got the performance
gains I was looking for, and used them to make lifting the history
of GCC practical.

Even so, I had enough trouble with the second phase to convince me
that the port would have foundered if I had *combined* it with trying
to rethink the architecture.  I'm good, but I'm not *that* good - I'd
be surprised if anybody is.

This is why I have the same intention for NTPsec. "First make it work.
Then make it right.  Then make it fast."

> [Split out stuff so we can write the time-critical parts in C or Rust.]
> 
> > I want to stay away from mixing languages if at all possible.  The joints
> > between them are always *serious* defect attractors and major sources of
> > maintenance complexity.
> 
> I envision the APIs being text strings over stdin/stdout.  I think they will 
> be simple enough so that the joints won't be a serious problem.  Put it on 
> your list of options in case you decide you have to do something about GC.

Noted.  That does reduce my reluctance somewhat.

> > Picking up new languages *is* one of my strong points, yet I found Rust
> > rebarbative in the extreme. This did nothing to make me optimistic about
> > finding developers to work in it. 
> 
> I think that is the root of our "discussion".  Your version of good/clean
> focuses on the language/environment.  You are willing to (try to) dance
> around run time quirks.  Mine focuses on the runtime.  I'm willing to
> struggle with the language/environment.  Or at least struggle some more
> until I learn something critical.

No.  You think I didn't struggle with Go when I was doing the reposurgeon port?

Before that I had written exactly one Go program, loccount. 2.1KLOC -
near trivial.  Reposurgeon is extremely algorithmically dense -
porting it was *hard*, took me a year of work.

I'm willing enough to struggle with the language/environment.  Given
that you describe it as "not one of your strong points", I'm probably
more willing than you are.  That would be unsurprising, as in my past
experience I have been more willing to handle that kind of novelty
than almost anybody around me.

No, I'm pushing Rust away - and determined to exit from C - because of
reasons in the larger context. We need to get to a memory-safe language,
we need decadal stability, and we need one with a reasonably low
barrier to entry for new devs.

Rust fails two of those tests.  Go passes all of them.  If that means
we need to do some acrobatics to deal with GC-induced latency spikes,
that's a cost I'm willing to incur.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-06 Thread Eric S. Raymond via devel
Hal Murray :
> You have a new toy.  The only tool needed is a simple lock.

Oh? What about the concurrent DNS thread we already have?

At this point I have two years of heavy experience in Go, so the toy
is no longer new. If a better upgrade from C existed, I would know
about it - Rust comes closest but, alas, doesn't make the cut yet,
though it might with five more years of maturity.

> There are N server threads.  They are each in a loop that does a recvfrom(), 
> fills in the reply, then a sendto().
> 
> Filling in the reply needs to access some data that is in globals in the 
> current code.  That needs a simple lock.  If the packet is authenticated, 
> accessing the key needs another lock.

You're thinking in C. This is a problem, because mutex-and-mailbox
architectures don't scale up well.  Human brains get overwhelmed at
about 4 mutexes; beyond that the proliferation of corner cases get to
be more than the meat can handle.

In obedience to "never write code that is as clever as you can
imagine, because you have to be more clever than that to debug it",
we'd really want to hold NTPsec down to 3 mutexes or fewer.

You may think you can keep a fully concurrentized NTPsec down to 3
mutexes, but I doubt it.  You're actually already at 4 in the design
you're talking about, counting DNS and logging mutexes.

This is the universe's way of telling us we're going to need more
tractable concurrency primitives going forward.  Languages in which
"more tractable" has actually been implemented in a production-quality
toolchain that can meet our porting requirements are few and far
between.

In fact, as far as I can tell, that set is a singleton.

> >> You can't put it on an output queue.  Handing it to the kernel is part
> >> of the time-critical region.
> > I meant output *channel*.  With a goroutine spinning on the end of it. 
> 
> Either I wasn't clear or you were really focused on using your new toy.
> 
> The case we are trying to protect from the GC is the transmit side.  The 
> critical chunk of code starts with putting a time stamp into the packet, 
> optionally adds the authentication, then sends it.  At that point, the packet 
> is gone.  There is nothing to put on a queue or into a channel.

Yes, we're apparently talking past each other. Let's try to fix that.

In Go (or any other language that supports Communicating Sequential
Processes and lightweight threads, which includes Rust), it's bad
style to use mutexes.  When you have a critical region, good style is
to make a service thread that owns that region, spinning forever
reading requests from a channel and shipping results to another
channel. Channels are thread-safe queues; they provide serialization,
replacing the role mutexes play in C.

Each Go program that uses concurrency is a flock of channel-service
threads flying in formation, tossing state around via channels.  It is
much easier to describe and enforce invariants for this kind of design
than for a mutex-and-mailbox architecture, because each service thread
is a serial state machine.  The major remaining challenge is avoiding
deadlocks.
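A minimal sketch of that pattern in Go, applied to the shared data a server thread needs when filling in a reply: one goroutine owns the state, updates arrive on one channel, and readers request a snapshot by sending a reply channel on another. `syncState` and its fields are hypothetical placeholders, not NTPsec structures:

```go
package main

import "fmt"

// syncState is a hypothetical stand-in for the globals a server
// thread reads when filling in a reply.
type syncState struct {
	Stratum int
	RefID   string
}

// stateOwner is the only goroutine that touches the state. Updates and
// snapshot requests are serialized by the select loop, not by a mutex.
func stateOwner(updates <-chan syncState, reads <-chan chan syncState) {
	var cur syncState
	for {
		select {
		case s, ok := <-updates:
			if !ok {
				return // updates channel closed: shut down
			}
			cur = s
		case reply := <-reads:
			reply <- cur // the caller gets a copy, not our value
		}
	}
}

// snapshot asks the owner goroutine for the current state.
func snapshot(reads chan<- chan syncState) syncState {
	reply := make(chan syncState)
	reads <- reply
	return <-reply
}

func main() {
	updates := make(chan syncState)
	reads := make(chan chan syncState)
	go stateOwner(updates, reads)

	updates <- syncState{Stratum: 2, RefID: "GPS"}
	fmt.Printf("%+v\n", snapshot(reads))
	close(updates)
}
```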

What you're telling me (and I recognize it is correct) is that one
of our service threads has to have the sequence:

1. Wait on the input channel for a packet to send.
2. Turn off GC.
3. Timestamp the packet.
4. Ship the packet.
5. Turn on GC.
6. Loop back to 1.

Look, ma, no mutex!

> > Do you know what our actual bottlenecks are? Have you profiled them?
> 
> Have I used Eric's favorite profiling tool?  Probably not.
> 
> But I think I have a pretty good idea of where the CPU cycles are going.  Who 
> do you think wrote all the timing hacks in the attic?

OK, one of the most useful things you could do, then, is write a whitepaper
describing where the bottlenecks are and why you think so.  That knowledge
needs to get out of your head to where other people can use it.

> I have a multi-threaded echo server and enough other hacks to drive it and 
> collect data.  It has a knob to inject spin between the recvfrom and sendto.  
> The idea is to simulate the NTP computations and/or crypto.  Do you want 
> graphs?

Yes, but it's not just me that needs them.  Anyone working on this code
needs to know where the bottlenecks are.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-05 Thread Eric S. Raymond via devel
Hal Murray :
> 
> > We don't have a multithreaded server yet.  Worst case we have two threads,
> > and only one can ever reach the critical region in question. Don't borrow
> > trouble! :-) 
> 
> I'm interested in building a server that will keep a gigabit link running at 
> full speed.  We can do that with multiple threads.

Right.  But the way to get there is certainly *not* to try to design
ahead while you're still thinking in a language like C where
concurrent programming is difficult and error-prone.

Once you get used to being able to program in Go's implementation of
communicating sequential processes you'll have much better tools for
doing concurrency.

> > If we go to using threads more heavily, the idiomatic Go way to handle this
> > problem would be to have a queue that does only the code in window 1.  When
> > you write to the request queue, it would stop GC, do time-critical magic,
> > restart GC, and ship the result on the output queue. 
> 
> That's adding a bottleneck that we need multiple threads to avoid.

Do you know what our actual bottlenecks are? Have you profiled them?

> You can't put it on an output queue.  Handing it to the kernel is part of
> the time-critical region.

I meant output *channel*.  With a goroutine spinning on the end of it.

> > I'm not worried about a DDoS. I don't think the protocol machine gives a
> > hostile client or server any way to force hitting that window. 
> 
> I'll work up some numbers if you think that will be interesting.  (It will 
> take a while.  I have to put a system back together.)

Any possibility of a DDoS being feasible is interesting.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-05 Thread Eric S. Raymond via devel
Hal Murray :
> >> 1. packet tx happening right after tx timestamp for server response
> 
> > A) Mitigate window 1 by turning off GC before it and back on after.
> 
> Things get complicated.  Consider a multi threaded server.  If you have 
> several busy server threads, can they keep the GC off 100% of the time?  
> (Consider a DDoS attack.)

We don't have a multithreaded server yet.  Worst case we have two
threads, and only one can ever reach the critical region in question.
Don't borrow trouble! :-)

If we go to using threads more heavily, the idiomatic Go way to handle
this problem would be to have a queue that does only the code in
window 1.  When you write to the request queue, it would stop GC, do
time-critical magic, restart GC, and ship the result on the output
queue.

I'm not worried about a DDoS. I don't think the protocol machine gives
a hostile client or server any way to force hitting that window.

> If you are going to claim the GC doesn't take long and doesn't happen often, 
> maybe you should just put a big comment in the code and not do anything.  
> Better would be a design note collecting all the info.

I've said several times that I think the most likely outcome is that
no special action is needed!

But you've asked me to justify the position that Go GC would not be
a performance problem, which is a *completely reasonable* thing
to ask; I made the same demand of myself when I started thinking about
a Go port.

Thus the detailed discussion of mitigation strategy.

> There is a close-to-RFC to handle this area.  "Interleave" is the buzzword.  I
> haven't studied it.  The idea is to grab a transmit time stamp, then tweak the
> protocol a bit so you can send that on the next packet.

Daniel discovered it was broken and removed it from the protocol machine.

> >> 2. serial NMEA data timestamps
> 
> > B) Mitigate window 2 by taking timestamps before and after sample read,
> > asking the Go runtime if a GC has occurred in that interval, and throwing
> > out the sample if it has. This tactic might slow convergence times minutely
> > but should not affect overall sync accuracy.
> 
> There isn't a convenient "before".  You want to do something like readline, 
> and that's going to wait a while.  You don't care if it does a GC while you 
> are waiting.  So you will have to do something like read the first character, 
> grab the before stamp, read the second character, grab the after time stamp, 
> grab the rest of the line.  Then work out the begin/end of line timing by 
> counting characters and doing the arithmetic with the baud rate.
> 
> But is that worth the effort?  Are there any serial devices with timing good 
> enough to notice?

Good question. Certainly a Macx-1 gets nowhere near that granularity, not with
.0125s of inherent jitter due to USB poll interval; that swamps 600usec by more
than an order of magnitude.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-04 Thread Eric S. Raymond via devel
Dan Drown via devel :
> Let's talk a bit about what time critical sections are currently in the
> code.  I think that will help drive the decision about the impact of garbage
> collection.
> 
> I haven't looked at ntpsec's codebase lately, so some of this might be out
> of date.  Please feel free to correct any mistakes or omissions.
> 
> Time critical code:
> 
> 1. packet tx happening right after tx timestamp for server response
> 2. serial NMEA data timestamps
> 
> Non time critical code:
> 
> 1. packet rx timestamp (assumption: SO_TIMESTAMPNS or alike is being used)
> 2. packet tx happening right after tx timestamp for client request
> (assumption: SO_TIMESTAMPNS or alike is being used)
> 3. receiving SHM data
> 4. receiving PPS data
> 5. calculating/updating local clock offset/frequency
> 
> The time critical code can tolerate some level of delay (~hundreds of
> microseconds), as things like packet tx can be delayed for a multitude of
> kernel and hardware reasons.  The good news is both of the time critical
> code paths are somewhat predictable and if we can manually schedule GC, we
> can avoid scheduling it during those times.
> 
> The non timing critical code can be delayed tens of milliseconds without an
> impact to timing quality.

This matches my analysis almost exactly.

My current plan is:

A) Mitigate window 1 by turning off GC before it and back on after.

B) Mitigate window 2 by taking timestamps before and after sample read, asking
the Go runtime if a GC has occurred in that interval, and throwing out
the sample if it has. This tactic might slow convergence times minutely
but should not affect overall sync accuracy.

In all other circumstances, treat GC-induced latency spikes as though
they're just another kind of network weather.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-04 Thread Eric S. Raymond via devel
 it to new
language* is an invitation to disaster.

The only strategy that works is to do a stupid, literal,
unidiomatic port first, verify it, then clean it up and make
it idiomatic.

This means that changes of the kind you're proposing need to be on
hold until we have a more or less literal translation of the present
C code working.

> I know how to split out the server side of ntpd.
> 
> Suppose we come up with an API for refclocks.  Would that, or something 
> similar also work for network servers?
> 
> I think the timing critical code would be small enough that we could write it
> in C and inspect it carefully.  That may not be valid if we include the
> crypto stuff.
> 
> Converting that sort of code to Rust seems reasonable.

I want to stay away from mixing languages if at all possible.  The
joints between them are always *serious* defect attractors and major
sources of maintenance complexity.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-07-02 Thread Eric S. Raymond via devel
Achim Gratz via devel :
> Eric S. Raymond via devel writes:
> > Talk to me about what you think the effect of very occasional
> > stop-the-world pauses of 600 microseconds or less would be on sync
> > accuracy. By "very occasionally" let's say once every ten minutes or
> > so, that being what I think is a *very* pessimistic estimate of GC
> > frequency for a program with NTP's memory-usage pattern.
> 
> It's hard to say what that will look like without having actual
> statistics.  The trouble with stalls is that they introduce bias since
> they always shift in the same direction.  Once every ten minutes would
> likely not make much of a difference for most systems even if you could
> not filter these events out.

That was my estimation.  It certainly is *not* the case that
occasional 600-microsecond stalls would cause a 600-microsecond
degradation in typical accuracy; rather, that would be a highly
unlikely upper bound on the loss.

> > What I want to understand - and have others understand - is whether
> > pauses of that size and frequency would mess with sync accuracy
> > enough that heroic measures are required to avoid them.  What kind
> > of distortion would they introduce in comparison with other
> > components of the error budget?
> 
> NTP already filters out values that fall out of the ordinary statistic.

Good point.  It's quite possible that samples distorted by
stop-the-world pauses would usually be thrown out by normal filtering!
Ironically this becomes less likely as GC times drop.

> For the control loop I tend to think that eventually it would make sense
> to drop the assumption of a regular interval at which the control
> interventions happen.

One of the things that's likely to happen if we move to Go is that
control-event scheduling becomes a forever-loop shipping wakeup
notifications to a channel. Each channel read would then start a
worker thread to do whatever the intervention requires.  In C this
would be brain-melting; in Go it's a few lines of simple code.

Once we have this kind of organization, dispensing with the
regularity assumption won't even be difficult.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-06-30 Thread Eric S. Raymond via devel
Achim Gratz via devel :
> Matthew Selsky via devel writes:
> > On Tue, Jun 29, 2021 at 04:41:30PM -0400, Eric S. Raymond via devel wrote:
> >
> >> Well, first, the historical target for accuracy of WAN time service is
> >> more than an order of magnitude higher than 1ms.  The worst-case jitter
> >> that could add would be barely above the measurement-noise floor at worst,
> >> and more probably below it.
> >
> > Our target is < 1 us, even for WAN time service. We would want to
> > keep/improve this accuracy target.
> 
> Assuming you really talk about accuracy of the time transfer (i.e. the
> maximum time difference between any two systems) that is impossible
> given the principle that NTP uses.

That's what I thought, but I don't have your level of expertise about the
error budget of NTP sync so I couldn't quantify it.

Talk to me about what you think the effect of very occasional
stop-the-world pauses of 600 microseconds or less would be on sync
accuracy. By "very occasionally" let's say once every ten minutes or
so, that being what I think is a *very* pessimistic estimate of GC
frequency for a program with NTP's memory-usage pattern.

What I want to understand - and have others understand - is whether
pauses of that size and frequency would mess with sync accuracy
enough that heroic measures are required to avoid them.  What kind
of distortion would they introduce in comparison with other
components of the error budget?

Mind you, heroic measures are available.  The simplest would be to
run with GC off by default and schedule times to perform a GC when it
can reasonably be expected not to collide with the next polling
action.

But before I plan something like that I want to be sure it is actually
necessary.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-06-30 Thread Eric S. Raymond via devel
Richard Laager via devel :
> Not particularly. Presumably it's just because of GPS PPS + good network?

Having a good local clock can explain it, yes.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Support for i386

2021-06-30 Thread Eric S. Raymond via devel
Hal Murray via devel :
> There is the 32bit time_t problem.  We've got a few more years.

I've been thinking forward about that.  One of my objectives in the
big cleanup was to make the time width easier to change, and internally
it now is.  The remaining blocker is that the NTP packet
format would need to be redesigned.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ntpq, refclocks

2021-06-29 Thread Eric S. Raymond via devel
Hal Murray :
> Maybe ntpd should turn into a mega program that parses the config file and 
> runs a bunch of other programs and/or threads.

That's an extremely natural way to partition in Go.  Much, *much* easier than
trying to pull off the equivalent in C would be.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Go vs GC

2021-06-29 Thread Eric S. Raymond via devel
Hal Murray :
> > Well, first, the historical target for accuracy of WAN time service is more
> > than an order of magnitude higher than 1ms.
> 
> Time marches on.  We need to do better today, much better.
> 
> NTP is used on LANs.

Then we'll need to go to watching for GC pauses and skipping samples that might
have been distorted by them.

> > turning GC off
> 
> Is that lightweight or heavyweight?
> 
> How does that interact with threads?

It's a fast operation, if that's what you mean.  The way Go GC works
requires that there is only one GC-enable flag, not one per
thread. The flag tells the Go runtime whether or not to GC when the
normal memory-usage threshold is reached.

> What happens if there are lots of threads and they are all turning
> it off/on very frequently and probably overlapping?

That flag has to be protected by a mutex, and you have whatever value
happened to be set last regardless of how many threads are running.

If we think contention for that lock is going to be an issue, there's
a pretty standard and simple way of dealing with it using an auxiliary
semaphore.

> I'm assuming the mainline server path won't require any allocations
> or frees.  Total CPU time to process a simple request is under 10
> microseconds.

The main source of memory churn is going to be allocations for
incoming packets, and deallocations when they're no longer referenced
and get GCed. Allocations are fast.  GC is slow, but isn't performed
very often.

> Is there a subset of Go that doesn't use GC?  Or something like that.

Not really.  If you want to not use GC, you turn GC off.  Then everything
works as it normally does but your memory usage grows without bound
until you re-enable GC, which could trigger an immediate GC sweep.

I analyzed this years ago and discovered two kinds of code span where
unexpected latency spikes could mess things up.

One is right around where the adjtimex call or equivalent is done.
That's a very narrow code section that's going to run in near
constant time and not do any allocations; we can guard it just by
turning GC off at the start of the span and on at the end so that any
other threads that *is* doing allocations cannot induce a latency spike
during the critical section.

The other is during sample collection from local refclocks.  That's a
little trickier because the read from device is a blocking operation
that can and will do memory allocation.  I think what we have to do
in that case is take a timestamp before the read, then after it check
to see if there was a GC between that timestamp and now, and if so discard
the sample.

Outside those places the code is not really stall-sensitive because
all the data flying around has enough timestamping.

With these mitigation measures I think performance can be expected to
be C-like, except that once in a great while a GC stop will be detected
to have occurred during refclock sampling and cause that sample to
get tossed out.

I say "once in a great while" because a program with ntpd's memory
usage pattern is not going to trigger GCs very often. Most of the passes through
critical regions won't collide with a GC latency spike. We can log
these exceptions to check, of course.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ntpq, refclocks

2021-06-29 Thread Eric S. Raymond via devel
Hal Murray :
> [context is ntpq via shared memory]
> 
> > Any reason not to use Unix-domain sockets and just reuse the current protocol
> > handling, except it's not accessible netwide?  That might be simpler. 
> 
> I hadn't figured it out when I was typing in my previous message, but using
> shared memory forces some build time checking that is missing from our
> current approach.
> 
> Maybe we should split the discussion into two parts.  One is how to get build 
> time checking.  The second is how to connect ntpq and ntpd.  I'm happy with 
> Unix-domain sockets for the latter.

That would probably be simpler than building another shared-memory interface.

> Note that the current ntpq doesn't have any build time checking and we don't
> have any version numbers on the ntpq data.  My straw man for cleaning this up
> is a text file with a line per counter.  It will need 2 preprocessors, one
> for ntpd/mode 6 and the other for ntpq.  Is there a better way?  (It's
> slightly more complicated than that since there are several tables, for
> example one per server, and one per refclock.)

JSON is often used for this sort of thing.  It might be overkill here,
but it's worth considering.

> --
> 
> [splitting out refclocks]
> 
> > How you handle configuration for separate refclockd and ntpnetd turns out to
> > be a nasty tangle.  Do they both replicate the entire config parser and
> > parse the same config file, ignoring the bits they don't need?  Or do you
> > split the config and suffer a flag day?
> 
> I was assuming each refclock would be a separate program.  It wouldn't need a 
> config file, just some command line stuff.

That's a big jump.  Backward config compatibility would go out the window.

> Even if we don't split refclocks out into a separate program, we should run 
> them as a separate thread so we still need an API between a refclock and ntpd.

Yes, I agree with that.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-06-29 Thread Eric S. Raymond via devel
Richard Laager via devel :
> On 6/29/21 3:41 PM, Eric S. Raymond via devel wrote:
> > Well, first, the historical target for accuracy of WAN time service is
> > more than an order of magnitude higher than 1ms.
> 
> My two NTP servers are +- 0.1 ms and +- 0.2 ms as measured by the NTP pool
> monitoring system across the Internet.

That's quite exceptionally good.  It's normally hard to get to within an
order of magnitude of that on a LAN, let alone a WAN.

Do you have any theory of why your deviation is that low?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-06-29 Thread Eric S. Raymond via devel
Matthew Selsky :
> Our target is < 1 us, even for WAN time service. We would want to 
> keep/improve this accuracy target.

One *microsecond*?  Has any version of NTP achieved that kind of accuracy?

I don't think we're there.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ntpq, refclocks

2021-06-29 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> Does anybody have any good ideas on a modern way to handle ntpq/mode 6?
> 
> Background...
> 
> We could split the server side out into a separate process.  That leaves a
> very tiny attack surface from the network.  I think I could do that now
> except for mode 6.
> 
> Does anybody have any good ideas on how to replace the current ntpq 
> functionality?
> 
> Handwave.  My straw man would be to put all the counters into shared memory.
> Then a local-only version of ntpq seems reasonable.  If SSH isn't good
> enough, we could add a TCP or TLS wrapper.

This is probably achievable.  But...

Any reason not to use Unix-domain sockets and just reuse the current
protocol handling, except it's not accessible netwide?  That might be simpler.

> I also want to be able to quickly add more counters.  That gets into a
> version control tangle.  Would it work to have 2 version numbers?  (maybe call
> them version and sub-version)  One for the base/stable counters and another
> for the new/experimental ones?  The idea is that the new ones get folded into
> the base at release time.

I'd have no objection to that.

> Similarly, it would be nice to split the refclocks out into a separate process.

I looked at that prospect pretty closely years ago.  Here's why it didn't
happen:

How you handle configuration for separate refclockd and ntpnetd
turns out to be a nasty tangle.  Do they both replicate the entire
config parser and parse the same config file, ignoring the bits
they don't need?  Or do you split the config and suffer a flag day?

It's hard to know what the right thing is here.  Going with a unitary
parser preserves backward compatibility, but one of the project goals is
to reduce attack surface to the dead minimum possible and we certainly
would not be accomplishing *that* with a unitary parser.

Another problem is that throwing out the obsolete drivers has
drastically reduced the expected gain from splitting the daemon.  When
the drivers were a huge mass of code that dwarfed the networking part,
splitting them out and adding a thin refclockd wrapper around them
made obvious sense.  Now they're only 24KLOC total and the wrapper
around them would be a significant fraction of that as a LOC
*increase*.

I reluctantly concluded that the effort wasn't justified.  I'm
open to argument on that.

> We need something better than the current shared memory approach.  
> Read only SHM would be OK for data, but we need a clean way to wake up the 
> receiving side so it can process the data promptly.

I'd like to hear more about this.  It sounds like a separate issue from
the daemon split.

> More background...
> 
> I'd like to move the current kernel mode PLL to user space.  I think modern 
> CPUs are fast enough for that to make sense.  I haven't done any 
> experimentation.

Can't really respond to this as I don't understand the kernel PLL.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Using Go for NTPsec

2021-06-29 Thread Eric S. Raymond via devel
Sanjeev Gupta via devel :
> This is a follow on to Eric's email a few hours ago, I am keeping that
> thread clean.
> 
> (The last 3GL I programmed in was Fortran, and not the 77 version.  I can
> read bash scripts and C pseudo-code)
> 
> The literature I can find speaks of Go GC being improved in 1.5, such that
> the STW phase (the "sweep") is now less than 1ms.  This is impressive, but
> for NTP, this places a lower bound on our jitter.
> 
> What am I missing?

Well, first, the historical target for accuracy of WAN time service is
more than an order of magnitude higher than 1ms.  The worst-case jitter
that could add would be barely above the measurement-noise floor at worst,
and more probably below it.

Second, Go 1.5 was a long time ago.  STW pauses are much shorter now.
The graph at https://github.com/lni/dragonboat indicates that even
under heavy load STW in Go 1.12 never went above 600 microseconds and
is usually somewhat below 400.  We can expect this figure to decrease
rather than increase in the future - reducing latency spikes is high
on the Go development team's objectives.

Third, most of the code isn't stall-sensitive at all.  There are a
couple of critical regions that need to be guarded, which I think we
can accomplish by either (a) turning GC off before we enter them and
turning it on again after, or (b) some Lampson-like tricks for
detecting when the interval was interrupted by GC and discarding any
resulting sample.

I don't think we'll ever need to go to that third level, but we can
deal with it if we need to.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Work plan proposal for the next year

2021-06-29 Thread Eric S. Raymond via devel
James Browning via devel :
> I do not consider myself an expert C developer.

You're good enough to contribute materially to *this* project, which
is by no means the easiest C code to grok.  That makes you expert in
my book.

> I think the proposed schedule is overly serial. ntpkeygen and keygone
> for example have no dependency on pylib/ IIRC. Also none of the other
> ntpclients/ depend on ntpq. This would (in theory) pull CLIENTS up to
> month 5 (late November/December)

I was deliberately vague about the subproject dependencies and what
can be done in parallel, because the point of this document is a
scope-of-work estimate that might get us some money for things like a
hardware test lab.  Also it would be nice if Ian and I could pay our
rent and grocery bills while we're doing the Go port.

If this work plan looks like it'll fly I'll do a more detailed
dependency chart.

> I think the open 'issue' count would be lower if people actually
> tended to close issues.

Agreed.  I occasionally do a triage pass to catch these.  I guess we're
due for one.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Work plan proposal for the next year

2021-06-24 Thread Eric S. Raymond via devel
Developers, please weigh in on this so we can finalize it.

The final version will become part of a grant proposal which may
get us money for a hardware test lab and code bounties.

= NTPsec work plan

This is a rough-draft work plan for the NTPsec project over the period
July 1st 2021 to July 1st 2022.

== Major objective: 

Our major objective for this year will be to move the NTPsec codebase
from C and Python to a single memory-safe language.

=== Rationale

NTPsec is a security-focused project.  As with other large, mature C
programs, effectively all of its security issues are consequences of
the fact that C is memory-unsafe, it is very easy to accidentally
write code with wild-pointer bugs that create exploitable
vulnerabilities, and it is very difficult to detect such bugs.

Historically the mitigation strategy for this problem has been a
combination of tight code discipline with application of code
analyzers designed to detect vulnerabilities.  This approach is
known to be leaky and inadequate, but has long been accepted for
lack of a better alternative.

There is now a better alternative: the Go language.  Go is
sufficiently like C and Python to make the code move feasible,
but does pointer bounds checking, eliminating pointer-overrun
bugs and thus preventing the creation of exploitable security
bugs through these overruns.
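
To illustrate the bounds-checking point: in the sketch below, a truncated packet makes the slice expression panic instead of reading past the buffer, and a deferred `recover` turns that into an ordinary error. Idiomatic code would check `len(pkt)` explicitly; this shows the runtime backstop when such a check is forgotten. `readUint32At` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readUint32At returns the big-endian 32-bit word at offset off in pkt.
// It deliberately has no explicit length check: if pkt is too short,
// Go's bounds checking makes the slice expression panic instead of
// reading past the buffer, and the deferred recover converts the panic
// into an error.
func readUint32At(pkt []byte, off int) (v uint32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("malformed packet: %v", r)
		}
	}()
	return binary.BigEndian.Uint32(pkt[off : off+4]), nil
}

func main() {
	pkt := make([]byte, 48) // an NTP packet header is 48 bytes
	pkt[40] = 0xE5          // transmit timestamp seconds start at offset 40
	fmt.Println(readUint32At(pkt, 40))
	fmt.Println(readUint32At(pkt, 46)) // runs off the end: error, not an overrun
}
```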

Go does not make the related problem of denial-of-service attacks
through null-pointer errors outright impossible, but static type
checking and Go's own validation tools will make such bugs much
easier to prevent.

It is expected that this code move would reduce NTPsec's vulnerability
to exploits by a large factor, an order of magnitude or more.

=== Personnel

The NTPsec technical lead (Eric Raymond) and his apprentice (Ian
Bruene) are expert Go programmers. Other team members (notably Hal
Murray, Gary Miller, James Browning, and Richard Laager) are expert C
programmers who can be confidently expected to come up to speed in Go
very rapidly.

=== Key performance indicators for this effort

An entire port will not be achievable in 12 months. Finishing
it is probably an 18-month to 2-year project for the personnel on
hand.  Nor, due to the Brooks's Law effect, can adding more
people be expected to shorten the project.  However, we can define
milestones that should be achievable within a year and demonstrate
the achievability of the entire effort.

Milestone PYPACKET: Port and unit-test the NTP packet handling from the
client code (pylib/packet.py and pylib/util.py). Estimate: 1 month.

Milestone NTPQ: Port ntpq, the principal client, from Python to Go.
Test interoperability with ntpd. Estimate: 3 months.

Milestone CLIENTS: Port the remaining clients (ntpdig, ntpkeygen, ntpmon,
ntpsweep, and ntpwait) from Python to Go.  Estimate: 4 months.

At completion of milestone CLIENTS (8 months out) we will have a
working packet layer and client suite in Go that not only interoperates
with ntpd but can also be tested for conformance with other NTP
implementations.

Milestone CONFIG: Configuration parsing for ntpd. Build and test a
workalike parser in Go for NTP configuration files.  Estimate: 2
months.

Milestone FAKED: Build a demonstration fake ntpd that does everything
but the actual time-sync and clock driver code, collecting clock
samples from upstream NTP servers.  Estimate: 4 months.

Milestone SYNC: Port the time-synchronization and clock setting
code. Estimate: 3 months.

Milestone NTPSHM:  This is the most important clock driver for
production use. Estimate: 1 month.
 
Milestone LEGACY: Port the legacy clock drivers to Go.  This one is
big and messy and difficult to scope, as the driver code is old and
crufty and difficult to test.  It is probably not achievable in year
one and may require budgeting for and building a hardware test lab.
Tentative estimate: 5 months, with an unfortunately high probability
of being blocked on the availability of test hardware.

== Minor goals

* Resolve all CVEs rapidly and completely

* Reduce outstanding issue count from 38 to less than 20.

* Improve unit-test coverage

* Maintain a regular point-release schedule
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

"Today, we need a nation of Minutemen, citizens who are not only prepared to
take arms, but citizens who regard the preservation of freedom as the basic
purpose of their daily life and who are willing to consciously work and
sacrifice for that freedom."-- John F. Kennedy

Re: Objectives for the next year

2021-06-18 Thread Eric S. Raymond via devel
MLewis :
>Is it worthwhile improving the current C code to a 'hardened' programming
>standard?
> 
>Example
>- Joint Strike Fighter standards
>[1]https://www.stroustrup.com/JSF-AV-rules.pdf
>- NASA JPL standards
>[2]https://andrewbanks.com/wp-content/uploads/2019/07/JPL_Coding_Standard_C.pdf
>- MISRA
>[3]https://misra.org.uk/LinkClick.aspx?fileticket=vfArSqzP1d0%3d&tabid=57
> 
>What effort would be required for 'hardening'?
> 
>If not a full 'hardening', is it worthwhile to use the
>hardening/vulnerability/guideline-fail reporting tools to identify and fix
>the worst vulnerabilities or to grab the low-hanging fruit?
> 
>Anyone with experience with 'hardening' C code? (I don't)
> 
>But first, what's the problem with the ntpsec C code. Is there an issue
>with vulnerabilities in the current C code, uncertainty with possible
>unknown vulnerabilities in the current code, or is the concern one of
>introducing vulnerabilities in the future as the C code is maintained or
>new functionality added? Or is the answer to that "yes". Is 'hardening' a
>solution or just an improvement? I assume you're still vulnerable where
>the hardening guidelines failed or weren't ideally followed? Is moving to
>a new language the better solution?
> 
>If moving to another language is inevitable, if that move is selected as a
>goal for the next year, is 'hardening' the ntpsec C code still worthwhile?
>- Could 'hardening' be done and in place before the move to another
>language is complete. For what benefit.
>- Or would the 'hardened' C code be replaced weeks later by code in a new
>language. Or would new language code be in place in the same or similar
>time (sooner?), if 'hardening' efforts were instead put on moving.
>- If a full 'hardening' isn't worthwhile, is some 'hardening' effort
>worthwhile.
> 
>Regards,
> 
>Michael
> 
> References
> 
>Visible links
>1. https://www.stroustrup.com/JSF-AV-rules.pdf
>2. https://andrewbanks.com/wp-content/uploads/2019/07/JPL_Coding_Standard_C.pdf
>3. https://misra.org.uk/LinkClick.aspx?fileticket=vfArSqzP1d0%3d&tabid=57


Here is my judgment:

>  Is there an issue
>with vulnerabilities in the current C code, uncertainty with possible
>unknown vulnerabilities in the current code, or is the concern one of
>introducing vulnerabilities in the future as the C code is maintained or
>new functionality added?

All of the above.

I *do* have experience with hardening C code - a substantial amount
of it has been on this project, and there was GPSD before that.  In
truth, we've already done most of the best practices. More effort
would have rapidly diminishing returns.

But the real problem is at a deeper level. C is fundamentally unsafe;
hardening - either as we've achieved it or as a hypothetical ideal -
can only be an improvement, not a solution.

I think our time would be better spent moving to a memory-safe
language than trying to harden the C code further.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Objectives for the next year

2021-06-18 Thread Eric S. Raymond via devel
James Browning via devel :
> Are there any C to golang or rust transpilers that work
> reasonably well? The last time I checked the best rust
> transpiler generated rs files that were just shallow glosses
> and the golang transpiler was somewhat inadequate and
> verbose.

This is still the state of play, alas.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Objectives for the next year

2021-06-18 Thread Eric S. Raymond via devel
Hal Murray :
> What was the name for your attempt to get a GPSD style replay of old data?  
> Did we ever figure out why that didn't work?

I did.  There's a blog post about it:

https://blog.ntpsec.org/2017/02/22/testframe-the-epic-failure.html
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Objectives for the next year

2021-06-18 Thread Eric S. Raymond via devel
Hal Murray :
> > I'll start the ball rolling with this big one:  It's time to move out of C.
> 
> I want to threadify things, and taking advantage of that, I want to run at 
> full wire speed on a gigabit link with a modest server class CPU.
> 
> I have test code running.  I'm pretty sure it will work.  But my test code is 
> in C.
> 
> Are other people happy with threads?

I am now.  Working with Go for the last couple of years has brought me up to
speed on that kind of programming.

> I think threadifying things will allow us to clean up the code a lot.  I think
> that may be as important for writing safe/clean code as moving out of C.

I think you are right.  However, I'm pretty sure we should change
languages before we threadify.  Both Rust and (especially) Go have concurrency
primitives that are *far* more tractable than their rough C equivalents.
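For readers who haven't used Go's concurrency primitives, here is a minimal sketch (not NTPsec code) of what a threadified request path could look like: a fixed pool of goroutines drains a channel of incoming requests and sends replies on another channel, with no hand-written locking. The `request` and `reply` types and the `handle` function are hypothetical stand-ins for packet parsing and reply generation; real receive and send loops, timestamps, and error handling are omitted.

```go
package main

import "sync"

// request is a hypothetical stand-in for a received packet.
type request struct {
	id      int
	payload []byte
}

// reply is a hypothetical stand-in for the response we would send.
type reply struct {
	id     int
	length int
}

// handle is a placeholder for per-packet work (parse, compute, build reply).
func handle(r request) reply {
	return reply{id: r.id, length: len(r.payload)}
}

// serve starts `workers` goroutines that read requests from in and write
// replies to out. It closes out once in is closed and every worker is done.
func serve(in <-chan request, out chan<- reply, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				out <- handle(r)
			}
		}()
	}
	// Close the output channel only after all workers have exited.
	go func() {
		wg.Wait()
		close(out)
	}()
}
```

A receive loop would feed `in` and a send loop would drain `out`. Replies can come back in a different order than requests went in, so anything order-sensitive has to key on the request itself.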

> > My choice for a language to move to would be Go.
> 
> The only other choice I've seen seriously mentioned is Rust.  How do they 
> compare for thread support and for CPU cycles?  Do they have full access to 
> all the options for network I/O?  (for example recv time stamps)
> 
> My first reaction is that I don't want reference counted stuff involved with 
> network I/O.  But maybe that is bogus.  It's roughly 5 microseconds for a
> recv/send pair in C. We should write some test code so we can get some real 
> timing numbers.

There are two major issues with Rust:

1. We have at least two people who are expert Go programmers - Ian and
myself.  We have nobody, AFAIK, who is up to speed on Rust.  Moving
the code will be a large amount of work - I don't think any good
purpose is solved by adding "learn to be fluent in an entire new
language" on top of that.

2. I don't think Rust is as yet stable enough for our purposes. The
language and core libraries are still in some flux - we can't yet
count on a feature we're relying on not disappearing over the next
decade.  This is a very stark contrast with Go's ironclad
forward-compatibility guarantee.

For me, issue #2 is the real dealbreaker.

As for your questions: Both languages have reasonable threading
support, far better than C's.  Rust would be more abstemious of
processor cycles than Go is, but I don't believe the difference
is significant for our deployment.

Full access to all options for network I/O: I believe pure Go can be
beaten into this, but that corner of the API is very poorly
documented so I don't know exactly how yet.  We have the option of
using cgo and writing small C extensions to get at things the Go
API doesn't support.

I don't know of any direct support for things like recv time stamps
in Rust.  But Rust also has a facility to call C extensions.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Objectives for the next year

2021-06-18 Thread Eric S. Raymond via devel
Developers, please weigh in on what you think the NTPsec project's
goals for the next year ought to be. These goals can be coding
projects ("Move the Python code to Go"), process goals ("Halve the size
of the issue list"), or project infrastructure goals ("Build a hardware
lab so we can live-test supported clocks").  Or anything else you
can think of.

These goals can be large or small. They can be old issues we haven't
addressed well enough. For purposes of the document I'm trying to put
together, clear definition of a goal matters more than how ambitious
it is.

My only request is that you not argue priorities when you see other
people's suggestions in the replies - not yet.  The initial phase of
this discussion should be brainstorming.  At some point I'll summarize
the suggestions that seem viable (clearly enough defined, possible to
complete within a year) and we can start rank-ordering them.

I'll start the ball rolling with this big one:  It's time to move out of C.

C is a terrible implementation language for any project that wants to
be secure and reliable, due to wild-pointer bugs.  Our client code,
being in Python, is memory-safe; the daemon code, the most crucial
part, is not.  Let's fix that.

My choice for a language to move to would be Go. Possibly one of you
can argue for a different choice, though if you agree that Go is a
suitable target I would find that information interesting.
--
>>esr>>


Re: visiting the todo list

2021-06-08 Thread Eric S. Raymond via devel
Hal Murray via devel :
> I'm still planning to threadify the server side.  I'm stalled half on other 
> things and half waiting for Eric to get free so we can cleanup the ntpq stuff.

Still on my to-do list.  I'm still focused on my conversion job,
though, which is making me actual money.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: What is the purpose of devel/ifdex-ignores?

2021-03-07 Thread Eric S. Raymond via devel
Hal Murray :
> 
> e...@thyrsus.com said:
> > No, just boring history.  I think those are old conditional macros we no
> > longer use; likely they have been renamed to something else. 
> 
> The current code tests for SO_TIMESTAMPNS
> 
> Should I just delete the old/unused stuff?

Yes.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: What is the purpose of devel/ifdex-ignores?

2021-03-06 Thread Eric S. Raymond via devel
Hal Murray via devel :
> From ./devel/ifdex-ignores
> USE_SCM_BINTIME # to grab timestamp for recv packet
> USE_SCM_TIMESTAMP   # "
> USE_SCM_TIMESTAMPNS # "
> 
> None of those symbols are used by our code.
> 
> Should I just delete them?
> 
> What is the idea for USE_xxx?  Is there some interesting history I've 
> forgotten?

No, just boring history.  I think those are old conditional macros we
no longer use; likely they have been renamed to something else.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: What I have been doing 2021 post-January

2021-02-04 Thread Eric S. Raymond via devel
James Browning via devel :
> I submitted a couple of patches to gpsd and one to microjson resolving
> issues. One where an empty string validated correctly as an object was
> already posted to microjson. The other allowed pretty much any string the
> same length or shorter to pass a t_check.
> 
> Three of my merge requests made it into the tree. The first resolved a
> couple of issues with readvar in ntpq. The second addressed some nits with
> the docs and the third resolved ntpdig handling address resolving errors
> poorly in Python3.
> 
> I have also five yet unmerged requests submitted.
> 
> 1203. I rewrote the part of the peers output generation and will add a new
> mode with the refid, tally code, and peer name/address on the right side of
> the graph.
> 
> 1204. I partially address Hal's mode 6 wishlist by adding new protocol
> fields for the entirety of the current processes running. Also, I added a
> new duration helper and dual column stats output.
> 
> The following are the new requests.
> 
> 1207. I changed the name of the is_vn_mode_acceptable function while
> dropping NTPv1 support and requiring at least 12 octets (not 1). The tree
> version of the function checked for specific modes which draft as published
> NTPv1 did not have.
> 
> 1208. I stripped out all handling of the netlink socket and fixed around
> the breaks I found. This would reduce NTPsec w/ NTS and IPv4/6 to 5
> sockets. They are UDP4, UDP6, TCP4, TCP6, and netlink which only spuriously
> trigger DNS retries.
> 
> I also have a branch[1] that also sweeps away the asynchronous update
> updaters and the netlink socket. It is not part of 1208.
> 
> 1213. I tackled another bit of untamed in ntp_control. I took three
> *_varlist blocks and reshaped them into a trio of wrapper calls which call
> another new function. I reworked many ctl_put* functions to use a
> higher-level function call saving a few lines each. Also, new macros were
> added and used saving a few lines per invocation.
> 
> I intend to merge 1207 and 1213 Tuesday. Also 1207 and 1213 the following
> Saturday.
> 
> Are there any obvious (or not so) reasons why I should not go ahead?
> 
> [1] https://gitlab.com/jamesb_fe80/ntpsec/-/tree/21A31-twinsock

No objection from here.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: discrete units

2021-01-20 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> Gary said:
> > I think he is referring to recent proposals to split ntpd up into multiple
> > daemons.  Daemons for the core, NTS, clients, etc.  Each doing a small job.
> > Rather than the one big daemon we have now. 
> 
> That sort of split looks good on paper, but I'm not sure how well it would 
> work out in practice.

I share Hal's skepticism.

Back at the beginning of this project I had a plan to split ntpd in
two.  Isolate all the network plumbing in one half, all the clock
management in another, have them communicate over some kind of local
channel.  Why I eventually gave it up is instructive.

Any time you split up a program like ntpd you add to the total volume
of code, if only because you now need two network wrapper layers around the
core logic rather than just one.  This can be worth doing, but only if the
payoff in reduced global complexity from decoupling the core
pieces is high enough to pay for the extra code.

You cannot take for granted that this will be so, and I eventually
concluded that it was not - we were collecting much bigger
simplifications simply by removing obsolete drivers.

Another headache is service configuration.  Would the two pieces have
shared the same config file or not? Complications would have arisen
either way. If you split the file it's a flag day for all users (no
small matter when the userbase is as conservative and risk-averse as
ntpd's).  OTOH, if you don't split the file you lose some of the
simplification you might have collected, as both pieces have to carry
the same parser and generate errors when they're fed a piece of
configuration that's not theirs to handle.

Thus, wearing my architect hat, I'd have to see a very strong and specific
case for splitting up ntpd before I'd agree.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: GitLab | Projects forced to "Private" (#294196)

2020-12-17 Thread Eric S. Raymond via devel
James Browning via devel :
> The ntpsec forks belonging to rlaager, selsky, and ianbreune are still
> detached. A quick check shows that there are no forks. The page I looked at
> claimed that such detached repositories cannot be reattached. TLDR there is
> only one MR that still might be mergeable.

Annoying, but recoverable.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: GitLab | Projects forced to "Private" (#294196)

2020-12-17 Thread Eric S. Raymond via devel
Sanjeev Gupta :
> As of 20 minutes ago, I can now pull from the repository unauthenticated.

Yes, and the visibility is now "Public" in the settings.

Looks like the problem is solved.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Blizard of mail from GitLab-Abuse-Automation

2020-12-16 Thread Eric S. Raymond via devel
Sanjeev Gupta via devel :
> Ah, so not my fault.
> 
> I tried updating my fork about 11 hours ago, and was to authenticate to
> pull from the NTPsec git repo. I tried with another repo, it worked, so I
> assumed one of us was modifying the security settings of the repo.

Something either very specific or very random is going on. All of the
dozen or so of my personal projects I've had time to check are fine -
not taken private, and it looks like the config button for public/private
would still work.

Mark Atwood has been briefed. I think he knows a phone number at GitLab.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Unity warnings

2020-09-06 Thread Eric S. Raymond via devel
Hal Murray :
> Please do and/or please fix our local copy.  I'm focused on the 
> restrict/unrestrict tangle.

Bug fixed, but I can't find any way to submit issues on their
bugtracker.  Yes, I have a validated account on the site.
Still looking.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Unity warnings

2020-09-06 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> I assume a  fix for this should be pushed upstream.
> 
> ../../tests/unity/unity.c: In function ‘UnityFail’:
> ../../tests/unity/unity.c:1757:6: warning: function might be candidate for 
> attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
>  1757 | void UnityFail(const char* msg, const UNITY_LINE_TYPE line)
>   |  ^
> ../../tests/unity/unity.c: In function ‘UnityIgnore’:
> ../../tests/unity/unity.c:1794:6: warning: function might be candidate for 
> attribute ‘noreturn’ [-Wsuggest-attribute=noreturn]
>  1794 | void UnityIgnore(const char* msg, const UNITY_LINE_TYPE line)
>   |  ^~~
> 

Yes.  I'll do it if you have not already identified upstream.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: NEWS for release

2020-09-04 Thread Eric S. Raymond via devel
Hal Murray :
> 
> > Looking at the changelog, the only one that jumps out at me is James's new 
> > UI
> > options for ntpq. 
> 
> Should we have a policy of mentioning all UI changes?

I think so.  I've been trying to do that whenever I write NEWS entries.

> How about config file changes?

Well, sure. Not that I'm expecting any in the foreseeable future.

> > After we ship I'll get off my butt about the recvbuff cleanup.
> 
> What do you have in mind?
> 
> I think I have fixed all the get/free stupidities that used to be in the main
> line server code so it isn't wasting CPU cycles.  There is still a free list.
> I forget how it is used.  I think some/most refclocks use it, maybe only the
> ones that read serial data.  It would be good to totally nuke the free list.

Totally nuking the free list was what I had in mind.
 
> We should make a pass through devel/TODO.adoc and devel/TODO-NTS

Agreed.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I'm giving up on seccomp

2020-09-04 Thread Eric S. Raymond via devel
Hal Murray :
> Is this an opportunity to clean up that area?

I don't think so. It's pretty clean now, functionally speaking - I put
a lot of work into rationalizing the configurator structures and the
way they talk to the protocol machine.

I suppose it might be if we were willing to make
compatibility-breaking changes to the configuration grammar, but I
have not been presented with any good reason to do that.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I'm giving up on seccomp

2020-09-04 Thread Eric S. Raymond via devel
Hal Murray :
> 
> >> If you want to claim your Go program has no buffer overruns,
> >> you can't call out to big complicated libraries written in c.  You
> >> would have to rewrite them in Go.
> 
> > Fair point. That changes my to-do list. 
> 
> Could you please say more?  What did you add or drop?

That means seccomp in Go has to be on it, though not at high priority.

The current blocker on the port is that Flex and Bison can't yet emit
Go code.  I'm probably going to have to fix that myself.  There are roughly
equivalent tools such as goyacc that are fine for new projects, but
guaranteeing that you get the same accept grammar from specifications in
two different generators is so difficult that I'd rather retarget
Flex and Bison than deal with that problem.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: NEWS for release

2020-09-04 Thread Eric S. Raymond via devel
Hal Murray via devel :
> Are any of the recent changes interesting enough to mention in NEWS?

Looking at the changelog, the only one that jumps out at me is James's new
UI options for ntpq.

Otherwise the recent stuff is bug triage and validator cleanups (LGTM
looks like a big win).  I wouldn't want to try doing anything deeper
this close to a release. None of it's user-visible.

After we ship I'll get off my butt about the recvbuff cleanup.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-04 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> As previously mentioned here: RHEL 6.

Which is about to EOL.

> Supporting Python 2 is trivial.  Why the hate?

Because it's not in fact trivial. It's *doable*; Peter Donis and I are
the experts on how to do it.  But it's not trivial.  It proliferates
code and test paths, and therefore increases attack surface.

Therefore we should drop Python 2 support as soon as the benefit of
keeping it drops below the cost.  I think that happens the moment
python3 becomes a reliable thing to put in shebangs on all our
primary platforms.

There's a strong case this has already occurred, and that case will
be closed when RHEL 6 goes EOL.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-03 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> Yo Eric!
> 
> On Thu, 3 Sep 2020 14:13:05 -0400
> "Eric S. Raymond"  wrote:
> 
> > Gary E. Miller via devel :
> > > Say what?  This has zero to do with libraries.  
> > 
> > Sure it does.  Use different versions of Python and you need different
> > library paths.
> 
> No more than any other versioned programs use their versioned backend
> libraries.  But that is all a long solved problem.  I never, ever
> heard of anyone having problems with Python internal libs, except
> cross-compiling.
> 
> Or do you mean PYTHONPATHS?  Which is a different thing.

That's what I meant, yes.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-03 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> Say what?  This has zero to do with libraries.

Sure it does.  Use different versions of Python and you need different
library paths.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-03 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> > I don't want to go down this road.  I have ugly memories associated
> > with a similar hack in gpsd, long ago.
> 
> But what about how it works now?  All the maintainers like it.

Oh dear Goddess. We are *still* mutating shebangs in GPSD?  I must have
blotted that from my memory.

One reason to stop doing this kind of thing is library-path confusion.
Yes, I know we have a lash-up that barely works around that in GPSD
but our Debian package maintainer damn near has an aneurysm any time
he thinks about it and I can't blame him a bit.

It will come time to end Python 2 support in GPSD soon.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-03 Thread Eric S. Raymond via devel
Richard Laager via devel :
> > You realize that the POSIX doc says a space after "#!", but so many do it
> > wrong they accept that variant.
> In NTPsec, there are 122 the wrong way and 81 the right way. As you say,
> either works. I don't particularly care about the space personally, but
> we can use the space if you want to be pedantically correct.

I just fixed all of those.

This will probably never matter, but part of our project style is supposed
to be standards conformance so tight it squeaks.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-03 Thread Eric S. Raymond via devel
Richard Laager via devel :
> RHEL 6 support (measured in terms of security updates) ends in November
> of this year. So by the time a version of NTPsec releases without Python
> 2 support, we'd be looking at RHEL 7.

On top of that, it has been Red Hat's official position for some time
that RHEL 6 should *not* block transition to Python 3 only.

There is some easy, officially endorsed workaround that I don't
remember. Unfortunately I can't find RH's statement about this; I'd
recognize it if I saw it, but web searches aren't turning it up.

There may be other reasons to keep Python 2 support, but as Richard
says RHEL 6 will stop being one of them before our next point release
after this one.

This is not a judgment I am making casually.  Peter Donis and I wrote this:
http://www.catb.org/esr/faqs/practical-python-porting/  It's still the best guide
on how to write Python that runs under both 2 and 3.  Peter and I have been tracking
the transition very closely, and one of the questions I'm keeping in my mind
is when I amend that document to say "there is no longer any point in this
for new code".

That moment is nearly upon us, and I'm pretty certain it will arrive before
2020 ends.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Pre-release cleanup

2020-09-03 Thread Eric S. Raymond via devel
Matthew Selsky :
> I also previously setup Codacy in order to see what other SAST systems could 
> do. See https://app.codacy.com/gl/NTPsec/ntpsec/dashboard
> 
> Let me know what you think.  If either are useful, I'll integrate them more 
> tightly in our workflows.

I'm already a fan of LGTM - it picks up Python issues none of the rest of our
validators do and seems at near parity on the C stuff.

I'll take a look at Codacy.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I'm giving up on seccomp

2020-09-03 Thread Eric S. Raymond via devel
Hal Murray :
> 
> e...@thyrsus.com said:
> >> I think you have jumped to an unreasonable conclusion by assuming that Go
> >> makes seccomp uninteresting.  Are you going to rewrite OpenSSL in Go?
> > No.  There's an OpenSSL binding: ...
> 
> That's the whole point of my comment.  OpenSSL is written in c.  If there is a
> typical buffer overrun bug in OpenSSL, seccomp would be as helpful for a Go
> version of ntpd as it is for the current version.
> 
> If you want to claim your Go program has no buffer overruns, you can't call
> out to big complicated libraries written in c.  You would have to rewrite them
> in Go.

Fair point. That changes my to-do list.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python support policy

2020-09-02 Thread Eric S. Raymond via devel
Richard Laager via devel :
> Let me start over now that I've reviewed the specifics of the NTPsec
> scripts and build system again:
> 
> We are currently using "#!/usr/bin/env python" in all the scripts, and
> waf uses the same. The minimum to do to drop Python 2 is:
> 
> 1. Change waf's shebang:
>  -#!/usr/bin/env python
>  +#!/usr/bin/env python3
> 
>That way, on systems where python is Python 2, waf will get Python 3.
>While there is a lot of baggage around whether "python" is Python 2,
>Python 3, or neither (i.e. doesn't exist), python3 existing is pretty
>universal. [Discussion 1]
> 
> 2. Change the other scripts the same way.
> 
> 3. Update the docs and CI accordingly.

I'm in favor of this simple, brute-force approach.

If nobody comes up with a good argument for retaining Python 2
support, I will ask Mark to include in the release notes that this
is our *last* release with Python 2 support.  Then we'll rip out the
Python 2 code paths before the next one.
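Steps 1 and 2 of Richard's list amount to a one-line edit per script. As an illustration only, here is a hypothetical Go helper (`fixShebang` is not part of the NTPsec tree, and a sed one-liner would do the same job) that rewrites an exact `#!/usr/bin/env python` first line, with or without a space after `#!`, and leaves everything else alone.

```go
package main

import "strings"

// fixShebang rewrites a first line that is exactly "#!/usr/bin/env python"
// (with or without a space after "#!") so that it invokes python3 instead.
// It reports whether it changed anything; the rest of the script is untouched.
func fixShebang(script string) (string, bool) {
	first, rest, hasRest := strings.Cut(script, "\n")
	for _, old := range []string{"#!/usr/bin/env python", "#! /usr/bin/env python"} {
		if first == old {
			first += "3"
			if hasRest {
				return first + "\n" + rest, true
			}
			return first, true
		}
	}
	return script, false
}
```

Matching the whole first line exactly means a script already using `python3`, or one passing interpreter flags, is left untouched rather than mangled.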

> An additional improvement (which we could do separately from the Python
> 2 vs Python 3 discussion) would be to allow the user to customize the
> shebang with a build flag. It turns out we already have `./waf configure
> --python` which defaults to sys.executable. This is stock waf behavior.
> We are already running the Python scripts through subst. We just aren't
> doing the last piece of using @PYTHON@ as the shebang. So we could either:
> 
>   A. Change the Python shebangs to:
>#!@PYTHON@
>  I just tested that on one and it works as expected.
>   B. Leave them as "/usr/bin/env python3" and write a custom subst
>  function to replace that (and do the usual subst).
> 
> Option A is trivial. I could have that patch done in 10 minutes.
> 
> Option B is a little bit more work, but keeps the scripts directly
> executable from the source tree, which could be helpful for development.
> (The other substitutions aren't typically critical, as they are things
> like @NTPSEC_VERSION_EXTENDED@.) Is this something people care about?

I don't want to go down this road.  I have ugly memories associated
with a similar hack in gpsd, long ago.

>python3 exists on Debian and derivatives as well as RedHat and
>derivatives. Ubuntu 20.04 optionally allows python to point to
>python3, but always still has python3. I use Debian, Ubuntu, and
>RedHat, so I have personal knowledge here.

Yes.  To the best of my knowledge every current Unix-like thing
does right with python3 in the shebang line.  That makes hacking those
shebangs as unnecessary as it would be hazardous.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I'm giving up on seccomp

2020-09-02 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> Buffer overruns are just one way a program might make unexpected system
> calls.  Even if you can guarantee that a Go program could never be
> maliciously corrupted externally, you can never guarantee that the
> Go program can not be trojaned.

Everything is cost gradients.

Yes, a Go program could be Trojaned, but (a) that is far less likely
than a buffer overrun is in C, and (b) there are reasonably efficient
auditing methods to detect Trojanning, good enough that even static
analyzers like Coverity and LGTM can usually catch them by looking
for shellouts.  Syscall blocking is not really the best-fit tool for
defense against this kind of attack.
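As a crude illustration of the "look for shellouts" idea (this is not how Coverity or LGTM work internally), the sketch below uses Go's standard `go/parser` package to list imports that deserve a reviewer's attention: `os/exec` for launching programs, plus `syscall`, `unsafe`, and cgo's `"C"` as other escape hatches. The `auditImports` name and the package list are assumptions for the example.

```go
package main

import (
	"go/parser"
	"go/token"
	"strconv"
)

// suspectImports is a hypothetical list of packages worth a reviewer's
// attention: they can launch programs, make raw syscalls, or bypass
// Go's memory safety.
var suspectImports = map[string]bool{
	"os/exec": true,
	"syscall": true,
	"unsafe":  true,
	"C":       true, // cgo
}

// auditImports parses only the import section of a Go source file and
// returns the suspect packages it imports, in source order.
func auditImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if suspectImports[path] {
			found = append(found, path)
		}
	}
	return found, nil
}
```

An import check like this only says where to look; it cannot prove a program is clean, and any C code reached through a cgo import is flagged but not examined.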

Daniel knows more about this sort of thing than I do and might correct
me, but it's my impression that syscall blocking is *specifically* a
best-fit defence against object-code weird machines produced by
buffer-overrun and stack-corruption attacks, and its utility drops off
sharply for other kinds of attacks that are better foiled in different
ways.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I'm giving up on seccomp

2020-09-02 Thread Eric S. Raymond via devel
Hal Murray :
> 
> e...@thyrsus.com said:
> > I think you misunderstand.  I don't believe seccomp is objectively very
> > important in itself, and never have.  My problem with dropping it is that if
> > we do that, we could be seen to have abandoned part of a security defense in
> > depth because it's too much work.  That's not a good look for a project with
> > our mission statement.
> 
> "too much work" in an interesting phrase.  How does that compare with "not an 
> efficient use of our resources"?

Sometimes you have to make efforts toward being seen to do the right
thing, even when they can't be considered strictly efficient.  Support for the BSD Unixes
is in this category, too.

> I didn't mean to suggest that we should drop it totally, just that I was 
> giving up on tightening things down such that we only allowed the syscalls 
> needed by a particular distro/version/hardware/???

Oh.  I'm fine with that path.  I thought you wanted to heave seccomp
overboard in its entirety.

> I think you have jumped to an unreasonable conclusion by assuming that Go 
> makes seccomp uninteresting.  Are you going to rewrite OpenSSL in Go?

No.  There's an OpenSSL binding:
https://godoc.org/github.com/spacemonkeygo/openssl

> Even without that, are you sure there are no bugs in Go?

No, I'm not.  But neither do I think seccomp is actually *possible* in Go
at this point, which tends to relieve us of having to support it.

> Maybe we should think harder about splitting NTS-KE out from ntpd.

I lean against that - it seems like that split would make deployment
and configuration more complicated.  But I could be persuaded.

> My comment about early-droproot wasn't clear.  There will be a few more 
> syscalls needed by the code between early and normal droproot.  Since we 
> aren't processing packets during initialization there is low risk of bad guys 
> getting in.  But if they do get in post-initialization, they have a few more 
> syscalls they can use.

Got it.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: I'm giving up on seccomp

2020-09-02 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> Lost me.  seccomp applies to Go as much as it applies to C.

Why do you think so?  My understanding is that the reason you want to
block unexpected system calls is because C buffer overruns can be used
to make weird machines.

You can't do that in Go, because there's no pointer arithmetic and
array accesses are all bounds-checked. Thus the utility of blocking
unexpected system calls pretty much vanishes.

Is there something wrong with this reasoning?
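A minimal illustration of the bounds-checking half of that argument, using a hypothetical `readAt` helper: an out-of-range slice index in Go stops with a runtime panic instead of reading whatever memory lies past the buffer, as the equivalent C read may. The `recover` is there only so the example can report the failure as an error; production code would check the length explicitly.

```go
package main

import "fmt"

// readAt returns buf[i]. If i is out of range, Go's runtime bounds check
// panics; the deferred recover converts that panic into an ordinary error
// so the rejection is observable. In C, the equivalent out-of-range read is
// undefined behavior and may silently return adjacent memory.
func readAt(buf []byte, i int) (b byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("rejected: %v", r)
		}
	}()
	return buf[i], nil
}
```

This property covers pure Go only; C code reached through cgo, such as an OpenSSL binding, gets no such checks.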
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Pre-release cleanup

2020-09-02 Thread Eric S. Raymond via devel
Sanjeev Gupta :
> They support *any* git repository.

Huh.  Their docs are out of date, then.

> Please see: https://lgtm.com/projects/g/ntpsec/ntpsec/?mode=list

That's excellent.  I'll chew through some of these today.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Pre-release cleanup

2020-09-02 Thread Eric S. Raymond via devel
folkert :
> > > > I've resolved all the Coverity warnings except a new one, "risky
> > > > function" related to random(3).  The suppression cookie for that one
> > > > is not suppressing; I've sent a request to Synopsys about this.
> > > 
> > > Is ntpsec also checked by 'lgtm.com'? They also do all kinds of
> > > verifications on source-code.
> > 
> > First I've heard of it. Can you point me at a tutorial on how to use it?
> 
> I only found it recently as well :-)
> 
> They interface with github and bitbucket.
> 
> https://lgtm.com/help/lgtm/getting-started

Unfortunately, that's a blocker.  We're on GitLab.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Pre-release cleanup

2020-09-02 Thread Eric S. Raymond via devel
folkert :
> > I've resolved all the Coverity warnings except a new one, "risky
> > function" related to random(3).  The suppression cookie for that one
> > is not suppressing; I've sent a request to Synopsys about this.
> 
> Is ntpsec also checked by 'lgtm.com'? They also do all kinds of
> verifications on source-code.

First I've heard of it. Can you point me at a tutorial on how to use it?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Python support policy

2020-09-02 Thread Eric S. Raymond via devel
Python 2 was end-of-lifed at the beginning of January this year.
All our primary target platforms fully support Python 3.

Retaining support for Python 2 proliferates test paths and
complicates the fix for at least one outstanding bug.

Philosophically, I'm a fan of dropping legacy support
when it advances our security mission by decreasing
attack surface and improving auditability/maintainability.

Proposed: We should drop support for Python 2 and use a python3
shebang in all our scripts.
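The mechanical half of that change can be sketched as below. This assumes the conventional `#!/usr/bin/env python3` form is the target (the proposal doesn't say which form), and `python3_shebang` is a hypothetical helper, not something in the tree:

```python
def python3_shebang(text):
    """Rewrite a Python 2 or unversioned Python shebang to python3.

    Only the first line is touched, and only when it is a shebang that
    names python without already naming python3. Any interpreter
    arguments on the old line (e.g. "-u") are dropped, so review the diff.
    """
    first, sep, rest = text.partition("\n")
    if first.startswith("#!") and "python" in first and "python3" not in first:
        first = "#!/usr/bin/env python3"
    return first + sep + rest
```

Using env rather than a hard-coded /usr/bin path is a judgment call; it picks up whatever python3 is first on PATH.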

Discuss.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

"Say what you like about my bloody murderous government," I says,
"but don't insult me poor bleedin' country."-- Edward Abbey


Pre-release cleanup

2020-09-02 Thread Eric S. Raymond via devel
I've resolved all the Coverity warnings except a new one, "risky
function" related to random(3).  The suppression cookie for that one
is not suppressing; I've sent a request to Synopsys about this.

I've closed two stale issues.  I think there are a few more that can
be retired. I'm continuing to work on this.

As usual, I don't see anything really serious on the issue list.
Good work, everyone!
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

If a thousand men were not to pay their tax-bills this year, that would
... [be] the definition of a peaceable revolution, if any such is possible.
-- Henry David Thoreau


Re: I'm giving up on seccomp

2020-09-02 Thread Eric S. Raymond via devel
Hal Murray :
> You keep saying seccomp is important.  What does it buy us?  ntpd is a big 
> complicated program.  It reads and writes files.  It opens network 
> connections.  What else would a bad guy need to do?

I think you misunderstand.  I don't believe seccomp is objectively
very important in itself, and never have.  My problem with dropping it
is that if we do that, we could be seen to have abandoned part of a
security defense in depth because it's too much work.  That's not a
good look for a project with our mission statement.

On the other hand, I can't blame you a bit for being tired of this
rathole, because it is indeed a huge pain in the ass for marginal
gain. I don't think any of your analysis is even a little wrong about
that.

The solution is simple and obvious, if really annoying for me.  You
should assign seccomp-related bugs to me and I will deal with them.
Think of this as incentive for me to get serious about moving the
daemon to Go :-).

(In Go, the equivalent of seccomp is neither possible nor necessary.
What makes it unnecessary is that while you can crash a Go program
with a bad reference, you can't weird-machine it. (Actually
technically you can, but it takes a specific evasion of the language's
safety rules through the unsafe package)).

> [Is early-droproot a bug?]

At the current state of my knowledge I don't think so.  But you just
put auditing the stretch of code between the two droproots high on my
priority list.  It's an open question whether early droproot is worth
its complexity cost. This is not a case like seccomp where the set of
exploits closed off is effectively unbounded - in this case the
security cost of proliferating code and test paths may not be worth
the earlier privilege drop.

> We use 2 large libraries that do lots of syscalls: libc and libssl.  Most of 
> libc is a thin wrapper over the obvious.  Sometimes it is a little less thin 
> when translating an old version into a newer syscall.
> 
> The complicated part of libc that we use is DNS lookups.  That's a pain to 
> debug.  DNS disables most signals so you can't bail out without letting it 
> cleanup.  So it crashes rather than letting our trap handler print an error 
> message.
> 
> I have work-in-progress that lets you setup a list of syscalls actually used 
> by your environment.  It does roughly the following:
>   Splits the list of syscalls to be allowed out to a separate file that gets 
> #includ-ed.
>   A handful of scripts that process strace output to make lists of needed 
> calls.
>   It needs some waf work to specify the filename.  I'm just patching a link
> to point to the right file.
> 
> To collect the data, you have to run ntpd under strace.  While it is running, 
> you have to tickle all the uncommon code paths.  Things like switching to a 
> new ntpd.log after log rotate or reloading the cert file after it has been 
> updated.  I have a list.  I don't know how complete it is.
> 
> One of the scripts includes a few syscalls that are hard to tickle.  That 
> would need double checking.
> 
> Basically, making a file is enough of a pain that I don't think it's 
> practical.  You have to be a semi-wizard in order to run the recipe.  A new 
> libc or libssl may break things.
> 
> If somebody else wants to pick up this work I'll be glad to hand over what I 
> have.  Otherwise, I'll drop it.  (It's not hard to recreate from scratch if 
> you understand the above description.)

I'm going to say drop it, and here's why.

We've already seen the frequency of seccomp bugs drop over time, and that's
to be expected. There should be fewer in the future than there have been
in the past, lowering the value of building those specialized tools. Me,
I'd rather you spent your effort on devising better test protocols
as you have been doing.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: How about a release soon?

2020-09-01 Thread Eric S. Raymond via devel
Hal Murray :
> > I'll do a bug-triage pass.
> 
> I've seen a couple of changes go by.  Thanks.
> 
> Please let me/us know when you are finished so I can test things.

Will do.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: How about a release soon?

2020-09-01 Thread Eric S. Raymond via devel
Hal Murray via devel :
> > Has there been enough user visible improvement to warrant a release of 
> > 1.1.10?
> 
> I think so.
> 
> 1.1.9 doesn't know about the new port number for NTS-KE.
> 
> There is also a bug fix for a missing error message.  Without that message, 
> it's really tough to debug an obscure case of certificates not working.

I'll do a bug-triage pass.

Also, we need to decide if we want to switch to hosting the tarballs on
GitLab itself using Ian's igor tool this release.  If so, that will have a
few consequences for the website.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Has anybody seen a system without STA_NANO?

2020-08-25 Thread Eric S. Raymond via devel
Daniel Franke :
> clock_gettime is. Adjtimex isn't in any standard except for an obscure RFC
> that nobody follows.
> 
> On Tue, Aug 25, 2020, 20:47 Eric S. Raymond via devel 
> wrote:
> 
> > Hal Murray via devel :
> > >
> > > When was clock_gettime and struct timespec introduced?
> > >
> > > We can cleanup some cruft if we assume it exists.
> >
> > Assume it.  These are in the Single UNIX Specification.

Hal, were you asking about adjtimex?  Because struct timespec isn't what
adjtimex works with - it's associated with clock_gettime().
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Has anybody seen a system without STA_NANO?

2020-08-25 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> When was clock_gettime and struct timespec introduced?
> 
> We can cleanup some cruft if we assume it exists.

Assume it.  These are in the Single UNIX Specification.
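For reference, a struct timespec carries whole seconds in tv_sec and nanoseconds in tv_nsec. Python's time module wraps the same POSIX clock_gettime() call (Unix only), so a quick sketch of reading CLOCK_REALTIME and splitting it into the two timespec fields looks like this; the helper names are illustrative only:

```python
import time

NSEC_PER_SEC = 1_000_000_000

def to_timespec(ns):
    """Split an integer nanosecond count into (tv_sec, tv_nsec).

    divmod keeps tv_nsec in [0, NSEC_PER_SEC), the normalized form.
    """
    return divmod(ns, NSEC_PER_SEC)

def now_timespec():
    """Read CLOCK_REALTIME via clock_gettime(2) as a (tv_sec, tv_nsec) pair."""
    return to_timespec(time.clock_gettime_ns(time.CLOCK_REALTIME))
```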
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Seccomp tangle

2020-05-27 Thread Eric S. Raymond via devel
Hal Murray :
> 
> e...@thyrsus.com said:
> > Aaarrgghhh.  It's a huge pain in the ass and I wish it weren't interesting.
> > But given our mission statement, it has to be. 
> 
> Just to make sure we are on the same wavelength...
> 
> My question/proposal was not to drop seccomp if we didn't do what I sketched 
> out.  It was to allow a slightly tighter/cleaner list of syscalls if you were 
> willing to put in the work to collect the data.  The old merger of all 
> syscalls ever seen on any system approach would still be the default if you 
> enabled seccomp and didn't specify your own list.

Understood.

Now I'm torn between devel/ and contrib/. Use your judgment.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Seccomp tangle

2020-05-27 Thread Eric S. Raymond via devel
Hal Murray :
> The first quirk is that ntpd isn't on the #include search path.
> (My hack was to put a link from include to ntpd/seccomp)
> What's the right way to handle this?  (Maybe I just fatfingered things.)

Generating the required stuff into include, where all the headers live,
ought to work. Unless I'm missing something.

> Where should the scripts and directions live?

devel/, I think.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Seccomp tangle

2020-05-26 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> I've been experimenting with some code to allow custom seccomp lists.
> 
> The idea is to replace the --enable-seccomp configure option with
>   --enable-seccomp=foo
> and ntp_sandbox would include syscomp/foo.c which would be a list of syscalls 
> used by this system.
> 
> I assume we would maintain a list for each OS/distro/version/hardware 
> combination that we are interested in.  I have a few scripts that turn strace 
> output into a list.  ...
> 
> Is this interesting?  If not, I'll drop it.
> 
> If yes, I'll need some help to work out the details.

Aaarrgghhh.  It's a huge pain in the ass and I wish it weren't interesting.
But given our mission statement, it has to be.
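Hal's scripts aren't shown in the thread, so what follows is an independent sketch of the strace-to-list step. It assumes plain `strace -f` output, including the "[pid N]" or bare-PID prefixes that option adds. It also assumes the generated file is a run of libseccomp SCMP_SYS() entries meant to be #included into an array. That layout, and both helper names, are hypothetical. The sketch can only report calls the traced run actually made, which is why the rare code paths have to be tickled:

```python
import re

# Syscall name at the start of an strace line, after an optional
# "[pid 1234] " or "1234  " prefix. Lines such as "--- SIGHUP ... ---",
# "+++ exited with 0 +++" and "<... read resumed>" do not match; a
# resumed call was already recorded on its "<unfinished ...>" line.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")

def syscalls_from_strace(lines):
    """Return the sorted, de-duplicated syscall names seen in strace output."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)

def as_include_file(names):
    """Format names as one SCMP_SYS(name), entry per line (assumed layout)."""
    return "".join("    SCMP_SYS(%s),\n" % name for name in names)
```

Usage would be roughly `strace -f -o trace.txt <ntpd command line>`, then `as_include_file(syscalls_from_strace(open("trace.txt")))`. Traces from several runs or distros can be merged by concatenating them before the call.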
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: BSD-4-Clause-UC license usage

2020-04-30 Thread Eric S. Raymond via devel
Matthew Selsky via devel :
> Hi Hal and team,
> 
> Much of our NTS code uses BSD-4-Clause-UC instead of BSD-2-Clause (our 
> preferred license for new code).
> 
> Was this license selection intentional?

No. It's a historical accident.

> Is BSD-4-Clause-UC intended for code owned by the University of California, 
> or does it make sense for others to use this license as well?

BSD-4-Clause-UC is the original version of the license.  It was propagated by
the BSD family of Unix variants and came into wide use because of that.  Over
time, the Board of Regents has gradually removed clauses and now recommends
BSD-2-Clause.

It's not considered controversial to move from BSD-4-Clause-UC to one of the
more recent variants approved by the Board of Regents.  Mark, as our
external-relations specialist, should sign off on this, but I'm willing to
do it.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: The Epic of MRUlist.

2020-04-28 Thread Eric S. Raymond via devel
Ian Bruene via devel : 
> * What happens when a packet in the middle of the sequence is dropped? Who
> knows! If it is seen as a timeout then the client will adjust packet size
> and try again... forever. Or maybe it silently doesn't notice?
> 
> * What happens when the final packet is dropped? Same as before, except that
> never seeing the "now" field means that silent failure will result in an
> infinite loop. I think.
> 
> * If the server data changes enough while the request sequence is running
> the system can just fail for no good reason because the error handling for
> that doesn't exist. Arguably that is when things are going well; I can
> imagine some subtle and wacky hijinks when dumb luck causes it to not fail
> properly.

One of these probably explains the mystery bug Hal has reported on
WiFi links.
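Several of the failure modes Ian lists come down to a retry loop with no exit. The sketch below shows one way to bound it. It is not ntpq's code: the names `Timeout` and `fetch_with_backoff`, and the policy of halving the clump on each timeout, are assumptions. It bounds the retries and shrinks the clump after every timeout:

```python
class Timeout(Exception):
    """Raised by the request callable when a clump doesn't arrive in full."""

def fetch_with_backoff(request, max_clump, min_clump=1, max_tries=8):
    """Call request(clump), halving clump after each Timeout.

    Gives up after max_tries attempts instead of retrying forever.
    """
    clump = max_clump
    for _ in range(max_tries):
        try:
            return request(clump)
        except Timeout:
            clump = max(min_clump, clump // 2)
    raise Timeout("no complete response after %d tries" % max_tries)
```

Here `request` stands in for whatever issues one mrulist query at a given clump size. The point is only that the give-up path is explicit rather than an accident of the loop.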
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: NTS dropping TLS 1.2

2020-03-23 Thread Eric S. Raymond via devel
Hal Murray :
> We can do several things:
>   1) clean out the ifdefs that make things work with older versions of 
> OpenSSL.
> That is drop support for systems that haven't upgraded their OpenSSL to a 
> supported version.
>   2) leave things alone, ignore the RFC.
> Or maybe add some nasty warning messages
> How long?
>   3) make a configure option to disable NTS so that NTPsec builds on older 
> OSes but doesn't support NTS.
> 
> I propose option 1.  Simple and clean.  I don't think we will drop many 
> systems.

I concur.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: droproot, seccomp

2020-02-25 Thread Eric S. Raymond via devel
Hal Murray :
> I don't think it's worth the effort to maintain 2 lists.  We can revisit that 
> if you think it's appropriate.

No, I agree with you.

> There are 46 syscalls in each list and 55 in the merged list.

Brings up a question. Is the list of all syscalls used by everybody
large relative to any one distro+platform-specific list?

Because if not, I could get behind having *one* list and just
whitelisting syscalls until we stop needing to.

46 to 55.  If just 9 syscalls are the difference, the very slightly
reduced assurance starts to look like a reasonable trade to make the
whole problem go away.

Which, mind you, I wouldn't say if I didn't think we had done a
quite effective job of hardening the rest of the code.  But I *do*
think that - which makes this worth consideration.
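The 46/55 figures pin the trade down exactly. By inclusion-exclusion, two 46-entry lists whose union has 55 entries share 46 + 46 - 55 = 37 calls. Under the merged list, each platform would therefore allow 9 calls it never makes. A sketch of that bookkeeping, using placeholder syscall names rather than the real lists:

```python
def merged_extras(per_platform):
    """Return the merged allow-list and, per platform, how many calls it gains."""
    merged = set().union(*per_platform.values())
    extras = {name: len(merged - calls) for name, calls in per_platform.items()}
    return merged, extras

# Placeholder names standing in for two 46-call lists that share 37 calls.
common = {"sys%d" % i for i in range(37)}
lists = {
    "distro_a": common | {"a%d" % i for i in range(9)},
    "distro_b": common | {"b%d" % i for i in range(9)},
}
merged, extras = merged_extras(lists)
```

Run against the real per-platform lists, the same function would show which specific calls each platform picks up.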
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: droproot, seccomp

2020-02-25 Thread Eric S. Raymond via devel
James Browning via devel :
> Is there anything preventing the possibility of an early looser
> seccomp setup and then tightening it later possibly with a knob
> to generate terse or verbose warnings instead of dying.

That is a very interesting idea that I think deserves further
examination.

Do you have an implementation strategy in mind?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: seccomp mess, continued, status update

2020-02-23 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> Fedora fixed their problem.  seccomp now builds and works on both Fedora and 
> Arch.
> 
> But now it won't build on Alpine.  It looks like the same problem that Fedora 
> had.  The problem is a bug in a header file.  Copying the ppoll bits from a 
> Fedora header file fixes the problem.
> 
> The CI checker has an Alpine step with seccomp.  It now fails.
> 
> Can somebody please disable the seccomp option or step until Alpine fixes 
> things?

Wouldn't it be simpler to use a base image in the CI that isn't buggy? 
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: seccomp tangle

2020-02-23 Thread Eric S. Raymond via devel
Hal Murray via devel :
> Should we drop seccomp?  It's a pain to maintain.

We're a security-focused product.  I don't think it would be good optics
to drop a layer of defense just because it's a pain to maintain.

> How many people use it?  Richard: do you turn it on for the Debian builds?

I have no idea how many people use it.

> How does seccomp compare to a jail?  Why don't we have a good web page on how 
> to setup and use a jail?  Does systemd have a jail option?  Does anybody run 
> in a jail?  ...

We don't have a good page on jails because I'm not experienced at setting
them up, and mostly other people don't initiate documenting things.

> Testing the version of the seccomp header file is probably cleaner than 
> testing for Arch.

Agreed.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Is there a clean way for waf to test for the distro?

2020-02-22 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> Context is the seccomp tangle.  Issue #633
> 
> Should I just add a helper that looks in /etc/os-release?

lsb_release -a  might be useful here.
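lsb_release may not be installed on minimal images. If the /etc/os-release route is taken instead, that file uses a simple shell-style KEY=value format, and the os-release(5) spec names /usr/lib/os-release as the fallback location. Below is a minimal sketch of a parser a waf helper could call. The function names are hypothetical, and how it would hook into the wscript is left open:

```python
import shlex

def parse_os_release(text):
    """Parse os-release(5) KEY=value lines into a dict, unquoting values."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            words = shlex.split(value)  # handles "quoted" and 'quoted' values
        except ValueError:              # unbalanced quotes: skip the line
            continue
        info[key] = words[0] if words else ""
    return info

def read_os_release(paths=("/etc/os-release", "/usr/lib/os-release")):
    """Return the parsed data from the first readable file, or {} if none."""
    for path in paths:
        try:
            with open(path) as f:
                return parse_os_release(f.read())
        except OSError:
            continue
    return {}
```

A check would then key off something like `read_os_release().get("ID")`. As the seccomp thread notes, though, testing the header version directly may be cleaner than testing for a distro.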
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: New warning from NetBSD 9.0

2020-02-19 Thread Eric S. Raymond via devel
Hal Murray via devel :
> 
> NetBSD just released version 9.0.  It now generates this warning:
> 
> ../../ntpd/ntp_control.c:1476:34: warning: '%s' directive output may be 
> truncated writing up to 255 bytes into a region of size between 0 and 255 
> [-Wformat-truncation=]
> 
> char str[256];
> 
> snprintf(str, sizeof(str), "%s/%s", utsnamebuf.sysname,
>  utsnamebuf.release);
> 
> Has anybody seen this before and/or know how to fix it?

I've seen it before.  Rarely.

If your format string had only a single %s in it, you could fix it by
adding a precision specifier to the string (not a length specifier, a
*precision* specifier) which bounds how many bytes of the argument
snprintf will copy into the buffer.

I guess you could give both %s-cookies a precision specifier of 128.
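Here is the arithmetic for sizing those precisions. The 256-byte buffer must hold two fields, one "/" separator, and the trailing NUL, so each field gets at most (256 - 1 - 1) / 2 = 127 bytes. With 128 per field, the worst case is 128 + 1 + 128 = 257 characters, or 258 bytes with the NUL. That would likely leave the warning in place, so the format would read "%.127s/%.127s". Python's %-formatting applies `%.*s` precision the way C's printf family does, which makes a quick check easy. The helper name below is hypothetical:

```python
def per_field_precision(bufsize, nfields, separator_bytes):
    """Largest %.Ns precision that lets every field fit with room for the NUL."""
    return (bufsize - 1 - separator_bytes) // nfields

prec = per_field_precision(256, 2, 1)
# Worst case: both uname fields as long as the buffer (255 bytes each).
out = "%.*s/%.*s" % (prec, "S" * 255, prec, "R" * 255)
```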
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Possible cruft cleanup: clock_gettime vs getitimer

2020-02-16 Thread Eric S. Raymond via devel
Hal Murray via devel :
> devel/hacking.adoc says:
>   You *may* use clock_gettime(2) and clock_settime(2) calls, and
>   the related getitimer(2)/setitimer(2), from POSIX-1.2008.
> 
> My man pages say both are in POSIX.1-2001.
> 
> Is there any reason we don't pick one and discard the crufty ifdefs?
> 
> No big deal, just another minor cleanup that makes the code slightly easier
> to read and maintain.

I believe those #ifdefs are a port hack for a minor platform that's not
completely standards-compliant, most likely some version of Mac OS X. 
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: --enable-doc waf config option removed

2020-02-02 Thread Eric S. Raymond via devel
Richard Laager via devel :
> On 2/2/20 3:44 PM, Jason Azze via devel wrote:
> > It looks like the --enable-doc waf configuration option was removed in the 
> > commit "Add support for other asciidoc processors". Was there any 
> > discussion about this change?
> 
> Yes. See the mailing list archive and MR !1037.

That MR conflated at least two changes that should have been made
separately. And I don't see any rationale for the questionable part,
which is changing the configuration default.

This is partly my fault.  I've been concentrating pretty hard on the
GCC conversion for months and haven't exercised the oversight I
should have.  Well, that epic is over; I'm back.

OK, I reached the relevant devs on #ntpsec.  Will pursue there.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Python, testing

2020-01-13 Thread Eric S. Raymond via devel
Hal Murray via devel :
> A year or 2 ago, I put together a script to test as many build time options
> as I thought reasonable.  It's in ./tests/option-tester.sh
> 
> Does anybody other than me use it?

I've run it once or twice, but it's not easy to see how to integrate
it into our regular test process.

> It's a bit of a CPU hog -- too much to run routinely.  Can we set things up
> to run it on the gitlab OS collection weekly or manually when we get close
> to a release?

I have to defer to the CI experts on that one. It sounds like something
that should be possible.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ublox refclock

2019-11-25 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> > So there is nothing you recommend be merged at this time?
> 
> I sorta wish NTPsec had a staging area like the Linux kernel does.
> 
> There is value to a small and clean u-blox driver fully integrated
> into NTPsec.  But without KPPS it is inferior at timekeeping.
> 
> If the guy that wrote it wanted to work on it under the NTPsec umbrella
> that would be good.  A little guidance and the guy that wrote that could
> be very useful to NTPsec.
> 
> Without that guy, or someone interested in being that guy. I'd pass on
> that driver.  I'm not personally interested in upgrading that software to
> have the smarts that gpsd already does about the u-blox and KPPS.

I left an issue on his tracker about merging to upstream.  If he responds
I will try to bring him into the fold. If he does not, from what you say there
is not much loss here.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Do we require clock_gettime()?

2019-11-25 Thread Eric S. Raymond via devel
Hal Murray :
> 
> From devel/hacking:
> 
> Only POSIX-1.2001/SUSv3 library functions should be used (a few
> specific exceptions are noted below).  If a library
> function not in that standard is required, then a wrapper function for
> backward compatibility must be provided.  One notable case is clock_gettime()
> which is used, when available, for increased accuracy, and has a
> fallback implementation using native time calls.
> 
> 
> 
> I haven't found any hints of a fallback mechanism in the current code.

You are quite right.  That documentation is stale; it predates the last
removal of a non-POSIX time call. IIRC that was a flaky part of an old version
of Mac OS X that we learned doesn't even work reliably - depends on which
rev of that major version of the OS.

You can remove it.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Clock fuzzing bugs

2019-11-25 Thread Eric S. Raymond via devel
Mark Atwood via devel :
> On Sun, Nov 24, 2019, at 19:32, Hal Murray via devel wrote:
> > 
> > e...@thyrsus.com said:
> > > If we don't see any evidence of beat-induced quantization, I'm willing to
> > > say we drop this code. 
> > 
> > How about adding a --disable-fuzz configure option so we can experiment 
> > without breaking the default case.
> > 
> > Or maybe a runtime configure option.
> 
> I like that idea.

Can be done.  Not even difficult.  About three hours of work, I'd say.

For reasons Mark knows, however, I'm booked up to my eyebrows 
until 16 Dec and Ian maybe longer than that.

Anyone else willing to step in?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Clock fuzzing bugs

2019-11-25 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> Is there an existing patch to remove the fuzzing?

There is not. See my reply to Mark.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ublox refclock

2019-11-25 Thread Eric S. Raymond via devel
Gary E. Miller via devel :
> I just took a quick look at refclock_ubx.c
> 
> An interesting start, but followup messages today on the list are
> assuming this driver does things that it does not do.
> 
> 1) It does not, ever, config the u-blox.  It does not, ever, write to
> the u-blox to query it.
> 
> Configuration is up to the user.
> 
> 2) It decodes UBX-TIM-TM2 (Current time) and UBX-TIM-TIMELS (for the
> leap second).  Then does some limited sanity checking.
> 
> It will fail to catch known u-blox time failure modes.
> 
> 3) It does some interesting things with TIO that the comments claim
> improves the time stability.  But it does not use KPPS which would
> just work better and simpler.
> 
> Anything that uses KPPS will work much better.
> 
> 4) It does not look at qErr, which combined with KPPS, might eventually,
> theoretically, lead to better time.  When CPU time quantization gets better.
> 
> In summary, not an improvement on current u-blox best practice.  Maybe,
> eventually, an improvement, with some work (configuration, KPPS, etc.).

So there is nothing you recommend be merged at this time?
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ublox refclock

2019-11-24 Thread Eric S. Raymond via devel
Udo van den Heuvel :
> On 24-11-2019 15:01, Eric S. Raymond wrote:
> > Udo van den Heuvel :
> >> I have an M8N on order, would that be compatible enough to this driver?
> >> If so: I could help test etc.
> > 
> > That can't hurt - they speak the same protocol - but the big deal with
> > the T variant is a stationary mode you don't have.
> 
> Ah, OK.
> The M8N was cheap so perhaps a fake; I can try to identify this when it
> comes in.

Not necessarily a fake; the 8N is the normal variant (without
stationary mode) and less expensive.  I suspect it's the same hardware
with a different firmware load - they price-discriminate because
they think the customers for stationary mode will pay more.

> As it appears I need a (real) M8T:
> What M8T board, cabling etc would I need to buy to interface to RS232?
> Would e.g. https://www.gnss.store/12-gnss-gps-modules be reputable enough?

Alas, I don't know. In the past the 8T was expensive and difficult to
find; I haven't worked with one.  I guess you get to be forward scout 
on this one.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ublox refclock

2019-11-24 Thread Eric S. Raymond via devel
Udo van den Heuvel :
> I have an M8N on order, would that be compatible enough to this driver?
> If so: I could help test etc.

That can't hurt - they speak the same protocol - but the big deal with
the T variant is a stationary mode you don't have.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Clock fuzzing bugs

2019-11-24 Thread Eric S. Raymond via devel
Hal Murray via devel :
> I'm tempted to rip out that stuff.  I haven't quite convinced myself that it 
> isn't doing something important.  Eric?

The clock fuzzing?  It's an interesting question. I've thought about it.

I'm doubtful myself.  The obvious motivation would be to avoid beat
effects from the resolution of the system clock. You and Gary have a
better feel for signal analysis and our error budget than I do, but
for whatever my opinion is worth this seems to me like a mostly
theoretical problem.
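To make the question concrete, here is a toy model. It is not ntpd's code, and the names and the 1000 ns tick are assumptions. A clock that only advances every resolution_ns always reads low by the sub-tick phase, a bias of about half a tick on average. Filling the sub-tick bits with uniform random fuzz trades that systematic bias for zero-mean noise. Whether the bias would ever show up as a visible beat in real ntpd output is the empirical question:

```python
import random

def quantize(true_ns, resolution_ns):
    """What a clock that only ticks every resolution_ns would report."""
    return true_ns - true_ns % resolution_ns

def fuzz(reading_ns, resolution_ns, rng):
    """Fill the bits below the clock's resolution with uniform random noise."""
    return reading_ns + rng.randrange(resolution_ns)

def mean_errors(true_times, resolution_ns, seed=1):
    """Mean reading error (reading - true) without and with fuzzing."""
    rng = random.Random(seed)
    plain = [quantize(t, resolution_ns) - t for t in true_times]
    fuzzed = [fuzz(quantize(t, resolution_ns), resolution_ns, rng) - t
              for t in true_times]
    return sum(plain) / len(plain), sum(fuzzed) / len(fuzzed)
```

With true times spread evenly across the tick, the unfuzzed mean error comes out at -499.5 ns for a 1000 ns tick, while the fuzzed mean sits near zero.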

We have pretty good visualization tools these days.  Gary, you know
best what normal performance looks like under ntpviz.  Would you be
willing to patch-disable fuzzing and see if that induces any
suspicious-looking sawtooth patterns in the graphs?

If we don't see any evidence of beat-induced quantization, I'm willing
to say we drop this code.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ublox refclock

2019-11-24 Thread Eric S. Raymond via devel
Udo van den Heuvel via devel :
> I came across this ublox ntpsec refclock:
> https://gitlab.com/trv-n/ntpsec-ublox
> Would it be usable for incorporation in the ntpsec tree?
> (AFAIK this is a 'straight' refclock; no extra lines needed besides
> rx/tx and pps)

Thank you very much for bringing this to our attention.  The M8T is an
interesting chip in a product line a couple of our principals like
and use.  Not only is it a strong candidate for inclusion as a supported
driver, I'd be hard-put to think of a stronger one.

I have filed an issue on its tracker titled "Work should be merged to
upstream".  In it I encourage the developer to introduce himself on this 
list so we can discuss what would be required to integrate.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: ntpq mrulist bug

2019-11-20 Thread Eric S. Raymond via devel
Hal Murray :
> Part of the problem is that there is a lot of cruft in that area.  For 
> example, grep for CERR_
> There is a clump of signals defined as part of a ControlSession, none are
> ever raised, a few are caught.  Looks like somebody decided to rename things to
> SERR and never got around to finishing the cleanup.
> 
> There is another case where stuff is returned a couple of layers, but then
> never used.

I have unfortunate news for you.  I think both those "features" were
in the C version (I'm sure about the CERR/SERR duplication). The C was
so grotty that I did not dare attempt anything but the most literal
sort of translation of the lower layers. 

I got the distinct impression that the C was halfway through someone
else's rewrite that never got finished.  Looks to me like somebody was
moving towards having a C client layer for Mode 6 that could be
detached from the ntpq upper level.  I completed that part, basically
by cutting along the right dotted lines and disturbing the ugly code
as little as I could get away with.

I did do some cleanup after the literal translation, but not in the
parts I was afraid to touch (the packet-reassembly code in
particular). You can be pretty sure I didn't introduce any complexity
that wasn't there before.

> [for, else]
> > That's a Pythonism.  An else clause attached to a for executes only if the
> > for ran to completion without a break.  In this case, the code checks for a
> > hole in the fragment sequence and sets the response field if there is no
> > hole. 
> 
> Thanks.
> 
> You have a tendency to use legal but uncommon constructs.  Is that a bug or 
> feature?  On the feature side, it makes the code more compact and maybe some 
> of us learn something.  On the bug side, it makes the code harder to 
> understand for those of us who don't mentally collect features.
>
> Is there a collection of obscure features and what they do?  I'd like to scan 
> (and bookmark) something like that.

*blink* How would I know what matches your map of "uncommon"?  How
would I know what features not to use?  I like you, Hal, but
that doesn't mean I can read your freakin' mind.

Ask me to solve something *easy*, like the Halting Problem.
-- 
http://www.catb.org/~esr/ Eric S. Raymond




Re: ntpq mrulist bug

2019-11-20 Thread Eric S. Raymond via devel
Hal Murray via devel :
> I know what's going wrong, but I'm not enough of a python geek to see a clean 
> fix.
> 
> The basic idea is that the client sends a request and the server sends back a 
> clump of packets.  The client specifies the max clump size.  What's happening 
> is that at least one packet of the clump is getting lost.  The code is 
> supposed to reduce the clump size when that happens, but that's not working.
> 
> Here is the code outline:
>   mrulist calls doquery to get a clump of data
>     there is an except phrase to reduce the clump size
>   doquery calls sendrequest then getresponse
>   getresponse returns the answer in self.response
>     it also returns None (and maybe falls off the end with no return?)
> 
> I think getresponse needs to return two things.
>   one is the data
>   the second is a flag: none, some, all
> 
> There are lots of raises and excepts in there that I haven't sorted out.
> 
> I think the code should return partial data.  It doesn't.  Or it doesn't get 
> processed.
> 
> What's the right way to structure this?  Should we fix the current code, or 
> make a drastic change?

I don't think I know yet.  I lean towards an incremental fix along the 
lines you describe, but it's also possible that there's a serious design
flaw that merits a rewrite.

Please file an issue and assign it to Ian Bruene; he's the maintainer
for the Python parts.  I'll step in if he needs help.
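
For concreteness, here is a minimal sketch of the two-value return Hal proposes. The caller shrinks the clump size until a whole clump arrives. Everything in it is hypothetical. `Got`, `collect` and `lossy_server` are illustration names, not pylib functions. The fake server simply pretends packets go missing whenever more than 8 entries are requested.

```python
import enum
from typing import Callable, List, Tuple


class Got(enum.Enum):
    """Hypothetical completeness flag, after Hal's "none, some, all"."""
    NONE = "none"
    SOME = "some"
    ALL = "all"


# A query takes a clump size and returns (data, flag) instead of
# returning None when part of the clump is lost.
Query = Callable[[int], Tuple[List[str], Got]]


def collect(query: Query, clump: int, min_clump: int = 1) -> Tuple[List[str], int]:
    """Halve the clump size until a whole clump arrives.

    Returns the data from the last attempt and the clump size used for it.
    Stops at min_clump even if that attempt was still incomplete.
    """
    while True:
        data, got = query(clump)
        if got is Got.ALL or clump <= min_clump:
            return data, clump
        # NONE or SOME: at least one packet was lost, so ask for less.
        clump = max(min_clump, clump // 2)


def lossy_server(clump: int) -> Tuple[List[str], Got]:
    """Fake server: any request above 8 entries loses its second half."""
    entries = ["entry%d" % i for i in range(clump)]
    if clump > 8:
        return entries[: clump // 2], Got.SOME
    return entries, Got.ALL


data, size = collect(lossy_server, 32)
print(size, len(data))  # 8 8
```

This sketch throws away the partial data from a failed attempt and re-asks with a smaller clump. Keeping that data, as Hal suggests, would also require the retry to pick up after the last entry actually received.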

> --
> 
> Where is the if for this else?  Can else go with something other than if?
> 
> The code past the else is the normal case.  It gets run if the break doesn't 
> happen.
> 
> This chunk of code is from near the end of getresponse in pylib/packet.py
> 
> # If we've seen the last fragment, look for holes in the sequence.
> # If there aren't any, we're done.
> if seenlastfrag and fragments[0].offset == 0:
>     for f in range(1, len(fragments)):
>         if fragments[f-1].end() != fragments[f].offset:
>             warndbg("Hole in fragment sequence, %d of %d"
>                     % (f, len(fragments)), 1)
>             break
>     else:   <=== this one
>         tempfraglist = [ntp.poly.polystr(f.extension) \
>                         for f in fragments]
>         self.response = ntp.poly.polybytes("".join(tempfraglist))
>         warndbg("Fragment collection ends. %d bytes "
>                 " in %d fragments"
>                 % (len(self.response), len(fragments)), 1)

That's a Pythonism.  An else clause attached to a for executes only if the for
ran to completion without a break.  In this case, the code checks for a hole in
the fragment sequence and sets the response field if there is no hole.
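
For anyone who, like Hal, rarely reaches for for/else, here is a standalone sketch of the same hole check. `Fragment` and `reassemble` are hypothetical names, not the pylib/packet.py API. Unlike the quoted code, this version sorts the fragments by offset itself. It also returns the payload (or None) instead of setting self.response.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for one received fragment."""
    offset: int
    extension: bytes

    def end(self) -> int:
        # Offset of the first byte past this fragment.
        return self.offset + len(self.extension)


def reassemble(fragments: List[Fragment]) -> Optional[bytes]:
    """Join the fragments, or return None if the sequence has a hole."""
    fragments = sorted(fragments, key=lambda f: f.offset)
    if not fragments or fragments[0].offset != 0:
        return None
    for i in range(1, len(fragments)):
        if fragments[i - 1].end() != fragments[i].offset:
            break  # hole found: the else clause is skipped
    else:
        # Reached only when the loop ran to completion without a break.
        return b"".join(f.extension for f in fragments)
    return None


print(reassemble([Fragment(0, b"ab"), Fragment(2, b"cd")]))  # b'abcd'
print(reassemble([Fragment(0, b"ab"), Fragment(3, b"cd")]))  # None
```

Without for/else, you would need a separate flag variable set just before the break and tested after the loop. The else clause folds that flag into the loop itself.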
-- 
http://www.catb.org/~esr/ Eric S. Raymond



