head broken if no refclocks

2016-06-26 Thread Hal Murray
after a simple ./waf configure

[murray@fed raw]$ ./waf build
--- building host --- 
Waf: Entering directory `/home/murray/ntpsec/raw/build/host'
[1/5] Processing ntpd/ntp_parser.y
[2/5] Compiling build/host/ntpd/ntp_parser.tab.c
/home/murray/ntpsec/raw/ntpd/ntp_parser.y: In function ‘yyparse’:
/home/murray/ntpsec/raw/ntpd/ntp_parser.y:996:33: error: ‘num_refclock_conf’ undeclared (first use in this function)
for (dtype = 1; dtype < (int)num_refclock_conf; dtype++)
 ^
/home/murray/ntpsec/raw/ntpd/ntp_parser.y:996:33: note: each undeclared identifier is reported only once for each function it appears in
/home/murray/ntpsec/raw/ntpd/ntp_parser.y:997:12: error: ‘refclock_conf’ undeclared (first use in this function)
if (refclock_conf[dtype]->basename != NULL && 
!strcasecmp(refclock_conf[dtype]->basename, $2) == 0)
^

Waf: Leaving directory `/home/murray/ntpsec/raw/build/host'


-- 
These are my opinions.  I hate spam.



___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: Wonky NTP startup and the incremental-configuration problem

2016-06-26 Thread Hal Murray

An alternative option would be to implement rereading ntp.conf.

For each line in ntp.conf, there are 3 possibilities.  It's new or the value 
has changed, nothing has changed, or the item was dropped.  The last is the 
tricky case.

The idea is to save a parsed copy of the old ntp.conf.  As the new file is read 
in, kick out the old items (if any) as they get replaced.  (Actually, move them 
to what will be the new saved info.)  Anything left on the old saved-list needs 
to be set back to the default.

That works for simple things like setting a parameter.  It gets more 
complicated for things like server/pool/refclock.

It feels like something that could be done reasonably cleanly with the appropriate table.
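A minimal sketch of the reread idea, assuming only simple one-value settings keyed by keyword (server/pool/refclock are deliberately left out). The names struct item, reread(), apply_setting() and reset_to_default() are hypothetical, not ntpd code; the hooks just print what a real implementation would change. Each line of the new file is matched against the saved parse of the old one, matched items are moved off the old list, and whatever remains is reset to its default.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ITEMS 16

/* One parsed "keyword value" line from ntp.conf (simple settings only). */
struct item {
    const char *name;
    const char *value;
};

/* A parsed copy of a whole ntp.conf. */
struct config {
    struct item items[MAX_ITEMS];
    int count;
};

/* Outcome of one reread, one counter per case. */
struct reread_stats {
    int applied;   /* item is new, or its value changed */
    int unchanged; /* item identical to the saved copy */
    int reset;     /* item dropped from the file, set back to default */
};

/* Hypothetical hooks: real code would change the daemon's variables. */
static void apply_setting(const char *name, const char *value)
{
    printf("set %s %s\n", name, value);
}

static void reset_to_default(const char *name)
{
    printf("reset %s to default\n", name);
}

/* Reconcile the saved parse of the old file against the newly parsed one. */
static struct reread_stats reread(const struct config *old, const struct config *cur)
{
    struct reread_stats st = {0, 0, 0};
    int consumed[MAX_ITEMS] = {0}; /* old items moved off the saved list */

    for (int i = 0; i < cur->count; i++) {
        const struct item *it = &cur->items[i];
        int match = -1;

        for (int j = 0; j < old->count; j++) {
            if (!consumed[j] && strcmp(old->items[j].name, it->name) == 0) {
                match = j;
                break;
            }
        }
        if (match >= 0) {
            consumed[match] = 1;
            if (strcmp(old->items[match].value, it->value) == 0) {
                st.unchanged++;
                continue; /* nothing to do */
            }
        }
        apply_setting(it->name, it->value);
        st.applied++;
    }

    /* Whatever is left on the old list was dropped: restore its default. */
    for (int j = 0; j < old->count; j++) {
        if (!consumed[j]) {
            reset_to_default(old->items[j].name);
            st.reset++;
        }
    }
    return st;
}
```

With an old file that sets driftfile, logfile and statsdir, and a new one that keeps driftfile, changes logfile and adds leapfile, reread() should report 2 applied, 1 unchanged and 1 reset (statsdir). After the pass, the new parse becomes the saved copy for the next reread.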

We would need a way to test things.  I wonder if we could do that from a script 
driving the debugger?




Re: Wonky NTP startup and the incremental-configuration problem

2016-06-26 Thread Eric S. Raymond
Heads up, Mark!

Achim Gratz :
> > It would be better for code verifiability and security if the
> > only source of configuration information for the daemon were the
> > ntp.conf file.  (We can't quite get there due to the requirement
> > to store drift state, but closer would be better.)
> 
> If you do that, you need some way to change the configuration without a
> wholesale restart of ntpd or at least determining and tweaking fudge
> factors gets a lot harder than it already is.

I didn't really understand that a week ago.  Now, having studied ntpq
in the process of getting rid of magic addresses, I do.

Mark, this bears on a conversation we were having at Penguicon.  There
are good complexity-reduction reasons for wanting to deep-six all the
run-time configuration stuff.  But until we've beaten the slow-convergence
problem into the ground I now think we shouldn't do it.

The problem is that at least some people are used to changing fudge
times and driver options on the fly in order to find a
jitter-minimizing set.  Shutdown/startup can introduce nasty
transients that screw up this process.

If we remove the ability to avoid those transients, some experienced
operators will be pissed off, and not without reason.  It's a fight
I think we should avoid; I'd rather spend people's limited tolerance
for change elsewhere.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>


Re: Fwd: New Defects reported by Coverity Scan for ntpsec

2016-06-26 Thread Eric S. Raymond
Mark Atwood :
> ** CID 149750:  Uninitialized variables  (UNINIT)
> /ntpd/ntp_intercept.c: 855 in intercept_replay()

Known.  Not fixed because that code isn't in the new TESTFRAME branch;
it's going to go away.

> ** CID 149749:(UNINIT)
> /ntpq/ntpq-subs.c: 1751 in doprintpeers()
> /ntpq/ntpq-subs.c: 1794 in doprintpeers()

> ** CID 149748:  Null pointer dereferences  (NULL_RETURNS)
> /ntpd/refclock_jjy.c: 2374 in jjy_receive_seiko_tsys_tdc_300()

I repaired these last night in a commit commented 'Fix glitches detected by
Coverity.'  And I'd think that was the end of it, except for GitLab 
Issue #86: "latest ntpq segfaults", which suggests that Dave Morgan's
repo still has one of the ntpq-subs bugs in it.

I'm a little puzzled.  Is everybody properly synced up?  Mark, have you
got 'Fix glitches detected by Coverity.' in your history? Where and when
was this scan taken?


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> 
> e...@thyrsus.com said:
> > Ugh.  Our options have just narrowed.  I've just seen
> > libgcc_s.so.1 must be installed for pthread_cancel to work
> > Aborted (core dumped)
> 
> > with memlock off in the build.
> 
> Can you reproduce it?
> 
> My guess is that you didn't really get memlock turned off.  How about putting 
> a break on mlockall or the call to it.  (There is only one in ntpd.c)

This is possible.  I will attempt to reproduce.
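Besides a breakpoint on mlockall(), a quick way to check on Linux whether memory locking actually took effect is the VmLck field of /proc/<pid>/status, which becomes nonzero once pages are locked. A small sketch that reads it; the helper name locked_kb() is made up for illustration, and the approach is Linux-specific.

```c
#include <stdio.h>

/*
 * Return the memory locked by a process, in kB, from the "VmLck:" line of
 * a Linux status file such as /proc/self/status or /proc/<pid>/status.
 * Returns -1 if the file cannot be opened or has no VmLck line.
 */
static long locked_kb(const char *status_path)
{
    FILE *fp = fopen(status_path, "r");
    char line[256];
    long kb = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* The space in the format also matches the tab after the colon. */
        if (sscanf(line, "VmLck: %ld", &kb) == 1)
            break;
    }
    fclose(fp);
    return kb;
}
```

Pointing it at the running daemon's /proc/<pid>/status and seeing 0 would support the "memlock really is off" reading of the crash.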


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> If it uses threads, we still have the problem of not being able to load the 
> thread cleanup code.

Maybe.  We don't know if the libc implementation is vulnerable to that bug or
not.  I should do an experimental implementation on a branch and find out.


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Hal Murray

e...@thyrsus.com said:
>> Is getaddrinfo_a() in RTEMS?  QNX?   BSD?
> It's not an OS thing, it's a toolchain thing.  getaddrinfo_a() is
> implemented using standard C and POSIX threads, it doesn't need OS-specific
> support.

Or it's in an optional extra library.

> Linux has it because Linux uses libc whether you're compiling with gcc or
> clang.  Any of those other platforms will have it *if* they have (gcc ||
> clang) && glibc. 

My Linux man page says:
   #define _GNU_SOURCE /* See feature_test_macros(7) */
   Link with -lanl.

I couldn't find it in /usr/include/ on NetBSD or FreeBSD.  On Linux, it's in 
netdb.h.
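For reference, a minimal sketch of the glibc interface the man page describes, assuming glibc with _GNU_SOURCE (and -lanl on glibc releases before 2.34, where libanl was folded into libc). lookup_async() is a hypothetical helper: it submits one request with GAI_NOWAIT and then simply waits with gai_suspend(); a daemon would instead return to its main loop and check gai_error() later. Note that glibc implements getaddrinfo_a() with helper threads, so this does not by itself settle the thread-cleanup question.

```c
#define _GNU_SOURCE /* getaddrinfo_a() is a GNU extension */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Start an asynchronous lookup, then block until it completes.
 * Returns 0 on success or a getaddrinfo error code (EAI_*).
 */
static int lookup_async(const char *host, const char *service)
{
    struct addrinfo hints;
    struct gaicb req;
    struct gaicb *list[1] = { &req };
    int ret;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM; /* NTP runs over UDP */

    memset(&req, 0, sizeof req);
    req.ar_name = host;
    req.ar_service = service;
    req.ar_request = &hints;

    ret = getaddrinfo_a(GAI_NOWAIT, list, 1, NULL);
    if (ret != 0)
        return ret;

    /* Wait until the request is no longer in progress. */
    while ((ret = gai_error(&req)) == EAI_INPROGRESS)
        gai_suspend((const struct gaicb *const *)list, 1, NULL);

    if (ret == 0) {
        char buf[NI_MAXHOST];

        for (struct addrinfo *ai = req.ar_result; ai != NULL; ai = ai->ai_next)
            if (getnameinfo(ai->ai_addr, ai->ai_addrlen, buf, sizeof buf,
                            NULL, 0, NI_NUMERICHOST) == 0)
                printf("%s -> %s\n", host, buf);
        freeaddrinfo(req.ar_result);
    }
    return ret;
}
```

Calling lookup_async("127.0.0.1", "123") needs no DNS server, since numeric host and service strings are converted directly.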

--

If it uses threads, we still have the problem of not being able to load the 
thread cleanup code.




Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Mark Atwood :
> Is getaddrinfo_a() in RTEMS?  QNX?   BSD?

It's not an OS thing, it's a toolchain thing.  getaddrinfo_a() is implemented
using standard C and POSIX threads, it doesn't need OS-specific support.

Linux has it because Linux uses libc whether you're compiling with gcc
or clang.  Any of those other platforms will have it *if* they have
(gcc || clang) && glibc.

There is at least one other implementation out there, in a GPL-licensed
package called "adns".


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Mark Atwood
Is getaddrinfo_a() in RTEMS?  QNX?   BSD?

On Sun, Jun 26, 2016 at 7:06 AM Eric S. Raymond  wrote:

> Eric S. Raymond :
> > > What would you do if we discovered a case where we wanted it?
> >
> > Cry a lot.  Then add logic to force synchronous DNS when memlocking is
> > selected, and document this as a workaround for a bug we haven't fixed
> yet.
>
> Ugh.  Our options have just narrowed.  I've just seen
>
> libgcc_s.so.1 must be installed for pthread_cancel to work
> Aborted (core dumped)
>
> with memlock off in the build.
>
> I think the homebrew async-lookup code has to go.  Even if we installed
> the warmup fix, I don't think I'd trust it.

Re: My first positive structural change to NTP

2016-06-26 Thread Achim Gratz
Eric S. Raymond writes:
> The reason I disagree is I think you're overfocusing on the fact that
> both refclocks are the same physical device and underfocusing on the
> fact that they're two different data channels, possibly with different
> fudges and modes.

No, it's exactly my contention that this is an implementation detail
that ntpd doesn't really need to care about.  Besides, there are
refclocks that use multiple sources of time and can optionally deliver
either the combination of those sources or information on each source
separately.  So ntpd conceivably needs to deal with multiple references
via the same data channel or the same reference via multiple data
channels.  The only thing it should care about is whether that reference
is an absolute or relative time source.

> *Because* fudges and modes may differ, I think it is right for the
> configuration syntax to be data-channel-focused rather than
> device-focused.  Doing it the other way could land you in a spot
> where you want to specify differing per-channel behavior but cannot.

Ideally that part of the setup should be moved to a separate
per-refclock configuration file.  That file could then specify which
data channels are used and what their relations are, including any
"fudges" that relate to this specific instance.

> The prefer keyword may be a separate issue, and dispensable.  But
> I think that is a different, more specific argument.

The prefer keyword should get its original meaning back.  The
association of an absolute refclock with a relative one in the sense of
"these pieces of time information are related to this stream of PPS
timestamps" should be a separate keyword.


Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds



Re: My first positive structural change to NTP

2016-06-26 Thread Achim Gratz
Hal Murray writes:
> strom...@nexgo.de said:
>> I think that's still perpetuating a mistake.  This whole business of having
>> to specify two servers (or refclocks) for the same thing should go away.
>
> There is a fundamental issue.  With a PPS, there really are two sources of 
> time.  Internally, ntpd needs two different handles so you can see both sets 
> of info on ntpq -peers and clockstats.

I think that at least initially, the ATOM driver was supposed to only
deliver a frequency to calibrate the other time sources to.  That is the
only case where it still makes sense to have a separate refclock entry
for, IMHO.  But for ntpd that clock can't be used directly since it only
provides relative time information.

> Normally, each PPS has an associated serial stream.  It would be good if 
> there were a clean way to specify that rather than using the prefer kludge.

That's a result of how getting the clock information into the ntpd has
been handled and which facilities are used to support it.  But
fundamentally, all ntpd cares about is that it gets a high resolution
timestamp in kernel time for some piece of absolute time information
from the refclock.  If a refclock driver provides hardware
timestamped memory buffers to ntpd, it wouldn't need to know or care
about the implementation detail that the timestamp was delivered on a
different line than the bits in the memory buffer.
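To illustrate Achim's point that ntpd only needs a timestamp paired with a piece of absolute time, here is a sketch of how one PPS edge and the whole second named by the serial stream could be combined into a clock offset. struct pps_sample and pps_offset_ns() are hypothetical, not the ATOM driver's interface; the half-second window is an assumed sanity check that rejects serial data belonging to a different pulse.

```c
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000LL

/*
 * Hypothetical sample: the local-clock timestamp of a PPS edge, plus the
 * whole UTC second that the serial (absolute) source says the edge marks.
 */
struct pps_sample {
    struct timespec edge; /* local clock when the pulse arrived */
    time_t label_sec;     /* absolute second named by the serial stream */
};

/*
 * Compute (true time - local clock) in nanoseconds for one sample.
 * Returns 0 and stores the offset on success; returns -1 if the label is
 * more than half a second away from the edge.
 */
static int pps_offset_ns(const struct pps_sample *s, int64_t *offset_ns)
{
    int64_t local_ns = (int64_t)s->edge.tv_sec * NS_PER_SEC + s->edge.tv_nsec;
    int64_t true_ns = (int64_t)s->label_sec * NS_PER_SEC;
    int64_t diff = true_ns - local_ns;

    if (diff > NS_PER_SEC / 2 || diff < -NS_PER_SEC / 2)
        return -1;
    *offset_ns = diff;
    return 0;
}
```

An edge stamped at 1000.000250000 on the local clock and labelled second 1000 yields an offset of -250000 ns: the local clock is 250 microseconds fast.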

> Is there a udev equivalent on other OSes?

I guess anything that supports hotplug has something like it.  But if
not, a symbolic link or device creation script serves the same purpose.
With udev you can automate the device creation and removal and can do it
at runtime and not just at boot, which is more convenient, though.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Eric S. Raymond :
> > What would you do if we discovered a case where we wanted it?
> 
> Cry a lot.  Then add logic to force synchronous DNS when memlocking is
> selected, and document this as a workaround for a bug we haven't fixed yet.

Ugh.  Our options have just narrowed.  I've just seen

libgcc_s.so.1 must be installed for pthread_cancel to work
Aborted (core dumped)

with memlock off in the build.

I think the homebrew async-lookup code has to go.  Even if we installed
the warmup fix, I don't think I'd trust it.


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> 
> e...@thyrsus.com said:
> > In this case, we have two possible complexity-reducing fixes.  One is to
> > drop the memlock feature entirely.  The other is to drop the buggy homebrew
> > asynchronous-DNS lookup from Classic and use libc's.
> 
> Dropping memlock is an interesting idea.  I can't think of any place where it 
> is required today but my crystal ball for what we will need tomorrow has 
> never been very good.

Crypto security *might* be it.  I'll wait for Daniel to weigh in once
he's done climbing mountains or whatever.

> What would you do if we discovered a case where we wanted it?

Cry a lot.  Then add logic to force synchronous DNS when memlocking is
selected, and document this as a workaround for a bug we haven't fixed yet.

> We could try simplifying things to only supporting lock-everything-I-need 
> rather than specifying how much.  There might be a slippery slope if 
> something like a thread stack needs a sane size specified.

I'm not intimate with mlockall, but it looks like it works that way now.

if (do_memlock) {
    /*
     * lock the process into memory
     */
    if (!dumpopts &&
        0 != mlockall(MCL_CURRENT|MCL_FUTURE))
        msyslog(LOG_ERR, "mlockall(): %m");
}

> Is there a simple way to count page faults for a process?  Or measure swapped 
> out data and/or code that isn't swapped in?

I don't know.  I can do some research, but I'm not sure "enough page faults
to merit memory locking" would be a well-defined threshold even if I knew how
to count them.
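For the page-fault question, one standard starting point is getrusage(). POSIX only guarantees ru_utime and ru_stime, but Linux and the BSDs also fill in ru_majflt (faults that needed I/O, such as bringing back swapped-out pages) and ru_minflt. On Linux, swapped-out memory of another process shows up as VmSwap in /proc/<pid>/status. The wrapper name page_faults() is hypothetical.

```c
#include <sys/resource.h>

/*
 * Report the page faults this process has taken so far.
 * Major faults required disk I/O; minor faults were satisfied without it.
 * Returns 0 on success, -1 on error.
 */
static int page_faults(long *major, long *minor)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *major = ru.ru_majflt;
    *minor = ru.ru_minflt;
    return 0;
}
```

Sampling this periodically inside ntpd and watching ru_majflt grow would be one rough signal; what counts as "enough" to justify memory locking is still the open question.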

> I don't think your use-libc approach will be as simple as you would
> like.  It's not available on NetBSD or FreeBSD.  Maybe I just didn't
> look in the right place.  It's not in netdb.h where it is for Linux.

I believe you're right that these platforms don't have it.  The question is,
how important is that fact?  Is the performance hit from synchronous DNS
really a showstopper?  I don't know the answer.
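The fallback discussed here could be expressed as a small policy check. This is a sketch under stated assumptions: HAVE_GETADDRINFO_A is a hypothetical macro derived here from __GLIBC__, whereas a real build would more likely probe for getaddrinfo_a() during ./waf configure, and use_async_dns() is an illustrative name. Memory locking forces the synchronous path, and so does a platform without getaddrinfo_a(), such as NetBSD or FreeBSD.

```c
#include <stdbool.h>
#include <stdio.h> /* glibc's <stdio.h> pulls in <features.h>, which defines __GLIBC__ */

/* Hypothetical feature macro: assume getaddrinfo_a() exists only with glibc. */
#if defined(__GLIBC__)
#define HAVE_GETADDRINFO_A 1
#else
#define HAVE_GETADDRINFO_A 0
#endif

/* Return true if a lookup should use the asynchronous path. */
static bool use_async_dns(bool do_memlock)
{
    /* Workaround: force synchronous DNS when memory locking is selected. */
    if (do_memlock)
        return false;
    /* Without getaddrinfo_a(), fall back to plain synchronous getaddrinfo(). */
    return HAVE_GETADDRINFO_A;
}
```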


Re: My first positive structural change to NTP

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> 
> strom...@nexgo.de said:
> > I think that's still perpetuating a mistake.  This whole business of having
> > to specify two servers (or refclocks) for the same thing should go away.
> 
> There is a fundamental issue.  With a PPS, there really are two sources of 
> time.  Internally, ntpd needs two different handles so you can see both sets 
> of info on ntpq -peers and clockstats.

Agreed.  This is a specialization of my case that the declaration language
should be channel-focused, not device-focused.

> Normally, each PPS has an associated serial stream.  It would be good if 
> there were a clean way to specify that rather than using the prefer kludge.

I'm open to proposals.  I love designing minilanguages and DSLs, so I
am totally up for a friendly technical wrangle about this. :-)

The current new syntax has just these changes:

1. Magic driver-type numbers are replaced by driver shortnames

2. server 127.127.{t}.{u} -> refclock {typename} unit {u}

3. fudge ceases to be a separate command; its option grammar is glued
   to the end of the refclock part.
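A sketch of the mechanical part of change 2, assuming a small hypothetical table of driver type numbers and shortnames (only three drivers are listed, and fudge options are not handled). translate_refclock() is illustrative, not the converter ntpsec itself uses.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative subset of driver type numbers and their shortnames. */
static const struct {
    int type;
    const char *name;
} drivers[] = {
    { 20, "nmea" },
    { 22, "pps" },
    { 28, "shm" },
};

/*
 * Translate an old-style pseudo-address "127.127.t.u" into
 * "refclock NAME unit U".  Returns 0 on success, -1 if the address is not
 * a refclock address or the driver type is not in the table.
 */
static int translate_refclock(const char *addr, char *out, size_t outlen)
{
    int t, u;
    char extra;

    /* Exactly two conversions means no trailing characters after the unit. */
    if (sscanf(addr, "127.127.%d.%d%c", &t, &u, &extra) != 2)
        return -1;
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        if (drivers[i].type == t) {
            int n = snprintf(out, outlen, "refclock %s unit %d", drivers[i].name, u);
            return (n > 0 && (size_t)n < outlen) ? 0 : -1;
        }
    }
    return -1;
}
```

For example, "127.127.28.0" becomes "refclock shm unit 0", and an address outside 127.127.x.y is rejected.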

This is a pretty minimal change. It has the dual advantage that it is (a) easy
to explain to people familiar with the old syntax, and (b) significantly
simpler for newbies.  I think it is therefore at a sweet spot that we
shouldn't wander away from without good reason.

That said, I'm totally willing to hear good reasons.


Re: My first positive structural change to NTP

2016-06-26 Thread Hal Murray

strom...@nexgo.de said:
> I think that's still perpetuating a mistake.  This whole business of having
> to specify two servers (or refclocks) for the same thing should go away.

There is a fundamental issue.  With a PPS, there really are two sources of 
time.  Internally, ntpd needs two different handles so you can see both sets 
of info on ntpq -peers and clockstats.

Normally, each PPS has an associated serial stream.  It would be good if 
there were a clean way to specify that rather than using the prefer kludge.


strom...@nexgo.de said:
> It's easy enough these days to tell udev what each device should be named,
> so in principle there wouldn't even be a need to use anything but the
> default names. 

Is there a udev equivalent on other OSes?

I don't think it is necessary.  A boot time script can set up symbolic links.
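The core of such a boot-time script is replacing a symbolic link. A minimal C sketch of that step, with a hypothetical helper name; the device paths mentioned below are only examples.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Point a stable name (e.g. a refclock device path) at the real device
 * node, replacing any stale link left over from a previous boot.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int make_device_link(const char *target, const char *linkpath)
{
    if (unlink(linkpath) != 0 && errno != ENOENT)
        return -1;
    return symlink(target, linkpath);
}
```

Something like make_device_link("/dev/ttyUSB0", "/dev/gps0") would run once at boot. Unlike udev, it does not follow devices that are plugged in later.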




Re: My first positive structural change to NTP

2016-06-26 Thread Hal Murray
> Here's how I think it should look:

> --
> refclock shm unit 0 refid GPS
> refclock shm unit 1 prefer refid PPS
> --

I think you should start a list of that sort of change.

Currently, we can switch between our code and ntpd classic.  The same 
ntp.conf works for both.

I think we should preserve that until we make an explicit decision that it's 
the right time to make the break.

---

> Oh well...almost everyone disables remote querying anyway.  

It may be disabled for general IP addresses, but it's used all the time for 
monitoring your own servers.

