Re: crash

2020-04-14 Thread Udo van den Heuvel via devel
Hal,

On 14-04-2020 05:07, Hal Murray wrote:
> I just pushed a fix.  Please test.

With this fix, ntpd has now been running for a few hours without issue.


Udo
___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: crash

2020-04-13 Thread Hal Murray via devel


> -rw------- 1 root root 1708 Dec 13 11:05 ./keys/_key-certbot.pem
> Anything wrong in here? 

Your configure line includes early-droproot.
Your command line includes -u ntp:ntp

With that combination, it's probably trying to read the key after switching to 
user ntp.
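
The permission problem described above can be checked by hand. The following is a minimal sketch of the classic owner/group/other read check, assuming a key file owned by root with owner-only permissions (mode 0600 is an assumption) and a hypothetical uid/gid of 123 for user ntp. It ignores ACLs, capabilities, and the search permission needed on the parent directories.

```python
import stat

def can_read(mode, file_uid, file_gid, uid, gids):
    """Classic Unix read check: owner bits, then group bits, then other bits."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)

KEY_MODE = 0o600     # assumed: read/write for the owner only
ROOT = 0
NTP_UID = 123        # hypothetical uid for user ntp
NTP_GIDS = {123}     # hypothetical group set for user ntp

print(can_read(KEY_MODE, ROOT, ROOT, ROOT, {ROOT}))       # before dropping root
print(can_read(KEY_MODE, ROOT, ROOT, NTP_UID, NTP_GIDS))  # after switching to ntp
```

If the second check fails, the key would have to be read before the switch to user ntp, or be made readable to that user.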



-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 14-04-2020 07:22, Hal Murray wrote:
> Given that you have tested most of the rest of your ntp.conf, my guess would 
> be file permissions on the certificate or key.  The key is most likely since 
> there is no reason to hide the certificate.

# cd /etc/letsencrypt/
# find . -exec ls -ld {} \;
drwxr-xr-x 7 root root 4096 Mar  5 09:37 .
drwxr-xr-x 2 root root 4096 Dec 13 11:05 ./csr
-rw-r--r-- 1 root root 932 Dec 13 11:05 ./csr/_csr-certbot.pem
drwxr-xr-x 2 root root 4096 Dec 13 11:05 ./renewal
drwxr-xr-x 3 root root 4096 Dec 13 11:05 ./accounts
drwxr-xr-x 3 root root 4096 Dec 13 11:05 ./accounts/acme-v02.api.letsencrypt.org
drwx------ 3 root root 4096 Dec 13 11:05 ./accounts/acme-v02.api.letsencrypt.org/directory
drwx------ 2 root root 4096 Dec 13 11:05 ./accounts/acme-v02.api.letsencrypt.org/directory/020c96242a59060882fc55ae933cc35e
-rw-r--r-- 1 root root 78 Dec 13 11:05 ./accounts/acme-v02.api.letsencrypt.org/directory/020c96242a59060882fc55ae933cc35e/regr.json
-r-------- 1 root root 1632 Dec 13 11:05 ./accounts/acme-v02.api.letsencrypt.org/directory/020c96242a59060882fc55ae933cc35e/private_key.json
-rw-r--r-- 1 root root 77 Dec 13 11:05 ./accounts/acme-v02.api.letsencrypt.org/directory/020c96242a59060882fc55ae933cc35e/meta.json
drwxr-xr-x 5 root root 4096 Dec 13 11:05 ./renewal-hooks
drwxr-xr-x 2 root root 4096 Dec 13 11:05 ./renewal-hooks/deploy
drwxr-xr-x 2 root root 4096 Dec 13 11:05 ./renewal-hooks/post
drwxr-xr-x 2 root root 4096 Dec 13 11:05 ./renewal-hooks/pre
drwx------ 2 root root 4096 Dec 13 11:05 ./keys
-rw------- 1 root root 1708 Dec 13 11:05 ./keys/_key-certbot.pem

Anything wrong in here?

Udo


Re: crash

2020-04-13 Thread Hal Murray via devel


udo...@xs4all.nl said:
>> If you want the server side to support NTS, you need to add "nts enable"
> With that in ntp.conf the ntpd does not start. Config needed I guess. 

The log file should have a useful message.  It may take more than a few 
seconds to find due to all the cruft that is useful in other contexts.  Start 
at the end and work back.

Given that you have tested most of the rest of your ntp.conf, my guess would 
be file permissions on the certificate or key.  The key is most likely since 
there is no reason to hide the certificate.


-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 14-04-2020 05:07, Hal Murray wrote:
> 
>> # grep nts /etc/ntp.conf
>> nts key /etc/letsencrypt/keys/_key-certbot.pem
>> nts cert /etc/letsencrypt/csr/_csr-certbot.pem
>> server time.cloudflare.com:1234 nts  # TLS1.3 only
> ...
> 
> Thanks.
> 
> I just pushed a fix.  Please test.

Will do, building rpm right now.

> If you want the server side to support NTS, you need to add "nts enable"

With that in ntp.conf the ntpd does not start.
Config needed I guess.

Udo


Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 20:18, Hal Murray wrote:
> It's dying while trying to reload the certificate file.
> 
> Is that happening after running for an hour?

Yes.

> 
> That turns into 2 questions.  Why is it trying to reload the certificates, 
> and 
> why is it crashing?
> 
> What's in your ntp.conf?  I don't need the whole thing, just the lines with 
> "nts".

# grep nts /etc/ntp.conf
nts key /etc/letsencrypt/keys/_key-certbot.pem
nts cert /etc/letsencrypt/csr/_csr-certbot.pem
server time.cloudflare.com:1234 nts  # TLS1.3 only
server ntpmon.dcs1.biz nts
server pi4.rellim.com nts
server ntp1.glypnod.com nts
server ntp2.glypnod.com nts


> Did this configuration work before a recent git pull?

No.

Udo


Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 19:39, Hal Murray wrote:
>> Or will I do the debug build?
> 
> Please do it again with symbols.
> 
> How long does it run before it crashes?  Seconds?  Hours?  ...


(gdb) bt
#0  use_certificate_chain_file (ctx=ctx@entry=0x0, ssl=ssl@entry=0x0,
file=file@entry=0x555f9640
"/etc/letsencrypt/csr/_csr-certbot.pem") at ssl/ssl_rsa.c:604
#1  0x77c5c36e in SSL_CTX_use_certificate_chain_file
(ctx=ctx@entry=0x0, file=file@entry=0x555f9640
"/etc/letsencrypt/csr/_csr-certbot.pem") at ssl/ssl_rsa.c:688
#2  0x5558312e in nts_load_certificate (ctx=ctx@entry=0x0) at
../../ntpd/nts.c:225
#3  0x555832bc in nts_reload_certificate (ctx=0x0) at
../../ntpd/nts.c:204
#4  0x555840d5 in check_cert_file () at ../../ntpd/nts_server.c:171
#5  0x5558414d in nts_cert_timer () at ../../ntpd/nts_server.c:163
#6  0x55582d59 in nts_timer () at ../../ntpd/nts.c:107
#7  0x555739cd in timer () at ../../ntpd/ntp_timer.c:284
#8  0x55562051 in mainloop () at ../../ntpd/ntpd.c:940
#9  main (argc=, argv=) at
../../ntpd/ntpd.c:884
(gdb)

An hour or so?

Udo




Re: crash

2020-04-13 Thread Hal Murray via devel


I think I've found a way for that to happen.

Were you missing a "nts enable" in your config file?
but did have a "nts cert ..." pointing to a valid file?
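
The scenario in question (a certificate path configured, but no server context because "nts enable" is absent) can be modelled in a few lines. This is only a sketch of the failure mode and of one possible guard, not the actual NTPsec code or fix. ServerState, reload_certificate and cert_timer are hypothetical names, and a Python exception stands in for the NULL-pointer crash.

```python
class ServerState:
    """Toy model: the TLS context only exists when the NTS server is enabled."""
    def __init__(self, nts_enabled, cert_path):
        self.cert_path = cert_path
        self.ctx = object() if nts_enabled else None  # stand-in for an SSL_CTX

def reload_certificate(ctx, cert_path):
    """Stand-in for the reload step; a missing context models the crash."""
    if ctx is None:
        raise RuntimeError("reload attempted with no TLS context")
    return True

def cert_timer(state):
    """Periodic check: skip the reload entirely when no context was created."""
    if state.ctx is None:
        return False
    return reload_certificate(state.ctx, state.cert_path)

# "nts cert ..." present but no "nts enable": the guarded timer does nothing.
print(cert_timer(ServerState(nts_enabled=False, cert_path="/path/to/cert.pem")))
# With the server enabled, the reload goes ahead.
print(cert_timer(ServerState(nts_enabled=True, cert_path="/path/to/cert.pem")))
```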


-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Hal Murray via devel
Thanks.

It's dying while trying to reload the certificate file.

Is that happening after running for an hour?

That turns into 2 questions.  Why is it trying to reload the certificates, and 
why is it crashing?

What's in your ntp.conf?  I don't need the whole thing, just the lines with 
"nts".
Did this configuration work before a recent git pull?

One of your earlier messages had some logging, but I didn't see the NTS 
messages I expect.  With the latest run, did it say anything about loading 
certificates during initialization?  I expect 3 lines like this:
 2 Apr 13:12:11 ntpd[685]: NTSs: loaded certificate (chain) from xxx
 2 Apr 13:12:11 ntpd[685]: NTSs: loaded private key from xxx
 2 Apr 13:12:11 ntpd[685]: NTSs: Private Key OK
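
A quick way to check for these lines is to scan the log from the end, newest entry first. A minimal sketch, assuming the log is already available as a list of strings and that the server-side NTS messages carry the "NTSs:" tag shown above; the function name and the last sample line are made up for illustration.

```python
def nts_server_lines(log_lines, tag="NTSs:"):
    """Return the log lines containing the tag, newest first."""
    return [line for line in reversed(log_lines) if tag in line]

sample = [
    " 2 Apr 13:12:11 ntpd[685]: NTSs: loaded certificate (chain) from xxx",
    " 2 Apr 13:12:11 ntpd[685]: NTSs: loaded private key from xxx",
    " 2 Apr 13:12:11 ntpd[685]: NTSs: Private Key OK",
    " 2 Apr 13:12:12 ntpd[685]: INIT: some unrelated message",  # made-up line
]

for line in nts_server_lines(sample):
    print(line)
```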

Is there anything interesting with the permissions on the certificate or key 
files?  You built with early-droproot, so I think it has already switched to 
user ntp when it loads them during initialization.  I'm trying to figure out 
why it's trying to reload them.  Either there is a bug in the reload logic, or 
it didn't load them the first try and the error didn't get handled correctly.


-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Hal Murray via devel
> Or will I do the debug build?

Please do it again with symbols.

How long does it run before it crashes?  Seconds?  Hours?  ...



-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 16:01, Hal Murray wrote:
> 
> udo...@xs4all.nl said:
>> Started things this way. One gdb line worries me a bit: (No debugging symbols
>> found in build/main/ntpd/ntpd)
> 
>> Perhaps a different build is needed? 
> 
> I'm not sure how that stuff works.
> 
> configure has an --enable-debug-gdb option.  That may do it.

With that option and some debuginfo's installed I get:

Thread 1 "ntpd" received signal SIGSEGV, Segmentation fault.
use_certificate_chain_file (ctx=ctx@entry=0x0, ssl=ssl@entry=0x0,
file=file@entry=0x555f9640
"/etc/letsencrypt/csr/_csr-certbot.pem") at ssl/ssl_rsa.c:604
604 passwd_callback = ssl->default_passwd_callback;
Missing separate debuginfos, use: dnf debuginfo-install
libgpg-error-1.36-2.fc31.x86_64
(gdb) bt
#0  use_certificate_chain_file (ctx=ctx@entry=0x0, ssl=ssl@entry=0x0,
file=file@entry=0x555f9640
"/etc/letsencrypt/csr/_csr-certbot.pem") at ssl/ssl_rsa.c:604
#1  0x77c5c36e in SSL_CTX_use_certificate_chain_file
(ctx=ctx@entry=0x0, file=file@entry=0x555f9640
"/etc/letsencrypt/csr/_csr-certbot.pem") at ssl/ssl_rsa.c:688
#2  0x5558312e in nts_load_certificate (ctx=ctx@entry=0x0) at
../../ntpd/nts.c:225
#3  0x555832bc in nts_reload_certificate (ctx=0x0) at
../../ntpd/nts.c:204
#4  0x555840d5 in check_cert_file () at ../../ntpd/nts_server.c:171
#5  0x5558414d in nts_cert_timer () at ../../ntpd/nts_server.c:163
#6  0x55582d59 in nts_timer () at ../../ntpd/nts.c:107
#7  0x555739cd in timer () at ../../ntpd/ntp_timer.c:284
#8  0x55562051 in mainloop () at ../../ntpd/ntpd.c:940
#9  main (argc=, argv=) at
../../ntpd/ntpd.c:884
(gdb)

Hopefully this helps fix the issue.

Kind regards,
Udo



Re: crash

2020-04-13 Thread folkert via devel
> > udo...@xs4all.nl said:
> >> Started things this way. One gdb line worries me a bit: (No debugging 
> >> symbols
> >> found in build/main/ntpd/ntpd)
> > 
> >> Perhaps a different build is needed? 
> > 
> > I'm not sure how that stuff works.
> > 
> > configure has an --enable-debug-gdb option.  That may do it.

> Without that option I get:
> 
> Thread 1 "ntpd" received signal SIGSEGV, Segmentation fault.
> 0x77c5ba70 in use_certificate_chain_file () from
> /lib64/libssl.so.1.1
> Missing separate debuginfos, use: dnf debuginfo-install
> avahi-compat-libdns_sd-0.7-20.fc31.x86_64 avahi-libs-0.7-20.fc31.x86_64
> dbus-libs-1.12.16-3.fc31.x86_64 libcap-2.26-6.fc31.x86_64
> libgcc-9.3.1-1.fc31.x86_64 libgcrypt-1.8.5-1.fc31.x86_64
> lz4-libs-1.9.1-1.fc31.x86_64 nss-mdns-0.14.1-7.fc31.x86_64
> openssl-libs-1.1.1d-2.fc31.x86_64 systemd-libs-243.8-1.fc31.x86_64
> xz-libs-5.2.4-6.fc31.x86_64 zlib-1.2.11-20.fc31.x86_64
> (gdb) bt
> #0  0x77c5ba70 in use_certificate_chain_file ()
>    from /lib64/libssl.so.1.1
> #1  0x5558310e in ?? ()
> #2  0x5558329c in ?? ()
> #3  0x555840b5 in ?? ()
> #4  0x5558412d in ?? ()
> #5  0x55582d39 in ?? ()
> #6  0x555739ad in ?? ()
> #7  0x55562031 in ?? ()
> #8  0x778f01a3 in __libc_start_main () from /lib64/libc.so.6
> #9  0x5556232e in ?? ()
> (gdb)
> 
> Does this help enough?

doubt it :)

> Or will I do the debug build?

Please do.
Because this backtrace won't show in which line or even source file the
segfault occurred.
Yeah, it is possible to figure this out via those addresses (in theory)
but that is rather complicated.
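
For reference, the address arithmetic is simple when the kernel log gives both the instruction pointer and the start of the mapping, as in the segfault line from the original report (ip 7f9d70252a70, libssl mapped at 7f9d7022e000). A small sketch using only those two values; the result is an offset into that mapping, and turning it into a symbol or source line (for example with a tool such as addr2line) would still need matching debuginfo and possibly the mapping's file offset.

```python
ip = 0x7f9d70252a70        # instruction pointer from the kernel log
lib_base = 0x7f9d7022e000  # start of the libssl mapping from the same line

offset = ip - lib_base     # position of the faulting instruction in the mapping
print(hex(offset))         # prints 0x24a70
```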


Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 16:01, Hal Murray wrote:
> 
> udo...@xs4all.nl said:
>> Started things this way. One gdb line worries me a bit: (No debugging symbols
>> found in build/main/ntpd/ntpd)
> 
>> Perhaps a different build is needed? 
> 
> I'm not sure how that stuff works.
> 
> configure has an --enable-debug-gdb option.  That may do it.

Without that option I get:

Thread 1 "ntpd" received signal SIGSEGV, Segmentation fault.
0x77c5ba70 in use_certificate_chain_file () from
/lib64/libssl.so.1.1
Missing separate debuginfos, use: dnf debuginfo-install
avahi-compat-libdns_sd-0.7-20.fc31.x86_64 avahi-libs-0.7-20.fc31.x86_64
dbus-libs-1.12.16-3.fc31.x86_64 libcap-2.26-6.fc31.x86_64
libgcc-9.3.1-1.fc31.x86_64 libgcrypt-1.8.5-1.fc31.x86_64
lz4-libs-1.9.1-1.fc31.x86_64 nss-mdns-0.14.1-7.fc31.x86_64
openssl-libs-1.1.1d-2.fc31.x86_64 systemd-libs-243.8-1.fc31.x86_64
xz-libs-5.2.4-6.fc31.x86_64 zlib-1.2.11-20.fc31.x86_64
(gdb) bt
#0  0x77c5ba70 in use_certificate_chain_file ()
   from /lib64/libssl.so.1.1
#1  0x5558310e in ?? ()
#2  0x5558329c in ?? ()
#3  0x555840b5 in ?? ()
#4  0x5558412d in ?? ()
#5  0x55582d39 in ?? ()
#6  0x555739ad in ?? ()
#7  0x55562031 in ?? ()
#8  0x778f01a3 in __libc_start_main () from /lib64/libc.so.6
#9  0x5556232e in ?? ()
(gdb)


Does this help enough?
Or will I do the debug build?

Udo


Re: crash

2020-04-13 Thread Hal Murray via devel


udo...@xs4all.nl said:
> I could disable NTSc for now to avoid crashes. Or if you have a patch I can
> test with that one? 

Changing that may break (fix?) the crash.  I'd like to understand that before 
we change anything else.

Fixing Cloudflare will break all other NTS servers unless they make the same 
change as Cloudflare.  I'm hoping somebody on the IETF list will pick a date.

There is rate limiting on those messages.  It shouldn't clutter up the log 
file too much.


-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 14:48, Hal Murray wrote:
>> Apr 13 06:10:27 doos ntpd[204063]: EX-REP: Count=1 Print=1, Score=0.500, M4
>> V4 from [2606:4700:f1::1]:123, lng=84
> 
> That's saying the NTS stuff isn't working.  2606:4700:f1::1 is Cloudflare.  
> They have updated their servers to use the latest tweak from the draft RFC.  
> It's incompatible.

I could disable NTSc for now to avoid crashes.
Or if you have a patch I can test with that one?


Udo


Re: crash

2020-04-13 Thread Hal Murray via devel


udo...@xs4all.nl said:
> Started things this way. One gdb line worries me a bit: (No debugging symbols
> found in build/main/ntpd/ntpd)

> Perhaps a different build is needed? 

I'm not sure how that stuff works.

configure has an --enable-debug-gdb option.  That may do it.



-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 15:23, Hal Murray wrote:
> when it crashes, you should get back to gdb
> then
> bt should give you a stack trace

Started things this way.
One gdb line worries me a bit:
(No debugging symbols found in build/main/ntpd/ntpd)

Perhaps a different build is needed?


Udo


___
devel mailing list
devel@ntpsec.org
http://lists.ntpsec.org/mailman/listinfo/devel


Re: crash

2020-04-13 Thread Hal Murray via devel


udo...@xs4all.nl said:
> I did not find a core dump. How else can I get a stack dump? 

use gdb.

You need to add -n to the command line args or ntpd will detach itself.

cd build dir
gdb build/main/ntpd/ntpd

run -n


Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 14:48, Hal Murray wrote:
> Can you get a stack trace?

I did not find a core dump.
How else can I get a stack dump?

> What were your configure options?


CFLAGS="-O2" %{__python3} ./waf configure \
    --prefix=/usr \
    --enable-early-droproot \
    --refclock=nmea,generic \
    --libdir=%{_libdir} \
    --docdir=%{_docdir}/ntpsec \
    --enable-doc

>> Apr 13 06:10:27 doos ntpd[204063]: EX-REP: Count=1 Print=1, Score=0.500, M4
>> V4 from [2606:4700:f1::1]:123, lng=84
> 
> That's saying the NTS stuff isn't working.  2606:4700:f1::1 is Cloudflare.  
> They have updated their servers to use the latest tweak from the draft RFC.  
> It's incompatible.

Ah.
Would it help if I downgrade to a version a few weeks old?

-rw-r--r-- 1 root root 450224 Nov 22 07:32
RPMS/x86_64/ntpsec-1.1.8-0.fc31.x86_64.rpm
-rw-r--r-- 1 root root 450328 Dec  1 10:46
RPMS/x86_64/ntpsec-1.1.8-1.fc31.x86_64.rpm
-rw-r--r-- 1 root root 451781 Dec 13 10:53
RPMS/x86_64/ntpsec-1.1.8-2.fc31.x86_64.rpm
-rw-r--r-- 1 root root 451820 Dec 13 10:55
RPMS/x86_64/ntpsec-1.1.8-3.fc31.x86_64.rpm
-rw-r--r-- 1 root root 451848 Jan  4 17:49
RPMS/x86_64/ntpsec-1.1.8-4.fc31.x86_64.rpm
-rw-r--r-- 1 root root 452552 Feb 23 07:00
RPMS/x86_64/ntpsec-1.1.8-5.fc31.x86_64.rpm
-rw-r--r-- 1 root root 453601 Mar 14 14:39
RPMS/x86_64/ntpsec-1.1.8-6.fc31.x86_64.rpm
-rw-r--r-- 1 root root 453415 Apr  3 09:53
RPMS/x86_64/ntpsec-1.1.8-7.fc31.x86_64.rpm
-rw-r--r-- 1 root root 453667 Apr 12 09:30
RPMS/x86_64/ntpsec-1.1.8-8.fc31.x86_64.rpm

Or older?


Kind regards,
Udo



Re: crash

2020-04-13 Thread Hal Murray via devel


> Apr 13 07:10:23 doos kernel: ntpd[204063]: segfault at 17f8 ip
> 7f9d70252a70 sp 7ffe3665adc0 error 4 in libssl.so.1.1.1d[7f9d7022e000+
> 5] 

Can you get a stack trace?

What were your configure options?


> Apr 13 06:10:27 doos ntpd[204063]: EX-REP: Count=1 Print=1, Score=0.500, M4
> V4 from [2606:4700:f1::1]:123, lng=84

That's saying the NTS stuff isn't working.  2606:4700:f1::1 is Cloudflare.  
They have updated their servers to use the latest tweak from the draft RFC.  
It's incompatible.




-- 
These are my opinions.  I hate spam.





Re: crash

2020-04-13 Thread Udo van den Heuvel via devel
On 13-04-2020 14:13, Udo van den Heuvel via devel wrote:
> All,
> 
> This happens since yesterday:

This is with a fairly recent 1.1.8 git build.
Fedora is up to date.

Udo


crash

2020-04-13 Thread Udo van den Heuvel via devel
All,

This happens since yesterday:

Apr 13 06:10:23 doos ntpd[204062]: INIT: ntpd ntpsec-1.1.8
2019-08-02T00:00:00Z: Starting
Apr 13 06:10:23 doos ntpd[204062]: INIT: Command line: /usr/sbin/ntpd -u
ntp:ntp -g -N -p /var/run/ntpd.pid
Apr 13 06:10:23 doos ntpd[204063]: INIT: precision = 1.397 usec (-19)
Apr 13 06:10:23 doos ntpd[204063]: INIT: successfully locked into RAM
Apr 13 06:10:23 doos ntpd[204063]: CONFIG: readconfig: parsing file:
/etc/ntp.conf
Apr 13 06:10:23 doos ntpd[204063]: AUTH: authreadkeys: reading /etc/ntp/keys
Apr 13 06:10:23 doos ntpd[204063]: AUTH: authreadkeys: added 0 keys
Apr 13 06:10:23 doos ntpd[204063]: CONFIG: 'monitor' cannot be disabled
while 'limited' is enabled
Apr 13 06:10:23 doos ntpd[204063]: INIT: Using SO_TIMESTAMPNS
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen and drop on 0 v6wildcard
[::]:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen and drop on 1 v4wildcard
0.0.0.0:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen normally on 2 lo 127.0.0.1:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen normally on 3 eth0
192.168.10.70:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen normally on 4 lo [::1]:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen normally on 5 eth0
[fd00:c0a8:a00:1::70]:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen normally on 6 eth0
[2001:981:a812:0:b62e:99ff:fe92:5264]:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listen normally on 7 eth0
[fe80::b62e:99ff:fe92:5264%2]:123
Apr 13 06:10:23 doos ntpd[204063]: IO: Listening on routing socket on fd
#24 for interface updates
Apr 13 06:10:23 doos ntpd[204063]: SYNC: Found 14 servers, suggest
minsane at least 3
Apr 13 06:10:23 doos ntpd[204063]: INIT: MRU 10922 entries, 13 hash
bits, 65536 bytes
Apr 13 06:10:23 doos ntpd[204063]: INIT: OpenSSL 1.1.1d FIPS  10 Sep
2019, 1010104f
Apr 13 06:10:23 doos ntpd[204062]: 2020-04-13T06:10:23 ntpd[204062]:
INIT: ntpd ntpsec-1.1.8 2019-08-02T00:00:00Z: Starting
Apr 13 06:10:23 doos ntpd[204062]: 2020-04-13T06:10:23 ntpd[204062]:
INIT: Command line: /usr/sbin/ntpd -u ntp:ntp -g -N -p /var/run/ntpd.pid
Apr 13 06:10:27 doos ntpd[204063]: EX-REP: Count=1 Print=1, Score=0.500,
M4 V4 from [2606:4700:f1::1]:123, lng=84
Apr 13 06:10:27 doos ntpd[204063]: EX-REP:  e400  
4e54534e   6274db5e 604c0da9   
 01040024 31ecdfb4 899336cb a523b5e6 0f5d614a 766ff4ca 99384f4d
a84fd23e 7e959900
Apr 13 06:11:34 doos ntpd[204063]: EX-REP: Count=2 Print=2, Score=0.995,
M4 V4 from [2606:4700:f1::1]:123, lng=84
Apr 13 06:11:34 doos ntpd[204063]: EX-REP:  e400  
4e54534e   c0d6d0b1 7ff9fd3c   
 01040024 0ab1fdb5 06d34008 4477e202 f7c726b0 e8f662cc ef488b09
c4b3100b 8fc83793
Apr 13 06:12:38 doos ntpd[204063]: EX-REP: Count=3 Print=3, Score=1.487,
M4 V4 from [2606:4700:f1::1]:123, lng=84
Apr 13 06:12:38 doos ntpd[204063]: EX-REP:  e400  
4e54534e   79b3755d fcf61bc4   
 01040024 6502774b aafc5e82 fb0692fc 2ab219c9 05be1d8a 8db3d63d
61d2591d 08fe9f00
Apr 13 06:13:42 doos ntpd[204063]: CLOCK: time stepped by -0.227362
Apr 13 06:13:42 doos ntpd[204063]: INIT: MRU 10922 entries, 13 hash
bits, 65536 bytes
Apr 13 06:14:07 doos ntpd[204063]: EX-REP: Count=4 Print=4, Score=1.968,
M4 V4 from [2606:4700:f1::1]:123, lng=84
Apr 13 06:14:07 doos ntpd[204063]: EX-REP:  e400  
4e54534e   735b5415 6c6af8af   
 01040024 e136691f 681af7eb 58590394 3fe8b189 4d7ec4cb 00658d17
a88bd4d7 542dd7da
Apr 13 06:15:14 doos ntpd[204063]: EX-REP: Count=5 Print=5, Score=2.450,
M4 V4 from [2606:4700:f1::1]:123, lng=84
Apr 13 06:15:14 doos ntpd[204063]: EX-REP:  e400  
4e54534e   02999b2d 3e0fd596   
 01040024 3d2a4325 067694c7 4fce200e 841a6932 d94001f5 0fe4aa4f
09dd46d4 33149497
Apr 13 07:10:23 doos kernel: [52367.896238] ntpd[204063]: segfault at
17f8 ip 7f9d70252a70 sp 7ffe3665adc0 error 4 in
libssl.so.1.1.1d[7f9d7022e000+5]
Apr 13 07:10:23 doos kernel: ntpd[204063]: segfault at 17f8 ip
7f9d70252a70 sp 7ffe3665adc0 error 4 in
libssl.so.1.1.1d[7f9d7022e000+5]

openssl rpm is intact.

Kind regards,
Udo


Re: Usefulness of noval (Was: Re: NTS crash...)

2019-03-27 Thread Gary E. Miller via devel
Yo Richard!

On Wed, 27 Mar 2019 16:23:19 -0500
Richard Laager via devel  wrote:

> On 3/26/19 4:27 PM, Gary E. Miller via devel wrote:
> > I added noval, still can not connect:
> > 
> > server 204.17.205.23 maxpoll 5 nts noval # pi3  
> 
> I wonder if we should revisit "noval". I think I originally argued in
> favor of having it, as a standard TLS client knob. But IIRC, Daniel
> suggested it was pointless.

The "pointless" claim was proven wrong by the Hackathon.  And noval is still needed today.
 
> Does NTS with noval actually buy us anything over plain NTP?

Yes, it was 100% essential for the NTS hackathon.  Otherwise NTPsec
would not be able to connect to the ostfalia servers.

Still needed today.

Maybe, just maybe, if one of the many flavors of cert pinning worked with
NTPsec then it might only be useful for debugging.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin




Usefulness of noval (Was: Re: NTS crash...)

2019-03-27 Thread Richard Laager via devel
On 3/26/19 4:27 PM, Gary E. Miller via devel wrote:
> I added noval, still can not connect:
> 
> server 204.17.205.23 maxpoll 5 nts noval # pi3

I wonder if we should revisit "noval". I think I originally argued in
favor of having it, as a standard TLS client knob. But IIRC, Daniel
suggested it was pointless.

Does NTS with noval actually buy us anything over plain NTP?

-- 
Richard





Re: NTS crash...

2019-03-26 Thread Gary E. Miller via devel
Yo Hal!

On Tue, 26 Mar 2019 14:49:52 -0700
Hal Murray via devel  wrote:

> > Now it does not crash, any way to make it work?
> > I need to use some IPs for private, offgrid, networking.   
> 
> I use /etc/hosts, so that hasn't been a problem for me.

Now I have two files to copy/merge over a dozen places instead of one.

> > If you only need the name for the cert, and you are not checking
> > the cert, it should work.  
> 
> Yes, but I need to get some quiet time.  It's not hard, just not
> simple, at least until I see an easy way to do it.

No rush.  Take your time and do it right.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin




Re: NTS crash...

2019-03-26 Thread Hal Murray via devel
> Now it does not crash, any way to make it work?
> I need to use some IPs for private, offgrid, networking. 

I use /etc/hosts, so that hasn't been a problem for me.


> If you only need the name for the cert, and you are not checking the cert, it
> should work.

Yes, but I need to get some quiet time.  It's not hard, just not simple, at 
least until I see an easy way to do it.

The early config stuff handles the IP Address case and throws away the 
hostname string.  I want to understand that area before I try to fix it.
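
The distinction at issue, a server given as an IP literal versus a hostname, can be illustrated as follows. This is a sketch only, not how NTPsec's config code works: it uses Python's ipaddress module, and parse_server is a hypothetical helper that always keeps the original string so that a name is still available later (for example for certificate checks).

```python
import ipaddress

def parse_server(text):
    """Return (original_text, ip_or_None); the original string is always kept."""
    try:
        return text, ipaddress.ip_address(text)
    except ValueError:
        return text, None  # not an IP literal, so treat it as a hostname

for server in ("204.17.205.23", "pi4.rellim.com"):
    name, addr = parse_server(server)
    print(name, "-> IP literal" if addr is not None else "-> hostname")
```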


-- 
These are my opinions.  I hate spam.





Re: NTS crash...

2019-03-26 Thread Gary E. Miller via devel
Yo Hal!

On Tue, 26 Mar 2019 14:22:11 -0700
Hal Murray via devel  wrote:

> >> Are you trying to use NTS on an IP Address?  Known bug.  [That
> >> "(null)" happens on that case.]  
> > Nope.  Here is the line from ntp.conf that crashes my slow RasPi.
> > But not my fast RasPi:  
> 
> > server 204.17.205.23 maxpoll 5 nts # pi3   
> 
> That sure looks like an IP Address to me.

Yup, I guess not enough coffee yet.

Now it does not crash, any way to make it work?

I need to use some IPs for private, offgrid, networking.

I added noval, still can not connect:

server 204.17.205.23 maxpoll 5 nts noval # pi3

If you only need the name for the cert, and you are not checking the
cert, it should work.


RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin




Re: NTS crash...

2019-03-26 Thread Hal Murray via devel


>> Are you trying to use NTS on an IP Address?  Known bug.  [That
>> "(null)" happens on that case.]
> Nope.  Here is the line from ntp.conf that crashes my slow RasPi.  But not my
> fast RasPi:

> server 204.17.205.23 maxpoll 5 nts # pi3 

That sure looks like an IP Address to me.



-- 
These are my opinions.  I hate spam.





Re: NTS crash...

2019-03-26 Thread Gary E. Miller via devel
Yo Hal!

On Tue, 26 Mar 2019 13:44:41 -0700
Hal Murray  wrote:

> > Always fails for me.  On several different RasPi.  
> 
> 2019-03-26T13:35:09 ntpd[26050]: DNS: dns_probe: (null),
> cast_flags:1, flags:21001
> 
> Are you trying to use NTS on an IP Address?  Known bug.  [That
> "(null)" happens on that case.]

Nope.  Here is the line from ntp.conf that crashes my slow RasPi.  But
not my fast RasPi:

server 204.17.205.23 maxpoll 5 nts # pi3

> I thought I mentioned that case before but I guess I wasn't loud
> enough.

Is crashing your idea of loud?

I know that fails, now.  I know it will get fixed, eventually.  

But this is something else, a race.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin




Re: NTS crash...

2019-03-26 Thread Hal Murray via devel
> Always fails for me.  On several different RasPi.

2019-03-26T13:35:09 ntpd[26050]: DNS: dns_probe: (null), cast_flags:1,
flags:21001

Are you trying to use NTS on an IP Address?  Known bug.  [That "(null)" 
happens on that case.]

I thought I mentioned that case before but I guess I wasn't loud enough.


-- 
These are my opinions.  I hate spam.





Re: NTS crash...

2019-03-26 Thread Gary E. Miller via devel
Yo Hal!

On Tue, 26 Mar 2019 13:29:45 -0700
Hal Murray via devel  wrote:

> > I applied today's NTPsec git head, with the mysyslog patches.
> > Older and slower RasPi still crash on startup if they try to be an
> > NTS client.   
> 
> Works for me.  It's old enough that it has only 2 USB ports.

Always fails for me.  On several different RasPi.

2019-03-26T13:35:06 ntpd[26050]: PROTO: SHM(1) 8014 84 reachable
2019-03-26T13:35:06 ntpd[26050]: PROTO: SHM(1) 901a 8a sys_peer
2019-03-26T13:35:06 ntpd[26050]: PROTO: 0.0.0.0 c415 05 clock_sync
2019-03-26T13:35:07 ntpd[26050]: PROTO: 2001:470:e815::8 a014 84 reachable
2019-03-26T13:35:08 ntpd[26050]: PROTO: 204.17.205.17 a014 84 reachable
2019-03-26T13:35:09 ntpd[26050]: DNS: dns_probe: (null), cast_flags:1, 
flags:21001
[New Thread 0x75b1a460 (LWP 26062)]

Thread 4 "ntpd" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x75b1a460 (LWP 26062)]
0x76cfb6fc in strlcpy () from /usr/lib/libbsd.so.0


Any idea how to make gdb more useful here?

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin



NTS crash...

2019-03-26 Thread Hal Murray via devel


> I applied today's NTPsec git head, with the mysyslog patches.  Older and
> slower RasPi still crash on startup if they try to be an NTS client. 

Works for me.  It's old enough that it has only 2 USB ports.



-- 
These are my opinions.  I hate spam.





NTS crash...

2019-03-26 Thread Gary E. Miller via devel
Yo All!

I applied today's NTPsec git head, with the mysyslog patches.  Older
and slower RasPi still crash on startup if they try to be an NTS client.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588

Veritas liberabit vos. -- Quid est veritas?
"If you can’t measure it, you can’t improve it." - Lord Kelvin




Re: Dave Morgan's report on the mystery crash

2017-09-06 Thread Eric S. Raymond via devel
Dave Morgan wrote:
> All,
> I am at work at the moment. If logs still needed I will send in about 10
> hours when back home.

Thanks, we've found and fixed the problem.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

Please consider contributing to my Patreon page at https://www.patreon.com/esr
so I can keep the invisible wheels of the Internet turning. Give generously -
the civilization you save might be your own.



Re: Dave Morgan's report on the mystery crash

2017-09-06 Thread Dave Morgan via devel
All,
I am at work at the moment. If logs are still needed I will send them in
about 10 hours when back home.

Dave

On 05/09/2017, Eric S. Raymond via devel <devel@ntpsec.org> wrote:
> Dave Morgan sent me a report on two instances of the mystery crash
> that happened to him last week (he also said the installation had been
> stable since).  Alas, I somehow fat-fingered my copy of that mail.
>
> Dave, please repost to the list so we can all stare at your logs and
> config.
> --
>   Eric S. Raymond (http://www.catb.org/~esr/)
>
> Non-cooperation with evil is as much a duty as cooperation with good.
>   -- Mohandas Gandhi
>


-- 
http://www.morgad.co.uk/index.html
DP: http://www.pgdp.net   NTP: http://www.pool.ntp.org
L: http://www.lynton-rail.co.uk


Dave Morgan's report on the mystery crash

2017-09-05 Thread Eric S. Raymond via devel
Dave Morgan sent me a report on two instances of the mystery crash
that happened to him last week (he also said the installation had been
stable since).  Alas, I somehow fat-fingered my copy of that mail.

Dave, please repost to the list so we can all stare at your logs and
config.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)

Non-cooperation with evil is as much a duty as cooperation with good.
-- Mohandas Gandhi


All hands alert - crash of unknown origin

2017-09-05 Thread Eric S. Raymond via devel
Everyone should read this thread:

https://gitlab.com/NTPsec/ntpsec/issues/375

The only empirical clue we have is that it only seems to manifest
under the kind of high load characteristic of pool service.

I have a suspicion that something is causing memory usage to spike
and the OOM killer is reaping the process.

This is a serious bug and we need everyone with test facilities trying
to reproduce it.  If there is any way you can set up and watch a
pool server, please do so.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)

The Bible is not my book, and Christianity is not my religion.  I could never
give assent to the long, complicated statements of Christian dogma.
-- Abraham Lincoln


Re: ✘pyntpq crash

2016-10-27 Thread Eric S. Raymond
Gary E. Miller :
> Yo Eric!
> 
> Whoops:
> 
> #  ntpq/pyntpq -p
> Traceback (most recent call last):
>   File "ntpq/pyntpq", line 1441, in
>     interpreter.onecmd(cmd)
>   File "/usr/lib64/python2.7/cmd.py", line 221, in onecmd
>     return func(arg)
>   File "ntpq/pyntpq", line 1051, in do_peers
>     self.__dopeers(showall=False, mode="peers")
>   File "ntpq/pyntpq", line 270, in __dopeers
>     if not self.__dogetassoc():
>   File "ntpq/pyntpq", line 173, in __dogetassoc
>     self.peers = self.session.readstat()
>   File "/u1/src/NTP/ntpsec/ntpq/ntp/packet.py", line 594, in readstat
>     self.doquery(opcode=CTL_OP_READSTAT, associd=associd)
>   File "/u1/src/NTP/ntpsec/ntpq/ntp/packet.py", line 581, in doquery
>     res = self.getresponse(opcode, associd, not retry)
>   File "/u1/src/NTP/ntpsec/ntpq/ntp/packet.py", line 461, in getresponse
>     rawdata = polystr(self.sock.recv(4096))
> socket.error: [Errno 111] Connection refused
> 
> This happens when I kill the ntpd on the localhost.  Prolly easy to fix.

Trivial.  Pushed.  You should now get a "connection timed out" message.
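
For anyone reading the archive later: one way a client can turn this kind of
low-level socket error into a short message instead of a traceback is
sketched below.  This is only an illustration in Python 3, not the change
that was pushed; read_response and its message strings are hypothetical, and
sock is assumed to be a connected UDP socket like the one used in
getresponse.

```python
import socket


def read_response(sock, bufsize=4096):
    """Return (data, error); exactly one of the two is None.

    Hypothetical helper: it wraps a single recv() so that a dead or
    unreachable daemon yields a readable message rather than a traceback.
    """
    try:
        return sock.recv(bufsize), None
    except socket.timeout:
        # No reply arrived within the socket's timeout.
        return None, "connection timed out"
    except OSError as exc:
        # Includes ECONNREFUSED (errno 111 on Linux), which a connected
        # UDP socket reports when nothing is listening on the port.
        return None, "socket error: %s" % exc.strerror
```

The timeout case is listed first because socket.timeout is itself a subclass
of OSError in Python 3, so the order of the except clauses matters.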
-- 
Eric S. Raymond (http://www.catb.org/~esr/)



✘pyntpq crash

2016-10-27 Thread Gary E. Miller
Yo Eric!

Whoops:

#  ntpq/pyntpq -p
Traceback (most recent call last):
  File "ntpq/pyntpq", line 1441, in
    interpreter.onecmd(cmd)
  File "/usr/lib64/python2.7/cmd.py", line 221, in onecmd
    return func(arg)
  File "ntpq/pyntpq", line 1051, in do_peers
    self.__dopeers(showall=False, mode="peers")
  File "ntpq/pyntpq", line 270, in __dopeers
    if not self.__dogetassoc():
  File "ntpq/pyntpq", line 173, in __dogetassoc
    self.peers = self.session.readstat()
  File "/u1/src/NTP/ntpsec/ntpq/ntp/packet.py", line 594, in readstat
    self.doquery(opcode=CTL_OP_READSTAT, associd=associd)
  File "/u1/src/NTP/ntpsec/ntpq/ntp/packet.py", line 581, in doquery
    res = self.getresponse(opcode, associd, not retry)
  File "/u1/src/NTP/ntpsec/ntpq/ntp/packet.py", line 461, in getresponse
    rawdata = polystr(self.sock.recv(4096))
socket.error: [Errno 111] Connection refused

This happens when I kill the ntpd on the localhost.  Prolly easy to fix.


RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-08-29 Thread Eric S. Raymond
Processing old mail...

Hal Murray :
> > I believe you're right that these platforms don't have it.  The question is,
> > how important is that fact?  Is the performance hit from synchronous DNS
> > really a showstopper?  I don't know the answer. 
> 
> There are two cases I know of where ntpd does a DNS lookup after it gets 
> started.
> 
> One is the try again when DNS for the normal server case doesn't work during 
> initialization.  It will try again occasionally until it gets an answer. 
> (which might be negative)
> 
> The main one is the pool code trying for a new server.  I think we should be 
> extending this rather than dropping it.  There are several possibles in this 
> area.  The main one would be to verify that a server you are using is still 
> in the pool.  (There isn't a way to do that yet - the pool doesn't have any 
> DNS support for that.)  The other would be to try replacing the poorest 
> server rather than only replacing dead servers.
> 
> DNS lookups can take a LONG time.  I think I've seen 40 seconds on a failing 
> case.
> 
> If we get the recv time stamp from the OS, I think the DNS delays won't 
> introduce any lies on the normal path.  We could test that by putting a sleep 
> in the main loop.  (There is a filter to reject packets that take too long, 
> but I think that's time-in-flight and excludes time sitting on the server.)
> 
> There are two cases I can think of where a pause in ntpd would cause 
> troubles.  One is that it would mess up refclocks.  The other is that packets 
> will get dropped if too many of them arrive.
> 
> I think that means we could use the pool command on a system without 
> refclocks.  That covers end nodes and maybe lightly loaded servers.
> 
> ---
> 
> It's worth checking out the input buffering side of things.  There may be 
> some code there that we don't need.  I think there is a pool of buffers.  
> Where can a buffer sit other than on the free queue?  Why do we need a pool?

The project has more important priorities than chasing this down.  But: I have
edited this text, adding a few details I have learned since, into a new
section for the internals tour (devel/tour.txt).  That will give somebody
a better-than-nothing place to start if we ever again try something like
the cAres replacement.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-29 Thread Matthew Selsky
On Tue, Jun 28, 2016 at 11:39:16PM -0700, Hal Murray wrote:
> 
> matthew.sel...@twosigma.com said:
> > "rlimit memlock 0" using Classic causes ntpd to die after 3 minutes with
> > this error 2016-06-29T00:13:21.903+00:00 host.example.com ntpd[27206]:
> > libgcc_s.so.1 must be installed for pthread_cancel to work 
> 
> What version of Classic are you running?  I thought they had fixed that.

This lab system happens to be running 4.2.7.0p368  Super-old, I know.  I'll 
upgrade it over the weekend.

> > I've attached 15 minute graphs for "rlimit memlock -1" and "rlimit memlock
> > 128" using Classic.  Locking memory seems to result in more stable graphs
> > over the time period that I was able to collect quickly.
>
> What are you plotting?

Y-axis is offset as measured by ntpq -p in microseconds.  X-axis is time.  And 
the 3 lines represent 3 different remote refclocks that my ntp client is 
pointing at.


Cheers,
-Matt



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-29 Thread Hal Murray

matthew.sel...@twosigma.com said:
> "rlimit memlock 0" using Classic causes ntpd to die after 3 minutes with
> this error 2016-06-29T00:13:21.903+00:00 host.example.com ntpd[27206]:
> libgcc_s.so.1 must be installed for pthread_cancel to work 

What version of Classic are you running?  I thought they had fixed that.


> I've attached 15 minute graphs for "rlimit memlock -1" and "rlimit memlock
> 128" using Classic.  Locking memory seems to result in more stable graphs
> over the time period that I was able to collect quickly. 

What are you plotting?

-- 
These are my opinions.  I hate spam.





Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-28 Thread Matthew Selsky
On Tue, Jun 28, 2016 at 07:26:39PM -0400, Eric S. Raymond wrote:
> Hal Murray :
> > I think you have extrapolated from some modern systems to our whole target 
> > environment.  I don't remember any discussion supporting memlock not being 
> > interesting/important.
> 
> There were actually two threads about this attached to memlock-related bug
> reports in Classic.  They initially thought memlocking was important, then
> figured out it wasn't.  Matt Selsky has been following those bugs; he and I
> discussed the issue on #ntpsec.

"rlimit memlock 0" using Classic causes ntpd to die after 3 minutes with this 
error
2016-06-29T00:13:21.903+00:00 host.example.com ntpd[27206]: libgcc_s.so.1 must 
be installed for pthread_cancel to work

I've attached 15 minute graphs for "rlimit memlock -1" and "rlimit memlock 128" 
using Classic.  Locking memory seems to result in more stable graphs over the 
time period that I was able to collect quickly.
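
When comparing runs like these, it can help to record the locked-memory
limit the process is actually running under.  A minimal sketch, assuming a
Unix system where Python's resource module exposes RLIMIT_MEMLOCK; the helper
names are made up for illustration and none of this is ntpd code.

```python
import resource


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes.

    A value equal to resource.RLIM_INFINITY means "no limit".
    """
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit):
    """Format one limit value for display."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d KiB" % (limit // 1024)
```

Calling describe() on each element of memlock_limit() gives a line that can
be logged next to each graph.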


Cheers,
-Matt

Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-28 Thread Gary E. Miller
Yo Eric!

On Tue, 28 Jun 2016 19:47:14 -0400
"Eric S. Raymond"  wrote:

> Gary E. Miller :
> > Yo Eric!
> > 
> > On Tue, 28 Jun 2016 19:26:39 -0400
> > "Eric S. Raymond"  wrote:
> >   
> > > (You should camp on #ntpsec.  Also join our Signal channel -
> > > because that's secured, most of the vuln discussions happen
> > > there.)  
> > 
> > Ah, how do we join the Signal channel?
> 
> Install Signal on your smartphone and/or Chrome instance.  One of us
> can and will add you.

Done.  Tied to my cell phone: 541-390-3793.

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-28 Thread Eric S. Raymond
Gary E. Miller :
> Yo Eric!
> 
> On Tue, 28 Jun 2016 19:26:39 -0400
> "Eric S. Raymond"  wrote:
> 
> > (You should camp on #ntpsec.  Also join our Signal channel - because
> > that's secured, most of the vuln discussions happen there.)
> 
> Ah, how do we join the Signal channel?

Install Signal on your smartphone and/or Chrome instance.  One of us can
and will add you.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-28 Thread Gary E. Miller
Yo Eric!

On Tue, 28 Jun 2016 19:26:39 -0400
"Eric S. Raymond"  wrote:

> (You should camp on #ntpsec.  Also join our Signal channel - because
> that's secured, most of the vuln discussions happen there.)

Ah, how do we join the Signal channel?

RGDS
GARY
---
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97703
g...@rellim.com  Tel:+1 541 382 8588



Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-28 Thread Eric S. Raymond
Hal Murray :
> I think you have extrapolated from some modern systems to our whole target 
> environment.  I don't remember any discussion supporting memlock not being 
> interesting/important.

There were actually two threads about this attached to memlock-related bug
reports in Classic.  They initially thought memlocking was important, then
figured out it wasn't.  Matt Selsky has been following those bugs; he and I
discussed the issue on #ntpsec.

(You should camp on #ntpsec.  Also join our Signal channel - because that's
secured, most of the vuln discussions happen there.)

> I'd be a lot happier if you had a plan for what to do if it turned out to be 
> a problem and/or a way to verify that we don't need it or detect that it 
> causes trouble.

I have a plan.  Love is the plan, the plan is git (classical reference).
The two patches that removed it are pretty well isolated and should be
easily revertible.  As in, the work of minutes.

As for how to tell if it causes problems, that's not very difficult either.
If it's suspected, graph jitter against memory utilization.

> Consider ntpd running on an old system that is mostly lightly loaded and 
> doesn't have a lot of memory.  I could easily imagine ntpd getting swapped 
> out when some load did come along.  I don't know how to evaluate if that will 
> cause problems and I don't think we have a test environment that is likely to 
> blunder into it.

I remember page faults causing enough processing lag to be a real issue here,
but not since the mid-1990s at the latest.  And certainly not with SSDs.

But I think you've brought another issue to the surface which I'll start a
separate thread about.

> I poked around a bit.  Linux and NetBSD and FreeBSD all have getrusage().  I 
> didn't notice any differences.  It covers page faults and CPU usage.  When 
> I'm in the right mood, I'll add another file parallel to sysstats to collect 
> that sort of data.  The CPU usage will probably be interesting even if page 
> faults are boring.

That kind of data is always useful and I would welcome it.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-28 Thread Hal Murray

e...@thyrsus.com said:
> After discussion with Daniel about the performance and security issues I
> deleted the memlock code. As the comment explains:

I think changes like that are worthy of a general announcement.

> on modern systems, which swap so seldom
> that many people don't bother with swap partitions

I think you have extrapolated from some modern systems to our whole target 
environment.  I don't remember any discussion supporting memlock not being 
interesting/important.

I'd be a lot happier if you had a plan for what to do if it turned out to be 
a problem and/or a way to verify that we don't need it or detect that it 
causes trouble.

Consider ntpd running on an old system that is mostly lightly loaded and 
doesn't have a lot of memory.  I could easily imagine ntpd getting swapped 
out when some load did come along.  I don't know how to evaluate if that will 
cause problems and I don't think we have a test environment that is likely to 
blunder into it.

I poked around a bit.  Linux and NetBSD and FreeBSD all have getrusage().  I 
didn't notice any differences.  It covers page faults and CPU usage.  When 
I'm in the right mood, I'll add another file parallel to sysstats to collect 
that sort of data.  The CPU usage will probably be interesting even if page 
faults are boring.
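
As a rough illustration of the data getrusage() offers, here is a minimal
sketch using Python's resource module, which wraps the same call.  The
function names and dictionary keys are hypothetical; a real statistics file
would presumably be written from ntpd's C code rather than like this.

```python
import resource


def usage_snapshot():
    """Return CPU time and page-fault counters for the current process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user_cpu_s": ru.ru_utime,     # CPU seconds spent in user mode
        "system_cpu_s": ru.ru_stime,   # CPU seconds spent in the kernel
        "minor_faults": ru.ru_minflt,  # faults served without disk I/O
        "major_faults": ru.ru_majflt,  # faults that required disk I/O
    }


def usage_delta(before, after):
    """Subtract two snapshots to get the usage over an interval."""
    return {key: after[key] - before[key] for key in before}
```

Taking a snapshot at each logging interval and writing out the delta would
give per-interval CPU and page-fault figures.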

-- 
These are my opinions.  I hate spam.





Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-27 Thread Clark B. Wierda
On Mon, Jun 27, 2016 at 3:47 PM, Hal Murray  wrote:

>
> cbwie...@gmail.com said:
> > How are pool entries added when the service decides it needs more?
>
> There is some background stuff that roughly says "need more?", and if so
> fires off the DNS lookup.
>
>
> > Would it be possible to leverage this code for adding all servers
> > specified by name?
>
> Probably not directly, but it wouldn't be hard for the server code to use
> more than one address if that was desired.  Maybe it should be "servers"
> rather than "server".  Do you have an example where that would be useful?
>
> If you don't have lots of servers, you probably don't want to switch to
> using "pool" since that path will probably keep banging away at the DNS
> looking for more servers.
>
>
I'm not looking to change the operation of the server or pool directive.

I was thinking of setting up associations using the DNS lookup code.  If
the mechanism for adding new pool servers was blocking on the DNS call but
asynchronous to the rest of the daemon, I was figuring to call the lookup
with the name provided by the server directive.  The only real difference
between a specified server and a pool server is that you don't delete the
specified server.
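
To make the idea concrete, here is a minimal sketch of a lookup that blocks
in its own long-lived thread while the caller keeps running, which also
matches the suggestion elsewhere in this thread of never killing the DNS
helper thread.  It is written in Python rather than ntpd's C, and DnsHelper,
submit and results are hypothetical names; port 123 is the standard NTP port.

```python
import queue
import socket
import threading


class DnsHelper:
    """One persistent worker thread that performs blocking DNS lookups."""

    def __init__(self, resolve=socket.getaddrinfo):
        self._resolve = resolve          # injectable so it can be faked
        self._requests = queue.Queue()   # hostnames waiting to be resolved
        self.results = queue.Queue()     # (hostname, addresses, error)
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, hostname):
        """Queue a lookup and return immediately."""
        self._requests.put(hostname)

    def _run(self):
        # The thread is never torn down; it sleeps in get() until
        # there is more work to do.
        while True:
            hostname = self._requests.get()
            try:
                infos = self._resolve(hostname, 123, type=socket.SOCK_DGRAM)
                addrs = sorted({info[4][0] for info in infos})
                self.results.put((hostname, addrs, None))
            except OSError as exc:  # socket.gaierror is a subclass
                self.results.put((hostname, [], exc))
```

The main loop would poll results without blocking (get_nowait) and set up an
association for each address it finds, whether the name came from a server
line or a pool line.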

I'm definitely not looking to bang on DNS servers any more than I have to.

Clark B. Wierda

Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-27 Thread Clark B. Wierda
A question:

How are pool entries added when the service decides it needs more?

Would it be possible to leverage this code for adding all servers specified
by name?  The DNS cost would be the same.  The only difference is the name
used for the query.

Once a server is associated, the IP is used.  There should be no impact on
the time calculations after an association is provisioned.

Clark B. Wierda

On Sun, Jun 26, 2016 at 9:13 PM, Hal Murray  wrote:

>
> Possible crazy idea...
>
> How about we never kill the DNS helper thread.  Just let it sit there in
> case it gets more work to do.  The only cost is a bit of memory.
>
> Or maybe only do that if we are locking stuff into memory.
>
>

Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> 
> e...@thyrsus.com said:
> > Ugh.  Our options have just narrowed.  I've just seen
> > libgcc_s.so.1 must be installed for pthread_cancel to work Aborted (core
> > dumped)
> 
> > with memlock off in the build.
> 
> Can you reproduce it?
> 
> My guess is that you didn't really get memlock turned off.  How about putting 
> a break on mlockall or the call to it.  (There is only one in ntpd.c)

This is possible.  I will attempt to reproduce.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> If it uses threads, we still have the problem of not being able to load the 
> thread cleanup code.

Maybe.  We don't know if the libc implementation is vulnerable to that bug or
not.  I should do an experimental implementation on a branch and find out.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Hal Murray

e...@thyrsus.com said:
>> Is getaddrinfo_a() in RTEMS?  QNX?   BSD?
> It's not an OS thing, it's a toolchain thing.  getaddrinfo_a() is
> implemented using standard C and POSIX threads, it doesn't need OS-specific
> support.

Or it's in an optional extra library.

> Linux has it because Linux uses libc whether you're compiling with gcc or
> clang.  Any of those other platforms will have it *if* they have (gcc ||
> clang) && glibc. 

My Linux man page says:
   #define _GNU_SOURCE /* See feature_test_macros(7) */
   Link with -lanl.

I couldn't find it in /usr/include/ on NetBSD or FreeBSD.  On Linux, it's in 
netdb.h.

--

If it uses threads, we still have the problem of not being able to load the 
thread cleanup code.


-- 
These are my opinions.  I hate spam.





Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Mark Atwood :
> Is getaddrinfo_a() in RTEMS?  QNX?   BSD?

It's not an OS thing, it's a toolchain thing.  getaddrinfo_a() is implemented
using standard C and POSIX threads, it doesn't need OS-specific support.

Linux has it because Linux uses libc whether you're compiling with gcc
or clang.  Any of those other platforms will have it *if* they have
(gcc || clang) && glibc.

There is at least one other implementation out there, in a GPL-licensed
package called "adns".
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Mark Atwood
Is getaddrinfo_a() in RTEMS?  QNX?   BSD?

On Sun, Jun 26, 2016 at 7:06 AM Eric S. Raymond  wrote:

> Eric S. Raymond :
> > > What would you do if we discovered a case where we wanted it?
> >
> > Cry a lot.  Then add logic to force synchronous DNS when memlocking is
> > selected, and document this as a workaround for a bug we haven't fixed yet.
>
> Ugh.  Our options have just narrowed.  I've just seen
>
> libgcc_s.so.1 must be installed for pthread_cancel to work
> Aborted (core dumped)
>
> with memlock off in the build.
>
> I think the homebrew async-lookup code has to go.  Even if we installed
> the warmup fix, I don't think I'd trust it.
> --
> Eric S. Raymond (http://www.catb.org/~esr/)

Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Eric S. Raymond :
> > What would you do if we discovered a case where we wanted it?
> 
> Cry a lot.  Then add logic to force synchronous DNS when memlocking is
> selected, and document this as a workaround for a bug we haven't fixed yet.

Ugh.  Our options have just narrowed.  I've just seen

libgcc_s.so.1 must be installed for pthread_cancel to work
Aborted (core dumped)

with memlock off in the build.

I think the homebrew async-lookup code has to go.  Even if we installed
the warmup fix, I don't think I'd trust it.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-26 Thread Eric S. Raymond
Hal Murray :
> 
> e...@thyrsus.com said:
> > In this case, we have two possible complexity-reducing fixes.  One is to
> > drop the memlock feature entirely.  The other is to drop the buggy homebrew
> > asynchronous-DNS lookup from Classic and use libc's.
> 
> Dropping memlock is an interesting idea.  I can't think of any place where it 
> is required today but my crystal ball for what we will need tomorrow has 
> never been very good.

Crypto security *might* be it.  I'll wait for Daniel to weigh in once
he's done climbing mountains or whatever.

> What would you do if we discovered a case where we wanted it?

Cry a lot.  Then add logic to force synchronous DNS when memlocking is
selected, and document this as a workaround for a bug we haven't fixed yet.

> We could try simplifying things to only supporting lock-everything-I-need 
> rather than specifying how much.  There might be a slippery slope if 
> something like a thread stack needs a sane size specified.

I'm not intimate with mlockall, but it looks like it works that way now.

    if (do_memlock) {
        /*
         * lock the process into memory
         */
        if (!dumpopts &&
            0 != mlockall(MCL_CURRENT|MCL_FUTURE))
                msyslog(LOG_ERR, "mlockall(): %m");
    }

> Is there a simple way to count page faults for a process?  Or measure swapped 
> out data and/or code that isn't swapped in?

I don't know.  I can do some research, but I'm not sure "enough page faults
to merit memory locking" would be a well-defined threshold even if I knew how
to count them.

> I don't think your use-libc approach will be as simple as you would
> like.  It's not available on NetBSD or FreeBSD.  Maybe I just didn't
> look in the right place.  It's not in netdb.h where it is for Linux.

I believe you're right that these platforms don't have it.  The question is,
how important is that fact?  Is the performance hit from synchronous DNS
really a showstopper?  I don't know the answer.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-25 Thread Hal Murray

e...@thyrsus.com said:
> In this case, we have two possible complexity-reducing fixes.  One is to
> drop the memlock feature entirely.  The other is to drop the buggy homebrew
> asynchronous-DNS lookup from Classic and use libc's.

Dropping memlock is an interesting idea.  I can't think of any place where it 
is required today but my crystal ball for what we will need tomorrow has 
never been very good.

What would you do if we discovered a case where we wanted it?

We could try simplifying things to only supporting lock-everything-I-need 
rather than specifying how much.  There might be a slippery slope if 
something like a thread stack needs a sane size specified.

Is there a simple way to count page faults for a process?  Or measure swapped 
out data and/or code that isn't swapped in?


I don't think your use-libc approach will be as simple as you would like.  It's 
not available on NetBSD or FreeBSD.  Maybe I just didn't look in the right 
place.  It's not in netdb.h where it is for Linux.



-- 
These are my opinions.  I hate spam.





Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-25 Thread Eric S. Raymond
Mark: Heads up!  Policy issue.  Important but not urgent.

Hal Murray <hmur...@megapathdsl.net>:
> 
> e...@thyrsus.com said:
> > I think the hack is to force libgcc_s to be loaded early. I don't know how
> > to do that in waf. 
> 
> There are two problems in this area.  One is the end-of-thread code not 
> getting locked into memory.  I think that is what you are running into.
> 
> The other is a tangle of error handling on out-of-memory issues by things 
> like pthread_create and DNS lookup.  I think the latter ends up with a retry 
> error code.  I think I fixed some/many of them to crash rather than retry on 
> the assumption that memory wasn't going to get freed and I didn't know of any 
> other reason to retry.  But that was a long time ago (maybe pre fork) and I 
> don't remember the details.
> 
> 
> I think we should copy the warmup code from ntp classic.  It's basically an 
> upstream bug.  Warmup seems like a reasonable work around.

We could do that.  But I'm opposed to the idea.  Not because I think the
warmup code is of itself bad, but because adding complexity seems like
the wrong direction to go in general.

The project motto is "Perfection is achieved, not when there is
nothing more to add, but when there is nothing left to take away."  I
didn't pick it out of a hat.  I wasn't just quoting it as a tribal
shibboleth. I *meant* it, and I've acted on it to the project's
great benefit.

Given a choice, I will almost always opt for the fix that removes
complexity and code bulk even if it sacrifices a feature I consider
marginal.  My being relentless about this is the direct reason we've
dodged so many CVEs; that is real-world feedback telling me to keep up
the simplifying pressure.

In this case, we have two possible complexity-reducing fixes.  One is
to drop the memlock feature entirely.  The other is to drop the
buggy homebrew asynchronous-DNS lookup from Classic and use libc's.

Before I will willingly sign off on any solution that adds code, someone
needs to explain to me why neither of those approaches will fly.

It could be, for example, that Daniel thinks we need memlocking for
crypto security.  (I'm not going to buy "performance", not when modern
systems swap so seldom that many people have stopped bothering with
swap partitions.)  But if so, I want to hear him explain that and
establish that the memory-locking code is worth its weight.

It could be that Mark judges there's a really important platform out
there that has POSIX threads but is non-libc, so getaddrinfo_a() is an
unacceptable port blocker that can be solved with the homebrew code.
But if so, I want to hear him explain that and establish that the homebrew
lookup code is worth its weight.

Nothing that increases our defect rate gets to stay in purely on
historical inertia. Show me the use case, please.
-- 
Eric S. Raymond (http://www.catb.org/~esr/)


Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-25 Thread Hal Murray

e...@thyrsus.com said:
> I think the hack is to force libgcc_s to be loaded early. I don't know how
> to do that in waf. 

There are two problems in this area.  One is the end-of-thread code not 
getting locked into memory.  I think that is what you are running into.

The other is a tangle of error handling on out-of-memory issues by things 
like pthread_create and DNS lookup.  I think the latter ends up with a retry 
error code.  I think I fixed some/many of them to crash rather than retry on 
the assumption that memory wasn't going to get freed and I didn't know of any 
other reason to retry.  But that was a long time ago (maybe pre fork) and I 
don't remember the details.


I think we should copy the warmup code from ntp classic.  It's basically an 
upstream bug.  Warmup seems like a reasonable work around.

It's in ntpd/ntpd.c.  Search for NEED_PTHREAD_WARMUP and back up over the long 
comment which describes what's going on.

There is a note about not working on FreeBSD.  I haven't sorted that out.  It 
may refer to the linker hack.

Here are the bugs I remember:
  https://bugs.ntp.org/show_bug.cgi?id=2831
FreeBSD page fault story, morphs into lock discussion
  https://bugs.ntp.org/show_bug.cgi?id=2905
rlimit/memlock discussion

There is more info in various bugs:
  https://bugs.ntp.org/show_bug.cgi?id=2332
  https://bugs.ntp.org/show_bug.cgi?id=2954
  https://bugs.ntp.org/show_bug.cgi?id=2817
The signal/noise may not be good.



-- 
These are my opinions.  I hate spam.





Re: Use of pool servers reveals unacceptable crash rate in async DNS

2016-06-25 Thread Kurt Roeckx
On Sat, Jun 25, 2016 at 06:13:56PM -0400, Eric S. Raymond wrote:
> Hal Murray :
> > 
> > e...@thyrsus.com said:
> > > 1. Apply Classic's workaround for the problem, which I don't remember the
> > > details of but involved some dodgy nonstandard linker hacks done through
> > > the build system.  *However, I did not trust this method when I understood
> > > it.*  It seemed sure to cause porting difficulties and is inherently
> > > fragile.
> > 
> > k...@roeckx.be said:
> > > If it's the one I'm thinking about, I think the solution is to remove the
> > > locking of memory. 
> > 
> > We may be confusing several bugs.
> > 
> > There was a problem with locking stuff into memory.  Some library needed by 
> > end of thread processing wasn't loaded yet and things worked out such that 
> > with the default memory 32 bit systems worked but 64 bit systems didn't 
> > have enough room.
> > 
> > I think one solution was to create a dummy thread early on to get that 
> > module loaded.  Or disable memory locking, or tell it to use more memory,
> > or ...
> 
> This matches what I remember, except for "use more memory". There was a third
> workaround that involved weird linker options to force early loading of the
> library.

Like -Wl,-z,now?  That's not such a weird option.


Kurt
