Not sure if it's the same issue, but it sure looks like it is.

I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
replaced /usr/sbin/ntpd with a new one; then I restarted
ntpd.

On all hosts but one this was successful: the new ntpd starts
fine and works normally. But on one of these machines the
ntpd process immediately crashes with SIGSEGV. That machine
has an Intel Xeon CPU. It is not apparent to me in what way
this machine differs from the others.

I played with some variations of ntpd on that host; here are
some findings:

- the new ntpd (that came with 10.2-RELEASE-p6) runs fine
  if it does *not* daemonize, i.e. ntpd with the -n or -d option
  stays attached to a terminal and works fine; the same
  happens when it is run under ktrace -d -i ntpd ... it works fine,
  even when it daemonizes;

- the ntpd built from a fresh net/ntp-devel behaves exactly
  the same: it crashes on that machine when it daemonizes;

- a previous ntpd (from 10.2-RELEASE-p5) works fine,
  so I ended up downgrading ntpd to that previous version
  on that machine. Also, an ntpd from a recent 10-STABLE,
  when copied to that host, runs fine there!

I haven't yet tried to build it with debugging, or to capture
a core dump.
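
A crash like this should be recorded in the kernel log in the same
form as the "exited on signal 11" line quoted further down in this
thread. Below is a rough sketch of a filter for comparing hosts. It
assumes that message format, and it assumes the log is kept in
/var/log/messages; the function name ntpd_crashes is made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print one summary
# line per ntpd crash record of the form
#   pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
ntpd_crashes() {
    sed -n 's/.*pid \([0-9][0-9]*\) (ntpd), uid \([0-9][0-9]*\): exited on signal \([0-9][0-9]*\).*/pid=\1 uid=\2 signal=\3/p'
}

# Demo on the line quoted in this thread.
printf '%s\n' 'pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)' | ntpd_crashes

# On a real host (the log path is an assumption):
#   ntpd_crashes < /var/log/messages
```

Signal 11 is SIGSEGV. If a core file was written, running gdb on
/usr/sbin/ntpd together with that core file should give a backtrace
to start from.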

Puzzling...

   Mark



On 2015-10-30 12:34, David Wolfskill wrote:
On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
David Wolfskill <da...@catwhisker.org> writes:
> ...
> bound to 172.17.1.245 -- renewal in 43200 seconds.
> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> Starting Network: lo0 em0 iwn0 lagg0.
> ...

Did you find a solution? I'm wondering if the ntpd problems people are
reporting on freebsd-security@ are related.  I vaguely recall hearing
that this had been traced to a pthread bug, but can't find anything
about it in commit logs or mailing list archives.
....

I don't recall finding "a solution" per se; that said, I haven't
seen an occurrence of the above for long enough that I'm not
sure when I sent that message. :-}

As a reality check:

g1-252(11.0-C)[1] ls -lT /*.core
-rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
g1-252(11.0-C)[2]

So -- among other points -- my last sighting of whatever was causing
that was the day I built:

FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18
04:45:25 PDT 2015
r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

Note that the machines where I run head get updated daily (unless
there's enough of a problem with head that I can't build it or can't
boot it (and I'm unable to circumvent the issue within a reasonable
time)) -- and while I do attempt to run ntpd on the machines, the above
failure is more "annoying" than "crippling" in my particular case.

And I'm presently running:

FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29
05:12:58 PDT 2015
r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

and building head @r290190 as I type.

And FWIW, I *suspect* that one of the issues involved (in my case)
was a ... lack of determinism ... in events involving getting the
(wireless) network connectivity into a usable state as part of the
initial transition to multi-user mode.  (I only have evidence at
the moment of the issue on my laptop; my build machine, which only
uses a wired NIC, has no /ntpd.core file.  It and my laptop are updated
pretty much in lock-step; it runs a completely GENERIC kernel, while
the laptop runs a modestly customized one based on GENERIC.)

Peace,
david
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
