One more update: when I look at `ntpctl -sa` now, it no longer shows any
"peer not valid" errors. However, the clock still runs a good 14 seconds
behind, and it gets worse every minute.
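
To put a number on "worse every minute", here is a rough sketch of the
arithmetic, using the figure from the first message in this thread (quoted
below): about 7 seconds slow, one minute after rdate. The function name
drift_ppm is just something made up for this illustration.

```python
def drift_ppm(offset_seconds, elapsed_seconds):
    """Clock drift in parts per million: seconds lost per second elapsed, times 1e6."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return offset_seconds / elapsed_seconds * 1_000_000


# Figures from the first message in this thread: ~7 s slow, 60 s after rdate.
rate = drift_ppm(7, 60)
print(f"{rate:.0f} ppm, i.e. {rate / 10_000:.1f}% slow")
```

If that one-minute reading is representative, the guest loses more than a
tenth of every second, which is far beyond the tens of ppm usually quoted for
an ordinary hardware clock.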

# ntpctl -sa
4/4 peers valid, 1/1 sensors valid, constraint offset -115s (4 errors),
clock unsynced

peer
   wt tl st  next  poll          offset       delay      jitter
213.154.236.182 from pool pool.ntp.org
    1 10  2 3080s 3153s      2298.229ms     3.482ms     1.359ms
83.98.201.134 from pool pool.ntp.org
    1 10  2 3154s 3220s      2764.077ms     2.686ms     0.703ms
217.23.3.234 from pool pool.ntp.org
    1 10  2 2952s 3020s      2682.053ms     2.880ms     0.528ms
185.92.220.131 from pool pool.ntp.org
    1 10  2 2999s 3076s      2266.144ms     2.287ms     0.937ms

sensor
   wt gd st  next  poll          offset  correction
vmmci0
    1  1  0    6s   15s     14607.577ms     0.000ms
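
For anyone who wants to compare the peer offsets without eyeballing the
columns, here is a minimal sketch that pulls them out of the output above. It
assumes the layout shown here (an "<address> from pool <name>" line followed
by a stats line) and skips "peer not valid" entries; peers configured some
other way may print differently. The helper name peer_offsets is made up for
this example.

```python
import re
import statistics

# Peer section copied from the `ntpctl -sa` output above.
SAMPLE = """\
peer
   wt tl st  next  poll          offset       delay      jitter
213.154.236.182 from pool pool.ntp.org
    1 10  2 3080s 3153s      2298.229ms     3.482ms     1.359ms
83.98.201.134 from pool pool.ntp.org
    1 10  2 3154s 3220s      2764.077ms     2.686ms     0.703ms
217.23.3.234 from pool pool.ntp.org
    1 10  2 2952s 3020s      2682.053ms     2.880ms     0.528ms
185.92.220.131 from pool pool.ntp.org
    1 10  2 2999s 3076s      2266.144ms     2.287ms     0.937ms
"""

PEER_RE = re.compile(r"^(\S+) from pool \S+$")
MS_RE = re.compile(r"(-?\d+(?:\.\d+)?)ms")


def peer_offsets(text):
    """Return {peer_address: offset_ms} for each peer that has a valid stats line."""
    offsets = {}
    current = None
    for line in text.splitlines():
        match = PEER_RE.match(line.strip())
        if match:
            current = match.group(1)
            continue
        if current is not None:
            values = MS_RE.findall(line)
            if len(values) == 3:  # offset, delay, jitter
                offsets[current] = float(values[0])
            current = None  # "peer not valid" lines carry no ms values
    return offsets


offsets = peer_offsets(SAMPLE)
median_ms = statistics.median(offsets.values())
print(f"{len(offsets)} valid peers, median offset {median_ms:.3f} ms")
```

All four peers report an offset of a bit over 2 seconds, while the vmmci0
sensor reports about 14.6 seconds, so the two sources disagree by a wide
margin.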



On Fri, Nov 9, 2018 at 7:18 PM Stefan Arentz <stefan.are...@gmail.com>
wrote:

> Here is an update on the situation:
>
> I installed -current on this VM, clean install, and the ntpd error does
> not happen anymore. But the clock issues remain, even with ntpd running.
>
>
> The ntpd starts without complaints now, and seems to be running with its
> regular processes:
>
> _ntp     70093  0.0  0.5   920  2540 ??  S<sp   7:04PM    0:00.02 ntpd:
> ntp engine (ntpd)
> _ntp     51912  0.0  0.5   736  2464 ??  Isp    7:04PM    0:00.01 ntpd:
> dns engine (ntpd)
> root     46674  0.0  0.3   792  1640 ??  S<sp   7:04PM    0:00.00
> /usr/sbin/ntpd -s
>
>
> I have set kern.timecounter.hardware to tsc:
>
> # sysctl kern.timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) tsc(-1000) dummy(-1000000)
>
>
> trondd asked for the output of ntpctl -sa, which shows me the following:
>
> # ntpctl -sa
> 0/4 peers valid, 1/1 sensors valid, constraint offset -3s, clock unsynced,
> clock offset is 12771.710ms
>
> peer
>    wt tl st  next  poll          offset       delay      jitter
> 213.154.236.182 from pool pool.ntp.org
>     1  2  -  152s  300s             ---- peer not valid ----
> 83.98.201.134 from pool pool.ntp.org
>     1  2  -  152s  300s             ---- peer not valid ----
> 217.23.3.234 from pool pool.ntp.org
>     1  2  -  152s  300s             ---- peer not valid ----
> 185.92.220.131 from pool pool.ntp.org
>     1  2  -  152s  300s             ---- peer not valid ----
>
> sensor
>    wt gd st  next  poll          offset  correction
> vmmci0
>     1  1  0   15s   15s     18001.915ms     0.000ms
>
>
> I am not sure how to interpret these numbers. I also don't understand the
> "peer not valid" messages here. I have another OpenBSD VM which has the
> exact same ntpd.conf and it does not complain about any of the peers.
>
>
> I think my conclusion is that this is not something that can be solved at
> the VM level.
>
>  S.
>
>
> On Sat, Nov 3, 2018 at 8:10 PM Stefan Arentz <stefan.are...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I am having an issue where an OpenBSD VM running on vmd is having
>> serious clock skew issues.
>>
>> I am relatively new to OpenBSD, so I am not sure how to properly debug
>> this. What I hope is that I can provide a good amount of data and folks
>> here can give me some hints and ask me for additional information to
>> get to the root cause of this.
>>
>> So first some facts and symptoms:
>>
>> - Both Host and Guest are running OpenBSD 6.4. The host runs GENERIC.MP
>>   and the guest GENERIC.
>> - The host runs 50 guests, all OpenBSD (openbsd.amsterdam)
>> - Only this VM is having this clock issue (is this correct, or were
>>   there others?)
>>
>> - The guest has kern.timecounter.hardware=tsc
>> - The time on the VM was set with rdate a couple of days ago, and as of
>>   now the VM is running about 4 hours behind.
>> - ntpd is running (main process, dns engine, ntp engine)
>> - when started or restarted, ntpd complains about "pipe write error
>>   (from main): No such file or directory" but does seem to start
>>
>> - I just ran rdate nl.pool.ntp.org and the date was properly updated
>> - One minute after running rdate, the clock is already 7 seconds slow
>>
>> - The guest also has some severe networking issues. Often I cannot type
>>   more than a few characters before a ~15 second delay happens.
>>   Interactive typing is difficult.
>> - I can SSH into the Host and have none of these issues, ruling out
>>   connectivity issues between me (Toronto) and the Host (Amsterdam)
>>
>> It would be easy to blame this on NTPd, which does have an unexplained
>> error message. However, I think even without running NTPd, the clock
>> skew should not be this extreme.
>>
>> Somehow I have a gut feeling that the clock issues and the networking
>> issues are related.
>>
>> I am root on the VM but I am not on the host. I do have vmctl access.
>> However, the host admin is friendly (Hi Mischa) and is happy to help to
>> debug this issue.
>>
>> I tried to ktrace ntpd to get more insight into the "pipe write error
>> (from main): No such file or directory" error but I did not get useful
>> info out of it. This may be because of my unfamiliarity with those
>> tools.
>>
>> Help appreciated :-)
>>
>>  S.
>>
>>
