On Sun, Oct 23, 2022 at 02:07:12PM +0200, Kalabic S. wrote:
> On 23/10/2022 12:28, Scott Cheloha wrote:
> > On Fri, Oct 21, 2022 at 12:28:26AM +0200, Kalabic S. wrote:
> > > Hello,
> > > 
> > > I noticed a system clock issue after upgrade from 7.1 to 7.2, clock 
> > > started
> > > to run really fast, almost at 10x speed or so. It is a virtual machine 
> > > guest
> > > on ESXi 6.0 host, VM is used as a main Internet router for my home network
> > > (PPPoE over fiber). Both host and VM are configured with date/time in UTC
> > > timezone.
> > > 
> > > Long story short, setting 'kern.timecounter.hardware' to 'acpitimer0' has
> > > fixed it.
> > > 
> > > It did not make any difference if ntpd service was enabled or disabled.
> > 
> > Please provide a dmesg and the output of
> > 
> > $ sysctl hw
> > 
> > and
> > 
> > $ sysctl machdep
> > 
> > A dmesg from the VM before you upgraded will also help.

I'm not seeing one here.  I guess I'll just do a little guesswork.

> > If you have some kind of configuration file for the ESXi host and the
> > VM, that will also help.  I don't know anything about ESXi, but it
> > will help to look at what sort of settings you are using.

Also not seeing this here.  If you provide one it might save a lot of
time chasing non-problems.

> Attached output to this mail.

Thanks, let's take a peek.

> OpenBSD 7.2 (GENERIC.MP) #758: Tue Sep 27 11:57:54 MDT 2022
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 1055850496 (1006MB)
> avail mem = 1006510080 (959MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xe0010 (556 entries)
> bios0: vendor Phoenix Technologies LTD version "6.00" date 09/21/2015
> bios0: VMware, Inc. VMware Virtual Platform

Okay, we're in a VMware guest, no surprises there.

> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP BOOT APIC MCFG SRAT HPET WAET
> acpi0: wakeup devices PCI0(S3) USB_(S1) P2P0(S3) S1F0(S3) S2F0(S3) S8F0(S3) 
> S16F(S3) S17F(S3) S18F(S3) S22F(S3) S23F(S3) S24F(S3) S25F(S3) PE40(S3) 
> S1F0(S3) PE50(S3) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits

You have a 24-bit virtual ACPI PM timer.

At this point in the boot, we will have switched from i8254_delay() to
using acpitimer_delay().

We probably also calibrate the TSC with acpitimer(4) here,
or near here.

> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz, 745.35 MHz, 06-3a-09

The bogomips measurement is very, very low compared to the speed
advertised in the CPU string.

Not necessarily an error, but odd-looking, even in a VM.

> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,MELTDOWN

TSC-related feature flags: TSC, DEADLINE, RDTSCP, ITSC, TSC_ADJUST

> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 15MHz

This looks low.

I don't know precisely what the hypervisor does to the local apic
timer frequency, and I don't have an old dmesg for reference, but my
guess is that this value is wrong.

At this point in the boot we have switched from acpitimer_delay() to
lapic_delay().

> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz, 813.12 MHz, 06-3a-09
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,MELTDOWN
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz, 813.24 MHz, 06-3a-09
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,MELTDOWN
> cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz, 975.57 MHz, 06-3a-09
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,ARAT,MELTDOWN
> cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache
> cpu3: smt 0, core 3, package 0

Nothing surprising in here.  Bogomips still look odd.

> ioapic0 at mainbus0: apid 4 pa 0xfec00000, version 11, 24 pins
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf0000000, bus 0-127
> acpihpet0 at acpi0: 14318179 Hz

You have a virtual HPET.  We probably use it to calibrate the TSC
again, but this time we use lapic_delay() in the calibration loop.

> pppoe0: received unexpected PADO
> [snip]
> pppoe0: received unexpected PADO
> pppoe0: LCP keepalive timeout
> [snip]
> pppoe0: host unique tag found, but it belongs to a connection in state 3
> pppoe: received PADO but could not find request for it
> pppoe0: LCP keepalive timeout
> pppoe0: received unexpected PADO
> [snip]
> pppoe0: received unexpected PADO

I assume you didn't see these messages on 7.1?

> hw.sensors.vmt0.timedelta0=10.849606 secs, OK, Sun Oct 23 12:00:39.205
> hw.vendor=VMware, Inc.
> hw.product=VMware Virtual Platform
> hw.version=None
> hw.serialno=VMware-56 4d 1a e1 38 52 1c a2-2d 4c 27 f8 fc 19 f7 f3

Maybe relevant.

> machdep.cpuvendor=GenuineIntel
> machdep.cpuid=0x306a9
> machdep.cpufeature=0x1fbbfbff
> machdep.tscfreq=745206414
> machdep.invarianttsc=1

And yeah, your TSC frequency is astoundingly high.  Way higher than
anything I have ever seen.

My guess is that we are miscalibrating the TSC frequency at least
once.  The only major thing that changed in the frequency calibration
code between 7.1 and 7.2 is the introduction of additional delay(9) 
implementations.
Let's see if removing them from the picture changes the result we get.

Can you compile and boot a -current kernel with the attached patch and
send the resulting dmesg?

Index: dev/acpi/acpitimer.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpitimer.c,v
retrieving revision 1.16
diff -u -p -r1.16 acpitimer.c
--- dev/acpi/acpitimer.c        25 Aug 2022 17:43:34 -0000      1.16
+++ dev/acpi/acpitimer.c        24 Oct 2022 13:22:19 -0000
@@ -74,6 +74,11 @@ acpitimermatch(struct device *parent, vo
        return (1);
 }
 
+struct {
+       uint64_t tsc;
+       uint32_t pmt;
+} acpitimer_timestamp[1024];
+
 void
 acpitimerattach(struct device *parent, struct device *self, void *aux)
 {
@@ -102,7 +107,16 @@ acpitimerattach(struct device *parent, s
        acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
        tc_init(&acpi_timecounter);
 
-       delay_init(acpitimer_delay, 1000);
+       size_t i;
+       for (i = 0; i < nitems(acpitimer_timestamp); i++) {
+               acpitimer_timestamp[i].pmt = acpitimer_read(sc);
+               acpitimer_timestamp[i].tsc = rdtsc_lfence();
+       }
+       for (i = 0; i < nitems(acpitimer_timestamp); i++) {
+               printf("%s: %u %llu\n",
+                   __func__, acpitimer_timestamp[i].pmt,
+                   acpitimer_timestamp[i].tsc);
+       }
 
 #if defined(__amd64__)
        extern void cpu_recalibrate_tsc(struct timecounter *);
Index: arch/amd64/amd64/tsc.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.30
diff -u -p -r1.30 tsc.c
--- arch/amd64/amd64/tsc.c      24 Oct 2022 00:56:33 -0000      1.30
+++ arch/amd64/amd64/tsc.c      24 Oct 2022 13:22:19 -0000
@@ -243,6 +243,9 @@ measure_tsc_freq(struct timecounter *tc)
 
        delay_usec = 100000;
        for (i = 0; i < 3; i++) {
+               printf("%s: calibrating with %s(%d), %llu Hz: ",
+                   __func__, tc->tc_name, tc->tc_quality, tc->tc_frequency);
+
                s = intr_disable();
 
                err1 = get_tsc_and_timecount(tc, &tsc1, &count1);
@@ -251,16 +254,24 @@ measure_tsc_freq(struct timecounter *tc)
 
                intr_restore(s);
 
-               if (err1 || err2)
+               printf("count %llu %llu tsc %llu %llu ",
+                   count1, count2, tsc1, tsc2);
+               if (err1 || err2) {
+                       printf("(failed)\n");
                        continue;
+               }
 
                usec = calculate_tc_delay(tc, count1, count2);
+               printf("usecs %d: ", usec);
 
                if ((usec < (delay_usec - RECALIBRATE_DELAY_THRESHOLD)) ||
-                   (usec > (delay_usec + RECALIBRATE_DELAY_THRESHOLD)))
+                   (usec > (delay_usec + RECALIBRATE_DELAY_THRESHOLD))) {
+                       printf("(failed)\n");
                        continue;
+               }
 
                frequency = calculate_tsc_freq(tsc1, tsc2, usec);
+               printf("%llu Hz\n", frequency);
 
                min_freq = MIN(min_freq, frequency);
                success++;
Index: arch/amd64/amd64/machdep.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/machdep.c,v
retrieving revision 1.281
diff -u -p -r1.281 machdep.c
--- arch/amd64/amd64/machdep.c  16 Oct 2022 15:03:39 -0000      1.281
+++ arch/amd64/amd64/machdep.c  24 Oct 2022 13:22:19 -0000
@@ -2094,6 +2094,8 @@ delay_init(void(*fn)(int), int fn_qualit
 {
        static int cur_quality = 0;
        if (fn_quality > cur_quality) {
+               printf("%s: changing delay implementation: %d -> %d\n",
+                   __func__, cur_quality, fn_quality);
                delay_func = fn;
                cur_quality = fn_quality;
        }

Reply via email to