On 2019-11-17 02:20, Pratik Vyas wrote:
* George Koehler <[email protected]> [2019-11-16 18:59:08 -0500]:
I adapted some code from OpenBSD pvclock(4) into a Linux
kernel module, and used it to fix the clock in a Void Linux virtual
guest (which had been using the broken i8254 pit). In the Linux
module, I set "shift = 12", ignoring the shift = -20 from vmm(4).
This seems to fix the tsc-to-nanosecond conversion, so the Void guest
is now my only virtual machine with a precise clock.
Hi George,
I concur with your math and indeed the diff below fixes it for me.
ok?
--
Pratik
Index: sys/arch/amd64/amd64/vmm.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
retrieving revision 1.254
diff -u -p -a -u -r1.254 vmm.c
--- sys/arch/amd64/amd64/vmm.c 22 Sep 2019 08:47:54 -0000 1.254
+++ sys/arch/amd64/amd64/vmm.c 17 Nov 2019 07:11:04 -0000
@@ -6906,7 +6906,7 @@ vmm_update_pvclock(struct vcpu *vcpu)
nanotime(&tv);
pvclock_ti->ti_system_time =
tv.tv_sec * 1000000000L + tv.tv_nsec;
- pvclock_ti->ti_tsc_shift = -20;
+ pvclock_ti->ti_tsc_shift = 12;
pvclock_ti->ti_tsc_to_system_mul =
vcpu->vc_pvclock_system_tsc_mul;
pvclock_ti->ti_flags = PVCLOCK_FLAG_TSC_STABLE;
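
For reference, here is a minimal sketch of the conversion that ti_tsc_shift feeds into, following the shift-then-multiply algorithm from the KVM msr.txt document. The helper names are hypothetical and are not taken from vmm(4) or pvclock(4). It also assumes the host derives the multiplier as (10^9 << 20) / tsc_frequency, which is not shown in the diff above. Under that assumption the shift has to make up the remaining 12 bits of the 32-bit fixed-point scale.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a TSC delta to nanoseconds by applying
 * the shift first and then a 32.32 fixed-point multiplier.
 */
static uint64_t
pvclock_delta_to_ns(uint64_t delta, int8_t shift, uint32_t mul)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	/* Plain 64-bit multiply: only safe for small deltas. */
	return (delta * mul) >> 32;
}

/*
 * Assumption: the multiplier is (10^9 << 20) / tsc_hz, so that
 * shift + 20 must equal 32 for the result to come out in ns.
 */
static uint32_t
pvclock_mul_for(uint64_t tsc_hz)
{
	return (uint32_t)((1000000000ULL << 20) / tsc_hz);
}
```

With a 3392.84 MHz TSC and a delta of one second's worth of ticks, shift = 12 gives a result just under 10^9 ns, while shift = -20 throws away so many bits before the multiply that the result rounds down to zero.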
I have an Intel machine that is not helped by this patch. With the
patch applied, the guest ntpd log has a bunch of negative delay entries.
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, 3392.84 MHz, 06-3a-09
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu0: using VERW MDS workaround (except on vmm entry)
cpu0: Enhanced SpeedStep 3392 MHz: speeds: 3901, 3900, 3700, 3600, 3400,
3200, 3100, 2900, 2700, 2600, 2400, 2300, 2100, 1900, 1800, 1600 MHz
I am wondering if a serialized rdtsc would help. Linux uses
rdtsc_ordered() to grab timestamps, which can use the rdtscp instruction.
https://github.com/torvalds/linux/search?q=rdtsc_ordered&unscoped_q=rdtsc_ordered
The post at https://stackoverflow.com/a/58146426 links to
https://www.felixcloutier.com/x86/rdtscp, which says:
"The time stamp disable (TSD) flag in register CR4 restricts the use of
the RDTSCP instruction as follows. When the flag is clear, the RDTSCP
instruction can be executed at any privilege level; when the flag is
set, the instruction can only be executed at privilege level 0."
I tried the diff below, but booting the guest with it yielded:
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0kernel: privileged instruction fault trap, code=0
Stopped at pvclock_get_timecount+0x50: rdtscp
This area is over my head. Is rdtscp a possible solution, or am I way
off course?
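
One thing that may be worth checking first is whether the guest advertises RDTSCP at all. The kernel already runs at privilege level 0, so the TSD flag alone would not obviously explain the trap. A possibility I have not verified is that the guest's CPUID does not report RDTSCP, in which case the instruction raises an invalid-opcode exception. The feature bit is bit 27 of EDX from CPUID leaf 0x80000001. Below is a minimal sketch of a check that falls back to plain rdtsc; the helper names are hypothetical and are not part of the diff that follows.

```c
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 27 advertises RDTSCP. */
#define CPUID_EDX_RDTSCP	(1U << 27)

/* Hypothetical selector for which instruction to use. */
enum tsc_read { TSC_READ_RDTSC, TSC_READ_RDTSCP };

/*
 * Hypothetical helper: edx is the EDX output of CPUID leaf
 * 0x80000001 as seen by the guest.
 */
static int
cpu_advertises_rdtscp(uint32_t edx)
{
	return (edx & CPUID_EDX_RDTSCP) != 0;
}

/* Use rdtscp only when advertised, otherwise fall back to rdtsc. */
static enum tsc_read
pick_tsc_read(uint32_t edx)
{
	return cpu_advertises_rdtscp(edx) ?
	    TSC_READ_RDTSCP : TSC_READ_RDTSC;
}
```

The host dmesg above lists RDTSCP, but that says nothing about what the hypervisor exposes to the guest, so the check would have to run inside the guest.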
Index: arch/amd64/amd64/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/cpu.c,v
retrieving revision 1.142
diff -u -p -r1.142 cpu.c
--- arch/amd64/amd64/cpu.c 12 Oct 2019 14:05:50 -0000 1.142
+++ arch/amd64/amd64/cpu.c 25 Nov 2019 15:01:16 -0000
@@ -1144,6 +1144,7 @@ cpu_init_msrs(struct cpu_info *ci)
wrmsr(MSR_FSBASE, 0);
wrmsr(MSR_GSBASE, (u_int64_t)ci);
wrmsr(MSR_KERNELGSBASE, 0);
+ wrmsr(MSR_TSC_AUX, ci->ci_cpuid);
family = ci->ci_family;
if (strcmp(cpu_vendor, "GenuineIntel") == 0 &&
Index: arch/amd64/include/cpufunc.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
retrieving revision 1.34
diff -u -p -r1.34 cpufunc.h
--- arch/amd64/include/cpufunc.h 28 Jun 2019 21:54:05 -0000 1.34
+++ arch/amd64/include/cpufunc.h 25 Nov 2019 15:01:17 -0000
@@ -292,6 +292,15 @@ rdtsc(void)
}
static __inline u_int64_t
+rdtscp(void)
+{
+ uint32_t hi, lo;
+
+ __asm volatile("rdtscp" : "=d" (hi), "=a" (lo) : : "ecx");
+ return (((uint64_t)hi << 32) | (uint64_t) lo);
+}
+
+static __inline u_int64_t
rdpmc(u_int pmc)
{
uint32_t hi, lo;
Index: arch/amd64/include/specialreg.h
===================================================================
RCS file: /cvs/src/sys/arch/amd64/include/specialreg.h,v
retrieving revision 1.85
diff -u -p -r1.85 specialreg.h
--- arch/amd64/include/specialreg.h 14 Jun 2019 18:13:55 -0000 1.85
+++ arch/amd64/include/specialreg.h 25 Nov 2019 15:01:17 -0000
@@ -528,6 +528,8 @@
#define MSR_FSBASE 0xc0000100 /* 64bit offset for fs: */
#define MSR_GSBASE 0xc0000101 /* 64bit offset for gs: */
#define MSR_KERNELGSBASE 0xc0000102 /* storage for swapgs ins */
+#define MSR_TSC_AUX 0xc0000103 /* rdtscp storage */
+
#define MSR_PATCH_LOADER 0xc0010020
#define MSR_INT_PEN_MSG 0xc0010055 /* Interrupt pending message */
Index: dev/pv/pvclock.c
===================================================================
RCS file: /cvs/src/sys/dev/pv/pvclock.c,v
retrieving revision 1.4
diff -u -p -r1.4 pvclock.c
--- dev/pv/pvclock.c 13 May 2019 15:40:34 -0000 1.4
+++ dev/pv/pvclock.c 25 Nov 2019 15:01:17 -0000
@@ -224,7 +224,7 @@ pvclock_get_timecount(struct timecounter
* The algorithm is described in
* linux/Documentation/virtual/kvm/msr.txt
*/
- delta = rdtsc() - tsc_timestamp;
+ delta = rdtscp() - tsc_timestamp;
if (shift < 0)
delta >>= -shift;
else