[BUG] Uncalibrated TSC is not accurate enough as a timekeeper

2018-12-22 Da Shi Cao
The cpu_khz and tsc_khz values are now read directly with the cpuid
instruction and are deemed to be very accurate, but that is not the
case in our setup: the OS clock lags behind by about 8 seconds per
hour. The CPU information is as follows:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
stepping        : 4
microcode       : 0x24d
cpu MHz         : 2300.000
cache size      : 25344 KB
physical id     : 0
siblings        : 36
core id         : 0
cpu cores       : 18
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
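It is this "cpuid level 22" (the highest basic CPUID leaf is 0x16, so
the frequency leaves 0x15 and 0x16 are both available) that makes
kernel 4.14 read both cpu_khz and tsc_khz directly with the cpuid
instruction and treat the TSC frequency as very accurate, which in
fact it is not. Because the kernel then sets X86_FEATURE_TSC_KNOWN_FREQ,
it also skips the later refinement of tsc_khz against a reference
clock, so the error is never corrected.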

 	 * TSC frequency determined by CPUID is a "hardware reported"
 	 * frequency and is the most accurate one so far we have. This
 	 * is considered a known frequency.
+	 *
+	 * The assumption may not be valid!
+	 *
 	 */
-	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);


v3.16.51 will not boot

2017-12-14 Da Shi Cao
The latest 3.16 release will not boot on my 4-socket, 32-core machine:
[1.952997] general protection fault:  [#1] SMP
[1.957992] Modules linked in:
[1.961064] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.51-ds02-g148f3e4-dirty #2
[1.969834] Hardware name: IBM System x3850 X5 -[7143X1U]-/Node 1, Processor Card, BIOS -[G0E185AUS-1.85]- 04/22/2015
[1.980422] task: 8820731417e0 ti: 882073158000 task.ti: 882073158000
[1.987894] RIP: 0010:[]  [] build_sched_domains+0x6e2/0xbf0
[1.996684] RSP: :88207315bdf0  EFLAGS: 00010206
[2.001987] RAX:  RBX:  RCX: 0008
[2.009112] RDX: 00014918 RSI:  RDI: 0080
[2.016237] RBP: 88207315bea0 R08: 882072e88ca0 R09: fffe
[2.023362] R10: 2469f94a R11:  R12: 882072e1db58
[2.030488] R13: 882072e88c80 R14: 8880724c1488 R15: 0080
[2.037613] FS:  () GS:88207fc0() knlGS:
[2.045690] CS:  0010 DS:  ES:  CR0: 8005003b
[2.051428] CR2: 88807000 CR3: 01a11000 CR4: 07f0
[2.058553] Stack:
[2.060567]    cd28
[2.068028]    882072d60320
[2.075490]    8880724c1488 882072e1dac0
[2.082952] Call Trace:
[2.085400]  [] sched_init_smp+0x38f/0x41a
[2.091055]  [] ? native_smp_cpus_done+0x10b/0x112
[2.097400]  [] kernel_init_freeable+0xf4/0x200
[2.103485]  [] ? kernel_init_freeable+0xf4/0x200

I drilled down to the function build_group_mask() and changed it as follows:
@@ -5801,7 +5801,7 @@ build_group_mask(struct sched_domain *sd, struct sched_group *sg, struct cpumask
 			continue;
 
 		/* If we would not end up here, we can't continue from here */
-		if (!cpumask_equal(span, sched_domain_span(sibling->child)))
+		if (!cpumask_subset(sched_domain_span(sibling->child), span))
 			continue;
 
 		cpumask_set_cpu(i, mask);

This is my best guess, and with this change the machine boots.