On Mon, 25 Sep 2017, Mike Travis wrote: > On 9/25/2017 11:10 AM, Thomas Gleixner wrote: > > The reason why I put it there in the first place was to make the TSC > > deadline timer work on a whole range of systems. It turned out that our > > 'fix' was not enough so we changed that later to disable the deadline timer > > completely on affected systems when the firmware does not contain a fix for > > it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0 > > on the boot CPU. So rather than special casing this for UV we should just > > remove that requirement and leave the boot value as is. > > Okay, I could do that but as I mentioned I'm not comfortable changing code > that I cannot personally verify is correct. I'm assuming you only mean > removing the part where socket 0 should have a TSC_ADJUST value of 0? And the > part mentioned below that all cpu threads on a socket should have the same > adjust values should not be changed?
Right, only the restriction of TSC_ADJUST == 0 on the boot CPU can be removed. If the TSC_ADJUST values in a package differ, then BIOS did something really stupid because the underlying counter is the same for all threads in a package and different TSC_ADJUST values will result in TSC out of sync being detected. > > > Our BIOS team did make a change to conform to the "TSC_ADJUST should be > > > the > > > same on all cpu threads on a single socket" requirement, so we were able > > > to > > > pass that part of the TSC validation functions. (Prior to this, the TSC's > > > were synced by writing directly to the TSC MSR and natural delays in the > > > processor firmware caused the slight differences in the TSC ADJUST > > > values.) > > > > Right. TSC_ADJUST is there for a reason. > > Well, there is the problem with bit 63 (sign bit) in the TSC ADJUST MSR. So > it's almost there... :) What's the problem with bit 63? > > But back to my question about ART. You might talk to your BIOS/HW folks > > about that and eventually disable the ART related functionality in the > > kernel on UV. > > Once again, I'm not sure what code you are talking about. I talked to the > BIOS engineer specifically working on TSC related items and he was also not > familiar with the Intel ART spec. I could go farther up the chain to the > hardware designers but right now this would be extremely low on their priority > queue. > > Perhaps if you were more specific in what code should be disabled? We would > like to keep as much as the self check code as possible as there are still > some remotely possible failure cases. And removing the self checks would have > a negative impact on reliability. detect_art() -> X86_FEATURE_ART X86_FEATURE_ART enables the hardware assisted PTP <-> TSC correlation in network cards and is used for network synchronized audio stuff. It's probably a non issue today but TSN is becoming more widespread so I expect more users in the foreseeable future. We can just drop out in detect_art() if the system is UV so X86_FEATURE_ART does not get set and all related functionality is disabled automagically. > I actually did test using the CPU flags, "TSC reliable" and no "TSC ADJUST" to > prevent the kernel from overwriting the already correct TSC ADJUST values. > But that also turned off the clock source watchdog and having the > check_tsc_sync_target() test run was a good verification step. Our goal > was/is to conform as much as possible to the checks and the only unfixable > code was the assumption that socket 0 should have a TSC_ADJUST value of 0. Which can be removed. If you don't want to do it. I'm happy to write the patch^Wchangelog myself. Thanks, tglx