RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Vitaly Kuznetsov writes: > Michael Kelley writes: > >> I talked to KY Srinivasan for any history about TSC page on 32-bit. He said >> there was no technical reason not to implement it, but our focus was always >> 64-bit Linux, so the 32-bit was much less important. Also, on 32-bit Linux, >> the required 64x64 multiply and shift is more complex and takes more >> more cycles (compare 32-bit implementation of mul_u64_u64_shr vs. >> the 64-bit implementation), so the win over a MSR read is less. I >> don't know of any actual measurements being made to compare vs. >> MSR read. > > VMExit is 1000 CPU cycles or so, I would guess that TSC page > calculations are better. Let me try to build 32bit kernel and do some > quick measurements. So I tried and the difference is HUGE. For in-kernel clocksource reads (like sched_clock()), the testing code was: before = rdtsc_ordered(); for (i = 0; i < 1000; i++) (void)read_hv_sched_clock_msr(); after = rdtsc_ordered(); printk("MSR based clocksource: %d cycles\n", ((u32)(after - before))/1000); before = rdtsc_ordered(); for (i = 0; i < 1000; i++) (void)read_hv_sched_clock_tsc(); after = rdtsc_ordered(); printk("TSC page clocksource: %d cycles\n", ((u32)(after - before))/1000); The result (WS2016) is: [1.101910] MSR based clocksource: 3361 cycles [1.105224] TSC page clocksource: 49 cycles For userspace reads the absolute difference is even bigger as TSC page gives us functional vDSO: Testing code: before = rdtsc(); for (i = 0; i < COUNT; i++) clock_gettime(CLOCK_REALTIME, &tp); after = rdtsc(); printf("%d\n", (after - before)/COUNT); Result: TSC page: # ./gettime_cycles 131 MSR: # ./gettime_cycles 5664 With all that I see no reason for us to not enable TSC page on 32bit, even if the number of users is negligible, this will allow us to get rid of ugly #ifdef CONFIG_HYPERV_TSCPAGE in the code. I'll send a patch for discussion. -- Vitaly
RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Michael Kelley writes: > From: Vitaly Kuznetsov Sent: Tuesday, August 13, 2019 > 1:34 AM >> >> Michael Kelley writes: >> >> > From: Tianyu Lan Sent: Tuesday, July 30, 2019 >> > 6:41 AM >> >> >> >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov >> >> wrote: >> >> > >> >> > Peter Zijlstra writes: >> >> > >> >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: >> >> > >> lantianyu1...@gmail.com writes: >> >> > >> >> >> > >> > From: Tianyu Lan >> >> > >> > >> >> > >> > Hyper-V guests use the default native_sched_clock() in >> >> > >> > pv_ops.time.sched_clock >> >> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, >> >> > >> > which >> >> > >> > can be discontinuous in a Hyper-V VM. Add the generic >> >> > >> > hv_setup_sched_clock() >> >> > >> > to set the sched clock function appropriately. On x86, this sets >> >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value >> >> > >> > that is >> >> > >> > scaled and adjusted to be continuous. >> >> > >> >> >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to >> >> > >> use >> >> > >> MSR-based clocksource but using it as sched_clock() can be very slow, >> >> > >> I'm afraid. >> >> > >> >> >> > >> On the other hand, what we have now is probably worse: TSC can, >> >> > >> actually, jump backwards (e.g. on migration) and we're breaking the >> >> > >> requirements for sched_clock(). >> >> > > >> >> > > That (obviously) also breaks the requirements for using TSC as >> >> > > clocksource. >> >> > > >> >> > > IOW, it breaks the entire purpose of having TSC in the first place. >> >> > >> >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see >> >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used >> >> > instead. The problem is that 'TSC page' can be disabled by the >> >> > hypervisor and in that case the only remaining clocksource is MSR-based >> >> > (slow). >> >> > >> >> >> >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and >> >> kernel uses MSR based >> >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on >> >> other >> >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should >> >> take this into >> >> account and determine which clocksource should be exposed or not. >> >> >> > >> > We've confirmed with the Hyper-V team that the TSC page is always available >> > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical >> > hardware presents an InvariantTSC. >> >> Currently we check that TSC page is valid on every read and it seems >> this is redundant, right? It is either available on boot or not. I can >> only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will >> likely disable the page (and we can get reenlightenment notification >> then). > > I think Hyper-V can have brief intervals when the TSC page is not valid, so > the code checks for the "sequence" value being zero. Otherwise, yes, it > should always be there or not be there. Is there some other validity > check on every read that you are thinking of? > No, it's this one. In case these 'invalidity periods' are real there's nothing to improve in the current code. >> >> > But the Linux Kconfig's are set up so >> > the TSC page is not used for 32-bit guests -- all clock reads are >> > synthetic MSR >> > reads. For 32-bit, this set of changes will add more overhead because the >> > sched clock reads will now be MSR reads. >> > >> > I would be inclined to fix the problem, even with the perf hit on 32-bit >> > Linux. >> > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but >> > it's not >> > supported in Azure so usage is pretty small. The alternative would be to >> > continue >> > to use the raw TSC value on 32-bit, even with the risk of a discontinuity >> > in case of >> > live migration or similar scenarios. >> >> The issue needs fixing, I agree, however using MSR based clocksource as >> sched clock may give us too big of a performance hit (not sure who cares >> about 32 bit guest performance nowadays but still). What stops us from >> enabling TSC page for 32 bit guests if it is available? > > I talked to KY Srinivasan for any history about TSC page on 32-bit. He said > there was no technical reason not to implement it, but our focus was always > 64-bit Linux, so the 32-bit was much less important. Also, on 32-bit Linux, > the required 64x64 multiply and shift is more complex and takes more > more cycles (compare 32-bit implementation of mul_u64_u64_shr vs. > the 64-bit implementation), so the win over a MSR read is less. I > don't know of any actual measurements being made to compare vs. > MSR read. VMExit is 1000 CPU cycles or so, I would guess that TSC page calculations are better. Let me try to build 32bit kernel and do some quick measurements. -- Vitaly
RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Vitaly Kuznetsov Sent: Tuesday, August 13, 2019 1:34 AM > > Michael Kelley writes: > > > From: Tianyu Lan Sent: Tuesday, July 30, 2019 > > 6:41 AM > >> > >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov > >> wrote: > >> > > >> > Peter Zijlstra writes: > >> > > >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: > >> > >> lantianyu1...@gmail.com writes: > >> > >> > >> > >> > From: Tianyu Lan > >> > >> > > >> > >> > Hyper-V guests use the default native_sched_clock() in > >> > >> > pv_ops.time.sched_clock > >> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, > >> > >> > which > >> > >> > can be discontinuous in a Hyper-V VM. Add the generic > >> > >> > hv_setup_sched_clock() > >> > >> > to set the sched clock function appropriately. On x86, this sets > >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value > >> > >> > that is > >> > >> > scaled and adjusted to be continuous. > >> > >> > >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to > >> > >> use > >> > >> MSR-based clocksource but using it as sched_clock() can be very slow, > >> > >> I'm afraid. > >> > >> > >> > >> On the other hand, what we have now is probably worse: TSC can, > >> > >> actually, jump backwards (e.g. on migration) and we're breaking the > >> > >> requirements for sched_clock(). > >> > > > >> > > That (obviously) also breaks the requirements for using TSC as > >> > > clocksource. > >> > > > >> > > IOW, it breaks the entire purpose of having TSC in the first place. > >> > > >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see > >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used > >> > instead. The problem is that 'TSC page' can be disabled by the > >> > hypervisor and in that case the only remaining clocksource is MSR-based > >> > (slow). > >> > > >> > >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and > >> kernel uses MSR based > >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on > >> other > >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should > >> take this into > >> account and determine which clocksource should be exposed or not. > >> > > > > We've confirmed with the Hyper-V team that the TSC page is always available > > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical > > hardware presents an InvariantTSC. > > Currently we check that TSC page is valid on every read and it seems > this is redundant, right? It is either available on boot or not. I can > only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will > likely disable the page (and we can get reenlightenment notification > then). I think Hyper-V can have brief intervals when the TSC page is not valid, so the code checks for the "sequence" value being zero. Otherwise, yes, it should always be there or not be there. Is there some other validity check on every read that you are thinking of? > > > But the Linux Kconfig's are set up so > > the TSC page is not used for 32-bit guests -- all clock reads are synthetic > > MSR > > reads. For 32-bit, this set of changes will add more overhead because the > > sched clock reads will now be MSR reads. > > > > I would be inclined to fix the problem, even with the perf hit on 32-bit > > Linux. > > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but > > it's not > > supported in Azure so usage is pretty small. The alternative would be to > > continue > > to use the raw TSC value on 32-bit, even with the risk of a discontinuity > > in case of > > live migration or similar scenarios. > > The issue needs fixing, I agree, however using MSR based clocksource as > sched clock may give us too big of a performance hit (not sure who cares > about 32 bit guest performance nowadays but still). What stops us from > enabling TSC page for 32 bit guests if it is available? I talked to KY Srinivasan for any history about TSC page on 32-bit. He said there was no technical reason not to implement it, but our focus was always 64-bit Linux, so the 32-bit was much less important. Also, on 32-bit Linux, the required 64x64 multiply and shift is more complex and takes more more cycles (compare 32-bit implementation of mul_u64_u64_shr vs. the 64-bit implementation), so the win over a MSR read is less. I don't know of any actual measurements being made to compare vs. MSR read. Michael > > -- > Vitaly
RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Michael Kelley writes: > From: Tianyu Lan Sent: Tuesday, July 30, 2019 6:41 > AM >> >> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov wrote: >> > >> > Peter Zijlstra writes: >> > >> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: >> > >> lantianyu1...@gmail.com writes: >> > >> >> > >> > From: Tianyu Lan >> > >> > >> > >> > Hyper-V guests use the default native_sched_clock() in >> > >> > pv_ops.time.sched_clock >> > >> > on x86. But native_sched_clock() directly uses the raw TSC value, >> > >> > which >> > >> > can be discontinuous in a Hyper-V VM. Add the generic >> > >> > hv_setup_sched_clock() >> > >> > to set the sched clock function appropriately. On x86, this sets >> > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that >> > >> > is >> > >> > scaled and adjusted to be continuous. >> > >> >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use >> > >> MSR-based clocksource but using it as sched_clock() can be very slow, >> > >> I'm afraid. >> > >> >> > >> On the other hand, what we have now is probably worse: TSC can, >> > >> actually, jump backwards (e.g. on migration) and we're breaking the >> > >> requirements for sched_clock(). >> > > >> > > That (obviously) also breaks the requirements for using TSC as >> > > clocksource. >> > > >> > > IOW, it breaks the entire purpose of having TSC in the first place. >> > >> > Currently, we mark raw TSC as unstable when running on Hyper-V (see >> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used >> > instead. The problem is that 'TSC page' can be disabled by the >> > hypervisor and in that case the only remaining clocksource is MSR-based >> > (slow). >> > >> >> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and >> kernel uses MSR based >> clocksource. Each MSR read will trigger one VM-EXIT. This also happens on >> other >> hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should >> take this into >> account and determine which clocksource should be exposed or not. >> > > We've confirmed with the Hyper-V team that the TSC page is always available > on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical > hardware presents an InvariantTSC. Currently we check that TSC page is valid on every read and it seems this is redundant, right? It is either available on boot or not. I can only imagine migrating a VM to a non-InvariantTSC host when Hyper-V will likely disable the page (and we can get reenlightenment notification then). > But the Linux Kconfig's are set up so > the TSC page is not used for 32-bit guests -- all clock reads are synthetic > MSR > reads. For 32-bit, this set of changes will add more overhead because the > sched clock reads will now be MSR reads. > > I would be inclined to fix the problem, even with the perf hit on 32-bit > Linux. > I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's > not > supported in Azure so usage is pretty small. The alternative would be to > continue > to use the raw TSC value on 32-bit, even with the risk of a discontinuity in > case of > live migration or similar scenarios. The issue needs fixing, I agree, however using MSR based clocksource as sched clock may give us too big of a performance hit (not sure who cares about 32 bit guest performance nowadays but still). What stops us from enabling TSC page for 32 bit guests if it is available? -- Vitaly
RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Tianyu Lan Sent: Tuesday, July 30, 2019 6:41 AM > > On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov wrote: > > > > Peter Zijlstra writes: > > > > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: > > >> lantianyu1...@gmail.com writes: > > >> > > >> > From: Tianyu Lan > > >> > > > >> > Hyper-V guests use the default native_sched_clock() in > > >> > pv_ops.time.sched_clock > > >> > on x86. But native_sched_clock() directly uses the raw TSC value, > > >> > which > > >> > can be discontinuous in a Hyper-V VM. Add the generic > > >> > hv_setup_sched_clock() > > >> > to set the sched clock function appropriately. On x86, this sets > > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is > > >> > scaled and adjusted to be continuous. > > >> > > >> Hypervisor can, in theory, disable TSC page and then we're forced to use > > >> MSR-based clocksource but using it as sched_clock() can be very slow, > > >> I'm afraid. > > >> > > >> On the other hand, what we have now is probably worse: TSC can, > > >> actually, jump backwards (e.g. on migration) and we're breaking the > > >> requirements for sched_clock(). > > > > > > That (obviously) also breaks the requirements for using TSC as > > > clocksource. > > > > > > IOW, it breaks the entire purpose of having TSC in the first place. > > > > Currently, we mark raw TSC as unstable when running on Hyper-V (see > > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used > > instead. The problem is that 'TSC page' can be disabled by the > > hypervisor and in that case the only remaining clocksource is MSR-based > > (slow). > > > > Yes, that will be slow if Hyper-V doesn't expose hv tsc page and > kernel uses MSR based > clocksource. Each MSR read will trigger one VM-EXIT. This also happens on > other > hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should > take this into > account and determine which clocksource should be exposed or not. > We've confirmed with the Hyper-V team that the TSC page is always available on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical hardware presents an InvariantTSC. But the Linux Kconfig's are set up so the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR reads. For 32-bit, this set of changes will add more overhead because the sched clock reads will now be MSR reads. I would be inclined to fix the problem, even with the perf hit on 32-bit Linux. I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not supported in Azure so usage is pretty small. The alternative would be to continue to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of live migration or similar scenarios. Michael
Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Hi Vitaly & Peter: Thanks for your review. On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov wrote: > > Peter Zijlstra writes: > > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: > >> lantianyu1...@gmail.com writes: > >> > >> > From: Tianyu Lan > >> > > >> > Hyper-V guests use the default native_sched_clock() in > >> > pv_ops.time.sched_clock > >> > on x86. But native_sched_clock() directly uses the raw TSC value, which > >> > can be discontinuous in a Hyper-V VM. Add the generic > >> > hv_setup_sched_clock() > >> > to set the sched clock function appropriately. On x86, this sets > >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is > >> > scaled and adjusted to be continuous. > >> > >> Hypervisor can, in theory, disable TSC page and then we're forced to use > >> MSR-based clocksource but using it as sched_clock() can be very slow, > >> I'm afraid. > >> > >> On the other hand, what we have now is probably worse: TSC can, > >> actually, jump backwards (e.g. on migration) and we're breaking the > >> requirements for sched_clock(). > > > > That (obviously) also breaks the requirements for using TSC as > > clocksource. > > > > IOW, it breaks the entire purpose of having TSC in the first place. > > Currently, we mark raw TSC as unstable when running on Hyper-V (see > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used > instead. The problem is that 'TSC page' can be disabled by the > hypervisor and in that case the only remaining clocksource is MSR-based > (slow). > Yes, that will be slow if Hyper-V doesn't expose hv tsc page and kernel uses MSR based clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should take this into account and determine which clocksource should be exposed or not. -- Best regards Tianyu Lan
Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Peter Zijlstra writes: > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: >> lantianyu1...@gmail.com writes: >> >> > From: Tianyu Lan >> > >> > Hyper-V guests use the default native_sched_clock() in >> > pv_ops.time.sched_clock >> > on x86. But native_sched_clock() directly uses the raw TSC value, which >> > can be discontinuous in a Hyper-V VM. Add the generic >> > hv_setup_sched_clock() >> > to set the sched clock function appropriately. On x86, this sets >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is >> > scaled and adjusted to be continuous. >> >> Hypervisor can, in theory, disable TSC page and then we're forced to use >> MSR-based clocksource but using it as sched_clock() can be very slow, >> I'm afraid. >> >> On the other hand, what we have now is probably worse: TSC can, >> actually, jump backwards (e.g. on migration) and we're breaking the >> requirements for sched_clock(). > > That (obviously) also breaks the requirements for using TSC as > clocksource. > > IOW, it breaks the entire purpose of having TSC in the first place. Currently, we mark raw TSC as unstable when running on Hyper-V (see 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used instead. The problem is that 'TSC page' can be disabled by the hypervisor and in that case the only remaining clocksource is MSR-based (slow). -- Vitaly
Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote: > lantianyu1...@gmail.com writes: > > > From: Tianyu Lan > > > > Hyper-V guests use the default native_sched_clock() in > > pv_ops.time.sched_clock > > on x86. But native_sched_clock() directly uses the raw TSC value, which > > can be discontinuous in a Hyper-V VM. Add the generic > > hv_setup_sched_clock() > > to set the sched clock function appropriately. On x86, this sets > > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is > > scaled and adjusted to be continuous. > > Hypervisor can, in theory, disable TSC page and then we're forced to use > MSR-based clocksource but using it as sched_clock() can be very slow, > I'm afraid. > > On the other hand, what we have now is probably worse: TSC can, > actually, jump backwards (e.g. on migration) and we're breaking the > requirements for sched_clock(). That (obviously) also breaks the requirements for using TSC as clocksource. IOW, it breaks the entire purpose of having TSC in the first place.
Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
lantianyu1...@gmail.com writes: > From: Tianyu Lan > > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock > on x86. But native_sched_clock() directly uses the raw TSC value, which > can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock() > to set the sched clock function appropriately. On x86, this sets > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is > scaled and adjusted to be continuous. Hypervisor can, in theory, disable TSC page and then we're forced to use MSR-based clocksource but using it as sched_clock() can be very slow, I'm afraid. On the other hand, what we have now is probably worse: TSC can, actually, jump backwards (e.g. on migration) and we're breaking the requirements for sched_clock(). -- Vitaly