> On May 2, 2019, at 2:53 PM, dor laor <dor.l...@gmail.com> wrote:
> 
> On Thu, May 2, 2019 at 7:14 AM Gil Tene <g...@azul.com> wrote:
> 
> 
> On May 1, 2019, at 1:38 PM, dor laor <dor.l...@gmail.com> wrote:
> 
>> On Wed, May 1, 2019 at 9:58 AM Gil Tene <g...@azul.com> wrote:
>> There are many ways for RDTSC to be made "wrong" (as in non-monotonic within 
>> a software thread, process, system, etc.) on systems, but AFAIK "most" 
>> modern x86-64 bare metal systems can be set up for good clean, monotonic 
>> system-wide TSC-ness. The hardware certainly has the ability to keep those 
>> TSCs in sync (enough to not have detectable non-sync effects) both within a 
>> socket and across multi-socket systems (when the hardware is "built right"). 
>> The TSCs all get reset together and move together unless interfered with...
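>> 
>> As a quick sanity check of whether a given Linux box is in that "set up right"
>> state, one approach (just a sketch, nothing authoritative) is to see whether the
>> kernel itself trusts the TSC as its clocksource, and whether the CPU advertises
>> the constant_tsc/nonstop_tsc flags:
>> 
>>     #include <stdio.h>
>>     #include <string.h>
>> 
>>     /* Sketch: print the kernel's chosen clocksource and whether /proc/cpuinfo
>>      * advertises constant_tsc / nonstop_tsc (standard Linux procfs/sysfs paths). */
>>     int main(void) {
>>         char cs[128] = "";
>>         FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
>>         if (f) {
>>             if (fgets(cs, sizeof cs, f)) cs[strcspn(cs, "\n")] = '\0';
>>             fclose(f);
>>         }
>>         printf("current clocksource: %s\n", cs[0] ? cs : "(unknown)");
>> 
>>         int constant_tsc = 0, nonstop_tsc = 0;
>>         char line[4096];
>>         f = fopen("/proc/cpuinfo", "r");
>>         if (f) {
>>             while (fgets(line, sizeof line, f)) {
>>                 if (strstr(line, "constant_tsc")) constant_tsc = 1;
>>                 if (strstr(line, "nonstop_tsc"))  nonstop_tsc = 1;
>>             }
>>             fclose(f);
>>         }
>>         printf("constant_tsc: %s, nonstop_tsc: %s\n",
>>                constant_tsc ? "yes" : "no", nonstop_tsc ? "yes" : "no");
>>         return 0;
>>     }
>> 
>> If the kernel has fallen back to hpet or acpi_pm, that is usually a sign it
>> caught the TSC misbehaving (e.g. via one of the two failure modes below).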
>> 
>> Two ways I've seen this go wrong even on modern hardware include:
>> 
>> A) Some BIOSes resetting TSC on a single core or hyperthread on each socket 
>> (usually thread 0 of core 0) for some strange reason during the boot 
>> sequence. [I've conclusively shown this on some 4 socket Sandy Bridge 
>> systems.] This leads different logical cores to have vastly differing TSC 
>> values, a gap that gets bigger with every non-power-cycling reboot, with obvious 
>> negative effects and screams from anyone relying on TSC consistency for 
>> virtually any purpose.
>> 
>> B) Hypervisors virtualizing TSC. Some hypervisors (notably at least some 
>> versions of VMWare) will virtualize the TSC and "slew" the virtualized value 
>> to avoid presenting guest OSs with huge jumps in TSC values when a core was 
>> taken away for a "long" (i.e. many-msec) period of time. Instead, the 
>> virtualized TSC will incrementally move forward in small jumps until it 
>> catches up. The purpose of this appears to be to avoid triggering guest OS 
>> panics in code that watches TSC for panic-timeouts and other sanity checks 
>> (e.g. code in OS spinlocks). The effect of this "slewing" is obvious: TSC 
>> values can easily jump backward, even within a single software thread.
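>> 
>> A trivial way to catch that in the wild (again, just a sketch): spin on RDTSC in
>> a single software thread, deliberately left unpinned so that migrations between
>> (v)cores are in play, and flag any sample that is smaller than the previous one.
>> On a sane bare-metal setup this should never fire; under a slewing hypervisor it
>> can (the exchange below goes into why):
>> 
>>     #include <stdio.h>
>>     #include <stdint.h>
>>     #include <x86intrin.h>   /* __rdtsc() */
>> 
>>     /* Sketch: detect backward TSC steps as seen by one software thread.
>>      * The thread is left unpinned on purpose; per the discussion below, the
>>      * backward observations come from the guest scheduler moving the thread
>>      * between vcores whose virtualized TSCs are slewed independently. */
>>     int main(void) {
>>         uint64_t prev = __rdtsc();
>>         for (long i = 0; i < 2000000000L; i++) {
>>             uint64_t now = __rdtsc();
>>             if (now < prev)
>>                 printf("TSC went backwards by %llu ticks\n",
>>                        (unsigned long long)(prev - now));
>>             prev = now;
>>         }
>>         return 0;
>>     }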
>> 
>> A hypervisor wouldn't take the TSC backwards. It can slow the TSC down, but not 
>> take it backward, unless it virtualizes the CPU bits for stable TSC differently, 
>> which does happen, but I doubt VMware (and the better hypervisors) take the TSC 
>> back.
> 
> A hypervisor wouldn't take the TSC backwards within one vcore.
> 
> But vcores are scheduled individually, which means that any slewing done to 
> hide a long jump forward in the physical TSC in situations where a vcore was 
> not actually running on a physical core for a “long enough” period of time is 
> done individually within each vcore and its virtualized TSC. (Synchronizing 
> the virtualized TSC slewing across vcores would require either synchronizing 
> their scheduling such that the entire VM would be either “on” or “off” cores 
> at the same time, or making the virtualized TSC tick forward only in large 
> quanta, or making it tick only when all vcores are actively running on physical 
> cores, all of which would cause some other dramatic strangeness.)
> 
> Multiple vcores belonging to the same guest OS can (and usually will) end up 
> running simultaneously on multiple real cores, which obviously means that 
> during slewing periods they will be showing vastly differing virtualized TSC 
> values (with gaps of 10s of msec) until the “slewing” is done. All it takes 
> is a “lucky timing” context switch within the Guest OS, moving a thread from 
> one vcore to another (for whichever of the many reasons the guest OS might 
> decide to do that) for *your* program to observe the TSC “jumping backwards” 
> by 10s of msec between one RDTSC execution and another.
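> 
> (For what it's worth, RDTSCP makes that last case easy to observe, since it also
> returns the IA32_TSC_AUX value, which Linux fills with the cpu/node number; a
> rough, unpinned sketch:
> 
>     #include <stdio.h>
>     #include <stdint.h>
>     #include <x86intrin.h>   /* __rdtscp() */
> 
>     /* Sketch: look for backward TSC observations in an unpinned thread and
>      * note whether the reading CPU changed in between. On Linux, the aux
>      * value returned by RDTSCP encodes the cpu number in its low 12 bits. */
>     int main(void) {
>         unsigned int aux_prev, aux_now;
>         uint64_t prev = __rdtscp(&aux_prev);
>         for (long i = 0; i < 2000000000L; i++) {
>             uint64_t now = __rdtscp(&aux_now);
>             if (now < prev)
>                 printf("backwards by %llu ticks (cpu %u -> cpu %u)\n",
>                        (unsigned long long)(prev - now),
>                        aux_prev & 0xfff, aux_now & 0xfff);
>             prev = now;
>             aux_prev = aux_now;
>         }
>         return 0;
>     }
> 
> A backward step that coincides with a cpu change is the migration scenario
> described above; one with the same cpu on both sides points at per-core slewing
> or skew.)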
> 
> It's the same issue as a physical machine with multiple sockets, where the TSC 
> isn't synced across the different sockets.

Except that since ~Nehalem, the hardware (if built to recommended specs) does 
keep the TSC on all cores sync'ed across all sockets. The only non-perfectly 
sync'ed TSCs I've ever seen on any modern Intel hardware were due to the BIOS 
messing with them after the hardware reset that placed them all in sync had 
already happened. [On those platforms where I observed out-of-sync TSCs, all 
hyper-threads except for one per socket were perfectly in sync, and the two 
hyperthreads on core 0 were out of sync with each other.]
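
A crude way to spot that kind of BIOS-induced skew (it is huge, many billions of
ticks, and it grows with every non-power-cycling reboot, so the cost of migrating
the probing thread doesn't matter) is to hop a thread across the CPUs and compare
raw RDTSC values. A sketch, assuming Linux and sched_setaffinity:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <x86intrin.h>

    /* Sketch: pin the current thread to each online CPU in turn and print the
     * raw TSC read there. Only meant to expose gross skew (like the
     * BIOS-resets-thread-0 case above); small offsets drown in the noise of
     * the migration itself. */
    int main(void) {
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
        uint64_t base = 0;
        int have_base = 0;
        for (long cpu = 0; cpu < ncpus; cpu++) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET((int)cpu, &set);
            if (sched_setaffinity(0, sizeof set, &set) != 0)
                continue;                  /* CPU may be offline */
            uint64_t tsc = __rdtsc();
            if (!have_base) { base = tsc; have_base = 1; }
            printf("cpu %2ld: tsc = %llu (delta vs first cpu = %lld)\n",
                   cpu, (unsigned long long)tsc, (long long)(tsc - base));
        }
        return 0;
    }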

> The hypervisor keeps an offset per unscheduled vcore and makes sure it is 
> monotonic. Although we at KVM considered slewing/speeding up the TSC on vcores, 
> primarily for live migration, we didn't do it in practice.

If KVM doesn't slew, KVM guests may be fine on modern hardware…

How did you deal with guest OSs panicking when they observed large intra-core TSC 
skips in critical code (when scheduled out for a long period of time)?

> One of my old team members wrote this
> pretty good write up (in 2011 but still relevant):
> https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt

That's a good writeup, with lots of good detail. But since then (2011):
- the statement "...multi-socket systems are likely to have individual 
clocksources rather than a single, universally distributed clock" no longer 
holds; most multi-socket systems now have a universally sync'ed clock.
- TSCs that are invariant across P-state and C-state transitions are the norm on 
modern hardware (a quick CPUID check for that bit is sketched below).

Which means that the TSC can actually be relied on in most bare-metal cases.
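
For completeness: the "invariant across P-states and C-states" part is something
the CPU itself advertises (CPUID leaf 0x80000007, EDX bit 8, "Invariant TSC" in
Intel's SDM). A minimal check, again just a sketch:

    #include <stdio.h>
    #include <cpuid.h>   /* __get_cpuid(), GCC/Clang wrapper around CPUID */

    /* Sketch: query the Invariant TSC bit (CPUID.80000007H:EDX[8]). When it is
     * set, the TSC ticks at a constant rate regardless of P-, C- and T-state
     * transitions. */
    int main(void) {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
        if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx)) {
            printf("CPUID leaf 0x80000007 not supported\n");
            return 1;
        }
        printf("invariant TSC: %s\n", (edx & (1u << 8)) ? "yes" : "no");
        return 0;
    }

Note that the bit describes the hardware's behavior; it says nothing about what a
hypervisor layered on top will let a guest see.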

> 
> 
>> 
>> 
>> The bottom line is that the TSC can be relied upon on bare metal (where there is 
>> no hypervisor scheduling of guest OS cores) if the system is set up right, but 
>> can do very wrong things otherwise. People who really care about low cost 
>> time measurement (like System.nanoTime()) can control their systems to make 
>> this work and elect to rely on it (that's exactly what Zing's -XX:+UseRdtsc 
>> flag is for), but it can be dangerous to rely on it by default.
>> 
>> On Tuesday, April 30, 2019 at 3:07:11 AM UTC-7, Ben Evans wrote:
>> I'd assumed that the monotonicity of System.nanoTime() on modern
>> systems was due to the OS compensating, rather than any changes at the
>> hardware level. Is that not the case?
>> 
>> In particular, Rust definitely still seems to think that their
>> SystemTime (which looks to be backed directly by RDTSC) can be 
>> non-monotonic: https://doc.rust-lang.org/std/time/struct.SystemTime.html
>> 
>> On Tue, 30 Apr 2019 at 07:50, dor laor <dor...@gmail.com> wrote:
>> >
>> > It might be because, in the past, many systems did not have a stable RDTSC, and 
>> > thus if the instruction was executed on different sockets it could result in 
>> > wrong answers and negative time. Today most systems do have a stable TSC, and 
>> > you can verify that from userspace/Java too.
>> > I bet it's easy to google the reason.
>> >
>> > On Mon, Apr 29, 2019 at 2:36 PM 'Carl Mastrangelo' via mechanical-sympathy 
>> > <mechanica...@googlegroups.com> wrote:
>> >>
>> >> This may be a dumb question, but why (on Linux) is System.nanoTime() a 
>> >> call out to clock_gettime? It seems like it could be inlined by the 
>> >> JVM, and stripped down to the rdtsc instruction. From my reading of the 
>> >> vDSO source for x86, the implementation is not that complex, and could be 
>> >> copied into Java.
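>> >>
>> >> For what "not that complex" boils down to: the kernel publishes a base
>> >> timestamp plus a (mult, shift) pair, and the vDSO converts "TSC cycles since
>> >> the base" into nanoseconds with a multiply and a shift. Very roughly (the
>> >> struct and field names below are made up for illustration, and the seqlock
>> >> retry/fallback logic is omitted):
>> >>
>> >>     #include <stdint.h>
>> >>
>> >>     /* Sketch of the TSC fast path in clock_gettime-style code: scale
>> >>      * cycles since the last kernel clock update into nanoseconds.
>> >>      * Names are illustrative, not the kernel's actual structures. */
>> >>     struct tsc_clock_data {
>> >>         uint64_t cycle_last;   /* TSC value at the last kernel update     */
>> >>         uint64_t base_ns;      /* nanoseconds corresponding to cycle_last */
>> >>         uint32_t mult;         /* chosen so ns = (cycles * mult) >> shift */
>> >>         uint32_t shift;
>> >>     };
>> >>
>> >>     static inline uint64_t tsc_to_ns(const struct tsc_clock_data *d,
>> >>                                      uint64_t tsc)
>> >>     {
>> >>         uint64_t delta = tsc - d->cycle_last;
>> >>         return d->base_ns + ((delta * d->mult) >> d->shift);
>> >>     }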
>> >>