Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 05.07.24 17:02, David Woodhouse wrote: > On Fri, 2024-07-05 at 10:12 +0200, Peter Hilber wrote: >> On 03.07.24 12:40, David Woodhouse wrote: [...] >>> • Why is maxerror in picoseconds? It's the only use of that unit > > Between us we now have picoseconds, nanoseconds, (seconds >> 64) and > (seconds >> 64+n). > > The power-of-two fractions seem to make a lot of sense for the counter > period, because they mean we don't have to perform divisions. > > Does it makes sense to harmonise on (seconds >> 64) for all of the > fractional seconds? Again I don't have a strong opinion; I only want us > to have a *reason* for any differences that exist. > I don't have the expertise with fixed-point arithmetic to judge if this would become unwieldy. I selected ns for the virtio-rtc drafts so far because that didn't have any impact on the precision with the Linux kernel driver message-based use cases, but that would be different for SHM in my understanding. So I would tend to retain ns for convenience for messages (where it doesn't impact precision) but do not have any preference for SHM. >>> • Where do the clock_status values come from? Do they make sense? >>> • Are signed integers OK? (I think so!). >> >> Signed integers would need to be introduced to Virtio, which so far only >> uses explicitly unsigned types: u8, le16 etc. > > Perhaps. Although it would also be possible (if not ideal) to define > that e.g. the tai_offset field is a 16-bit "unsigned" integer according > to virtio, but to be interpreted as follows: > > If the number is <= 32767 then the TAI offset is that value, but if the > number is >= 32768 then the TAI offset is that value minus 65536. > > Perhaps not pretty, but there isn't a *fundamental* dependency on > virtio supporting signed integers as a primary type. > Agreed.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Fri, 2024-07-05 at 10:12 +0200, Peter Hilber wrote: > On 03.07.24 12:40, David Woodhouse wrote: > > [...] > > > > > > > This is what I currently have for 'struct vmclock_abi' that I'd like to > > persuade you to adopt. I need to tweak it some more, for at least the > > following reasons, as well as any more you can see: > > > > • size isn't big enough for 64KiB pages > > • Should be explicitly little-endian > > • Does it need esterror as well as maxerror? > > I have no opinion about this. I can drop esterror if unwanted. I also don't care. I'm just observing the inconsistency. > > • Why is maxerror in picoseconds? It's the only use of that unit Between us we now have picoseconds, nanoseconds, (seconds >> 64) and (seconds >> 64+n). The power-of-two fractions seem to make a lot of sense for the counter period, because they mean we don't have to perform divisions. Does it makes sense to harmonise on (seconds >> 64) for all of the fractional seconds? Again I don't have a strong opinion; I only want us to have a *reason* for any differences that exist. > > • Where do the clock_status values come from? Do they make sense? > > • Are signed integers OK? (I think so!). > > Signed integers would need to be introduced to Virtio, which so far only > uses explicitly unsigned types: u8, le16 etc. Perhaps. Although it would also be possible (if not ideal) to define that e.g. the tai_offset field is a 16-bit "unsigned" integer according to virtio, but to be interpreted as follows: If the number is <= 32767 then the TAI offset is that value, but if the number is >= 32768 then the TAI offset is that value minus 65536. Perhaps not pretty, but there isn't a *fundamental* dependency on virtio supporting signed integers as a primary type. smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 03.07.24 12:40, David Woodhouse wrote: [...] > > > This is what I currently have for 'struct vmclock_abi' that I'd like to > persuade you to adopt. I need to tweak it some more, for at least the > following reasons, as well as any more you can see: > > • size isn't big enough for 64KiB pages > • Should be explicitly little-endian > • Does it need esterror as well as maxerror? I have no opinion about this. I can drop esterror if unwanted. > • Why is maxerror in picoseconds? It's the only use of that unit > • Where do the clock_status values come from? Do they make sense? > • Are signed integers OK? (I think so!). Signed integers would need to be introduced to Virtio, which so far only uses explicitly unsigned types: u8, le16 etc.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Wed, 2024-07-03 at 11:56 +0200, Peter Hilber wrote: > On 02.07.24 20:40, David Woodhouse wrote: > > On 2 July 2024 19:12:00 BST, Peter Hilber > > wrote: > > > On 02.07.24 18:39, David Woodhouse wrote: > > > > To clarify then, the main types are > > > > > > > > VIRTIO_RTC_CLOCK_UTC == 0 > > > > VIRTIO_RTC_CLOCK_TAI == 1 > > > > VIRTIO_RTC_CLOCK_MONOTONIC == 2 > > > > VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 > > > > > > > > And the subtypes are *only* for the case of > > > > VIRTIO_RTC_CLOCK_SMEARED_UTC. They include > > > > > > > > VIRTIO_RTC_SUBTYPE_STRICT > > > > VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */ > > > > VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR > > > > VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */ > > > > > > > > Is that what we just agreed on? > > > > > > > > > > > > > > This is a misunderstanding. My idea was that the main types are > > > > > > > VIRTIO_RTC_CLOCK_UTC == 0 > > > > VIRTIO_RTC_CLOCK_TAI == 1 > > > > VIRTIO_RTC_CLOCK_MONOTONIC == 2 > > > > VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 > > > > > > VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4 > > > > > > The subtypes would be (1st for clocks other than > > > VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for > > > VIRTIO_RTC_CLOCK_SMEARED_UTC): > > > > > > #define VIRTIO_RTC_SUBTYPE_STRICT 0 > > > #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1 > > > #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2 > > > > > > > Thanks. I really do think that from the guest point of view there's > > really no distinction between "maybe smeared" and "undefined > > smearing", and have a preference for using the latter form, which > > is the key difference there? > > > > Again though, not a hill for me to die on. > > I have no issue with staying with "undefined smearing", so would you agree > to something like > > VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4 > > (or another name if you prefer)? Well, the point of contention was really whether that was a *type* or a *subtype*. Either way, it's a "precision clock" telling its consumer that the device *itself* doesn't really know what time is being exposed. Which seems like a bizarre thing to support. But I think I've constructed an argument which persuades me to your point of view that *if* we permit it, it should be a primary type... A clock can *either* be UTC, *or* it can be monotonic. The whole point of smearing is to produce a monotonic clock, of course. VIRTIO_RTC_CLOCK_UTC is UTC. It is not monotonic. VIRTIO_RTC_CLOCK_SMEARED is, presumably, monotonic (and I think we should explicitly require that to be true in virtio-rtc). But VIRTIO_RTC_CLOCK_MAYBE_SMEARED is the worst of both worlds. It is neither known to be correct UTC, *nor* is it known to be monotonic. So (again, if we permit it at all) I think it probably does make sense for that to be a primary type. This is what I currently have for 'struct vmclock_abi' that I'd like to persuade you to adopt. I need to tweak it some more, for at least the following reasons, as well as any more you can see: • size isn't big enough for 64KiB pages • Should be explicitly little-endian • Does it need esterror as well as maxerror? • Why is maxerror in picoseconds? It's the only use of that unit • Where do the clock_status values come from? Do they make sense? • Are signed integers OK? (I think so!). /* * This structure provides a vDSO-style clock to VM guests, exposing the * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch * counter, etc.) and real time. It is designed to address the problem of * live migration, which other clock enlightenments do not. * * When a guest is live migrated, this affects the clock in two ways. * * First, even between identical hosts the actual frequency of the underlying * counter will change within the tolerances of its specification (typically * ±50PPM, or 4 seconds a day). This frequency also varies over time on the * same host, but can be tracked by NTP as it generally varies slowly. With * live migration there is a step change in the frequency, with no warning. * * Second, there may be a step change in the value of the counter itself, as * its accuracy is limited by the precision of the NTP synchronization on the * source and destination hosts. * * So any calibration (NTP, PTP, etc.) which the guest has done on the source * host before migration is invalid, and needs to be redone on the new host. * * In its most basic mode, this structure provides only an indication to the * guest that live migration has occurred. This allows the guest to know that * its clock is invalid and take remedial action. For applications that need * reliable accurate timestamps (e.g. distributed databases), the structure * can be mapped all the way to userspace. This allows the application to see * directly for itself that the clock is disrupted and take appropriate * action, even when using a vDSO-style method to get the time instead of a *
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 02.07.24 20:40, David Woodhouse wrote: > On 2 July 2024 19:12:00 BST, Peter Hilber > wrote: >> On 02.07.24 18:39, David Woodhouse wrote: >>> To clarify then, the main types are >>> >>> VIRTIO_RTC_CLOCK_UTC == 0 >>> VIRTIO_RTC_CLOCK_TAI == 1 >>> VIRTIO_RTC_CLOCK_MONOTONIC == 2 >>> VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 >>> >>> And the subtypes are *only* for the case of >>> VIRTIO_RTC_CLOCK_SMEARED_UTC. They include >>> >>> VIRTIO_RTC_SUBTYPE_STRICT >>> VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */ >>> VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR >>> VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */ >>> >>> Is that what we just agreed on? >>> >>> >> >> This is a misunderstanding. My idea was that the main types are >> >>> VIRTIO_RTC_CLOCK_UTC == 0 >>> VIRTIO_RTC_CLOCK_TAI == 1 >>> VIRTIO_RTC_CLOCK_MONOTONIC == 2 >>> VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 >> >> VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4 >> >> The subtypes would be (1st for clocks other than >> VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for >> VIRTIO_RTC_CLOCK_SMEARED_UTC): >> >> #define VIRTIO_RTC_SUBTYPE_STRICT 0 >> #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1 >> #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2 >> > > Thanks. I really do think that from the guest point of view there's really no > distinction between "maybe smeared" and "undefined smearing", and have a > preference for using the latter form, which is the key difference there? > > Again though, not a hill for me to die on. I have no issue with staying with "undefined smearing", so would you agree to something like VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4 (or another name if you prefer)?
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 2 July 2024 19:12:00 BST, Peter Hilber wrote: >On 02.07.24 18:39, David Woodhouse wrote: >> To clarify then, the main types are >> >> VIRTIO_RTC_CLOCK_UTC == 0 >> VIRTIO_RTC_CLOCK_TAI == 1 >> VIRTIO_RTC_CLOCK_MONOTONIC == 2 >> VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 >> >> And the subtypes are *only* for the case of >> VIRTIO_RTC_CLOCK_SMEARED_UTC. They include >> >> VIRTIO_RTC_SUBTYPE_STRICT >> VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */ >> VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR >> VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */ >> >> Is that what we just agreed on? >> >> > >This is a misunderstanding. My idea was that the main types are > >> VIRTIO_RTC_CLOCK_UTC == 0 >> VIRTIO_RTC_CLOCK_TAI == 1 >> VIRTIO_RTC_CLOCK_MONOTONIC == 2 >> VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 > >VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4 > >The subtypes would be (1st for clocks other than >VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for >VIRTIO_RTC_CLOCK_SMEARED_UTC): > >#define VIRTIO_RTC_SUBTYPE_STRICT 0 >#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1 >#define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2 > Thanks. I really do think that from the guest point of view there's really no distinction between "maybe smeared" and "undefined smearing", and have a preference for using the latter form, which is the key difference there? Again though, not a hill for me to die on.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 02.07.24 18:39, David Woodhouse wrote: > On Tue, 2024-07-02 at 17:03 +0200, Peter Hilber wrote: >>> On 01.07.24 10:57, David Woodhouse wrote: > If my proposed memory structure is subsumed into the virtio-rtc > proposal we'd literally use the same names, but for the time being I'll > update mine to: >>> >>> Do you intend vmclock and virtio-rtc to be ABI compatible? > > I intend you to incorporate a shared memory structure like the vmclock > one into the virtio-rtc standard :) > > Because precision clocks in a VM are fundamentally nonsensical without > a way for the guest to know when live migration has screwed them up. > > So yes, so facilitate that, I'm trying to align things so that the > numeric values of fields like the time_type and smearing hint are > *precisely* the same as the VIRTIO_RTC_xxx values. > > Time pressure *may* mean I have to ship an ACPI-based device with a > preliminary version of the structure before I've finished persuading > you, and before we've completely finalized the structure. I am hoping > to avoid that. > > (In fact, my time pressure only applies to a version of the structure > which carries the disruption_marker field; the actual clock calibration > information doesn't have to be present in the interim implementation.) > > >>> FYI, I see a >>> potential problem in that Virtio does avoid the use of signed integers so >>> far. I did not check carefully if there might be other problems, yet. > > Hm, you use an unsigned integer to convey the tai_offset. I suppose at > +37 and with a plan to stop doing leap seconds in the next decade, > we're unlikely to get back below zero? > I think so. > The other signed integer I had was for the leap second direction, but I > think I'm happy to drop that and just adopt your uint8_t leap field > with VIRTIO_RTC_LEAP_{PRE_POS,PRE_NEG,etc.}. > > > > > > > /* > * What time is exposed in the time_sec/time_frac_sec fields? > */ > uint8_t time_type; > #define VMCLOCK_TIME_UTC0 /* Since 1970-01-01 > 00:00:00z */ > #define VMCLOCK_TIME_TAI1 /* Since 1970-01-01 > 00:00:00z */ > #define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined epoch > */ > #define VMCLOCK_TIME_INVALID3 /* virtio-rtc uses this > for smeared UTC */ > > > I can then use your smearing subtype values as the 'hint' field in the > shared memory structure. You currently have: > > +\begin{lstlisting} > +#define VIRTIO_RTC_SUBTYPE_STRICT 0 > +#define VIRTIO_RTC_SUBTYPE_SMEAR 1 > +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2 > +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3 > +\end{lstlisting} > >>> >>> I agree with the above part of your proposal. >>> > I can certainly ensure that 'noon linear' has the same value. I don't > think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though: > > > +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by > + smearing time in the vicinity of the leap second, in a not > + precisely defined manner. This avoids clock steps due to UTC > + leap seconds. > > ... > > +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC > + standard w.r.t.\ leap second introduction in an unspecified > way > + (leap seconds may, or may not, be smeared). > > To the client, both of those just mean "for a day or so around a leap > second event, you can't trust this device to know what the time is". > There isn't any point in separating "does lie to you" from "might lie > to you", surely? The guest can't do anything useful with that > distinction. Let's drop SMEAR and keep only LEAP_UNSPECIFIED? >>> >>> As for VIRTIO_RTC_SUBTYPE_SMEAR, I think this could be dropped indeed >>> (resp., UTC_SLS may be added). >>> >>> But VIRTIO_RTC_CLOCK_SMEARED_UTC is an assurance that there will be no >>> steps (in particular, steps backwards, which some clients might not like) >>> due to leap seconds, while LEAP_UNSPECIFIED provides no such guarantee. >>> >>> So I think this might be better handled by adding, alongside >>> > #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3 >>> >>> #define VIRTIO_RTC_CLOCK_LEAP_UNSPECIFIED_UTC 4 >>> >>> (or any better name, like VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC). >>> > > And if you *really* want to parameterise it, I think that's a bad idea > and it encourages the proliferation of different time "standards", but > I'd probably just suck it up and do whatever you do because that's not > strictly within the remit of my live-migration part. >>> >>> I think the above proposal to have subtypes for >>> VIRTIO_RTC_CLOCK_SMEARED_UTC should work. > > To clarify then, the main types are > > VIRTIO_RTC_CLOCK_UTC == 0 > VIRTIO_RTC_CLOCK_TAI == 1 > VIRTIO_RTC_CLOCK_MONOTONIC == 2 >
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Tue, 2024-07-02 at 17:03 +0200, Peter Hilber wrote: > > On 01.07.24 10:57, David Woodhouse wrote: > > > > If my proposed memory structure is subsumed into the virtio-rtc > > > > proposal we'd literally use the same names, but for the time being I'll > > > > update mine to: > > > > Do you intend vmclock and virtio-rtc to be ABI compatible? I intend you to incorporate a shared memory structure like the vmclock one into the virtio-rtc standard :) Because precision clocks in a VM are fundamentally nonsensical without a way for the guest to know when live migration has screwed them up. So yes, so facilitate that, I'm trying to align things so that the numeric values of fields like the time_type and smearing hint are *precisely* the same as the VIRTIO_RTC_xxx values. Time pressure *may* mean I have to ship an ACPI-based device with a preliminary version of the structure before I've finished persuading you, and before we've completely finalized the structure. I am hoping to avoid that. (In fact, my time pressure only applies to a version of the structure which carries the disruption_marker field; the actual clock calibration information doesn't have to be present in the interim implementation.) > > FYI, I see a > > potential problem in that Virtio does avoid the use of signed integers so > > far. I did not check carefully if there might be other problems, yet. Hm, you use an unsigned integer to convey the tai_offset. I suppose at +37 and with a plan to stop doing leap seconds in the next decade, we're unlikely to get back below zero? The other signed integer I had was for the leap second direction, but I think I'm happy to drop that and just adopt your uint8_t leap field with VIRTIO_RTC_LEAP_{PRE_POS,PRE_NEG,etc.}. > > > > > > > > /* > > > > * What time is exposed in the time_sec/time_frac_sec fields? > > > > */ > > > > uint8_t time_type; > > > > #define VMCLOCK_TIME_UTC0 /* Since 1970-01-01 > > > > 00:00:00z */ > > > > #define VMCLOCK_TIME_TAI1 /* Since 1970-01-01 > > > > 00:00:00z */ > > > > #define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined > > > > epoch */ > > > > #define VMCLOCK_TIME_INVALID3 /* virtio-rtc uses this > > > > for smeared UTC */ > > > > > > > > > > > > I can then use your smearing subtype values as the 'hint' field in the > > > > shared memory structure. You currently have: > > > > > > > > +\begin{lstlisting} > > > > +#define VIRTIO_RTC_SUBTYPE_STRICT 0 > > > > +#define VIRTIO_RTC_SUBTYPE_SMEAR 1 > > > > +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2 > > > > +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3 > > > > +\end{lstlisting} > > > > > > > > I agree with the above part of your proposal. > > > > > > I can certainly ensure that 'noon linear' has the same value. I don't > > > > think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though: > > > > > > > > > > > > +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by > > > > + smearing time in the vicinity of the leap second, in a not > > > > + precisely defined manner. This avoids clock steps due to UTC > > > > + leap seconds. > > > > > > > > ... > > > > > > > > +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC > > > > + standard w.r.t.\ leap second introduction in an unspecified > > > > way > > > > + (leap seconds may, or may not, be smeared). > > > > > > > > To the client, both of those just mean "for a day or so around a leap > > > > second event, you can't trust this device to know what the time is". > > > > There isn't any point in separating "does lie to you" from "might lie > > > > to you", surely? The guest can't do anything useful with that > > > > distinction. Let's drop SMEAR and keep only LEAP_UNSPECIFIED? > > > > As for VIRTIO_RTC_SUBTYPE_SMEAR, I think this could be dropped indeed > > (resp., UTC_SLS may be added). > > > > But VIRTIO_RTC_CLOCK_SMEARED_UTC is an assurance that there will be no > > steps (in particular, steps backwards, which some clients might not like) > > due to leap seconds, while LEAP_UNSPECIFIED provides no such guarantee. > > > > So I think this might be better handled by adding, alongside > > > > > > #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3 > > > > #define VIRTIO_RTC_CLOCK_LEAP_UNSPECIFIED_UTC 4 > > > > (or any better name, like VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC). > > > > > > > > > > And if you *really* want to parameterise it, I think that's a bad idea > > > > and it encourages the proliferation of different time "standards", but > > > > I'd probably just suck it up and do whatever you do because that's not > > > > strictly within the remit of my live-migration part. > > > > I think the above proposal to have subtypes for > > VIRTIO_RTC_CLOCK_SMEARED_UTC should work. To clarify then, the main types are VIRTIO_RTC_CLOCK_UTC == 0 VIRTIO_RTC_CLOCK_TAI == 1 VIRTIO_RTC_CLOCK_MONOTONIC == 2
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 01.07.24 10:57, David Woodhouse wrote: > On Fri, 2024-06-28 at 22:27 +0100, David Woodhouse wrote: >> On 28 June 2024 17:38:15 BST, Peter Hilber >> wrote: >>> On 28.06.24 14:15, David Woodhouse wrote: On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: > On 27.06.24 16:52, David Woodhouse wrote: >> I already added a flags field, so this might look something like: >> >> /* >> * Smearing flags. The UTC clock exposed through this structure >> * is only ever true UTC, but a guest operating system may >> * choose to offer a monotonic smeared clock to its users. This >> * merely offers a hint about what kind of smearing to perform, >> * for consistency with systems in the nearby environment. >> */ >> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* >> draft-kuhn-leapsecond-00.txt */ >> >> (UTC-SLS is probably a bad example but are there formal definitions for >> anything else?) > > I think it could also be more generic, like flags for linear smearing, > cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative > to the leap second start). That could also represent UTC-SLS, and > noon-to-noon, and it would be well-defined. > > This should reduce the likelihood that the guest doesn't know the smearing > variant. I'm wary of making it too generic. That would seem to encourage a *proliferation* of false "UTC-like" clocks. It's bad enough that we do smearing at all, let alone that we don't have a single definition of how to do it. I made the smearing hint a full uint8_t instead of using bits in flags, in the end. That gives us a full 255 ways of lying to users about what the time is, so we're unlikely to run out. And it's easy enough to add a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods that get invented. >>> >>> My concern is that the registry update may come after a driver has already >>> been implemented, so that it may be hard to ensure that the smearing which >>> has been chosen is actually implemented. >> >> Well yes, but why in the name of all that is holy would anyone want >> to invent *new* ways to lie to users about the time? If we capture >> the existing ones as we write this, surely it's a good thing that >> there's a barrier to entry for adding more? > > Ultimately though, this isn't the hill for me to die on. I'm pushing on > that topic because I want to avoid the proliferation of *ambiguity*. If > we have a precision clock, we should *know* what the time is. > > So how about this proposal. I line up the fields in the proposed shared > memory structure to match your virtio-rtc proposal, using 'subtype' as > you proposed. But, instead of the 'subtype' being valid only for > VIRTIO_RTC_CLOCK_UTC, we define a new top-level type for *smeared* UTC. > > So, you have: > > +\begin{lstlisting} > +#define VIRTIO_RTC_CLOCK_UTC 0 > +#define VIRTIO_RTC_CLOCK_TAI 1 > +#define VIRTIO_RTC_CLOCK_MONO 2 > +\end{lstlisting} > > I propose that you add > > #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3 > > If my proposed memory structure is subsumed into the virtio-rtc > proposal we'd literally use the same names, but for the time being I'll > update mine to: Do you intend vmclock and virtio-rtc to be ABI compatible? FYI, I see a potential problem in that Virtio does avoid the use of signed integers so far. I did not check carefully if there might be other problems, yet. > > /* >* What time is exposed in the time_sec/time_frac_sec fields? >*/ > uint8_t time_type; > #define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_MONOTONIC2 /* Since undefined > epoch */ > #define VMCLOCK_TIME_INVALID 3 /* virtio-rtc uses this for > smeared UTC */ > > > I can then use your smearing subtype values as the 'hint' field in the > shared memory structure. You currently have: > > +\begin{lstlisting} > +#define VIRTIO_RTC_SUBTYPE_STRICT 0 > +#define VIRTIO_RTC_SUBTYPE_SMEAR 1 > +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2 > +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3 > +\end{lstlisting} > I agree with the above part of your proposal. > I can certainly ensure that 'noon linear' has the same value. I don't > think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though: > > > +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by > + smearing time in the vicinity of the leap second, in a not > + precisely defined manner. This avoids clock steps due to UTC > + leap seconds. > > ... > > +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC > + standard w.r.t.\ leap second introduction in an unspecified > way > + (leap seconds
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Fri, 2024-06-28 at 22:27 +0100, David Woodhouse wrote: > On 28 June 2024 17:38:15 BST, Peter Hilber > wrote: > > On 28.06.24 14:15, David Woodhouse wrote: > > > On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: > > > > On 27.06.24 16:52, David Woodhouse wrote: > > > > > I already added a flags field, so this might look something like: > > > > > > > > > > /* > > > > > * Smearing flags. The UTC clock exposed through this > > > > > structure > > > > > * is only ever true UTC, but a guest operating system may > > > > > * choose to offer a monotonic smeared clock to its users. > > > > > This > > > > > * merely offers a hint about what kind of smearing to > > > > > perform, > > > > > * for consistency with systems in the nearby environment. > > > > > */ > > > > > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* > > > > > draft-kuhn-leapsecond-00.txt */ > > > > > > > > > > (UTC-SLS is probably a bad example but are there formal definitions > > > > > for > > > > > anything else?) > > > > > > > > I think it could also be more generic, like flags for linear smearing, > > > > cosine smearing(?), and smear_start_sec and smear_end_sec fields > > > > (relative > > > > to the leap second start). That could also represent UTC-SLS, and > > > > noon-to-noon, and it would be well-defined. > > > > > > > > This should reduce the likelihood that the guest doesn't know the > > > > smearing > > > > variant. > > > > > > I'm wary of making it too generic. That would seem to encourage a > > > *proliferation* of false "UTC-like" clocks. > > > > > > It's bad enough that we do smearing at all, let alone that we don't > > > have a single definition of how to do it. > > > > > > I made the smearing hint a full uint8_t instead of using bits in flags, > > > in the end. That gives us a full 255 ways of lying to users about what > > > the time is, so we're unlikely to run out. And it's easy enough to add > > > a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods > > > that get invented. > > > > > > > > > > My concern is that the registry update may come after a driver has already > > been implemented, so that it may be hard to ensure that the smearing which > > has been chosen is actually implemented. > > Well yes, but why in the name of all that is holy would anyone want > to invent *new* ways to lie to users about the time? If we capture > the existing ones as we write this, surely it's a good thing that > there's a barrier to entry for adding more? Ultimately though, this isn't the hill for me to die on. I'm pushing on that topic because I want to avoid the proliferation of *ambiguity*. If we have a precision clock, we should *know* what the time is. So how about this proposal. I line up the fields in the proposed shared memory structure to match your virtio-rtc proposal, using 'subtype' as you proposed. But, instead of the 'subtype' being valid only for VIRTIO_RTC_CLOCK_UTC, we define a new top-level type for *smeared* UTC. So, you have: +\begin{lstlisting} +#define VIRTIO_RTC_CLOCK_UTC 0 +#define VIRTIO_RTC_CLOCK_TAI 1 +#define VIRTIO_RTC_CLOCK_MONO 2 +\end{lstlisting} I propose that you add #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3 If my proposed memory structure is subsumed into the virtio-rtc proposal we'd literally use the same names, but for the time being I'll update mine to: /* * What time is exposed in the time_sec/time_frac_sec fields? */ uint8_t time_type; #define VMCLOCK_TIME_UTC0 /* Since 1970-01-01 00:00:00z */ #define VMCLOCK_TIME_TAI1 /* Since 1970-01-01 00:00:00z */ #define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined epoch */ #define VMCLOCK_TIME_INVALID3 /* virtio-rtc uses this for smeared UTC */ I can then use your smearing subtype values as the 'hint' field in the shared memory structure. You currently have: +\begin{lstlisting} +#define VIRTIO_RTC_SUBTYPE_STRICT 0 +#define VIRTIO_RTC_SUBTYPE_SMEAR 1 +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2 +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3 +\end{lstlisting} I can certainly ensure that 'noon linear' has the same value. I don't think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though: +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by + smearing time in the vicinity of the leap second, in a not + precisely defined manner. This avoids clock steps due to UTC + leap seconds. ... +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC + standard w.r.t.\ leap second introduction in an unspecified way + (leap seconds may, or may not, be smeared). To the client, both of those just mean "for a day or so around a leap second event, you can't trust this device to know what the time is". There isn't any point in separating "does lie to you" from "might lie to you", surely? The guest can't do
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 28 June 2024 17:38:15 BST, Peter Hilber wrote: >On 28.06.24 14:15, David Woodhouse wrote: >> On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: >>> On 27.06.24 16:52, David Woodhouse wrote: I already added a flags field, so this might look something like: /* * Smearing flags. The UTC clock exposed through this structure * is only ever true UTC, but a guest operating system may * choose to offer a monotonic smeared clock to its users. This * merely offers a hint about what kind of smearing to perform, * for consistency with systems in the nearby environment. */ #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */ (UTC-SLS is probably a bad example but are there formal definitions for anything else?) >>> >>> I think it could also be more generic, like flags for linear smearing, >>> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative >>> to the leap second start). That could also represent UTC-SLS, and >>> noon-to-noon, and it would be well-defined. >>> >>> This should reduce the likelihood that the guest doesn't know the smearing >>> variant. >> >> I'm wary of making it too generic. That would seem to encourage a >> *proliferation* of false "UTC-like" clocks. >> >> It's bad enough that we do smearing at all, let alone that we don't >> have a single definition of how to do it. >> >> I made the smearing hint a full uint8_t instead of using bits in flags, >> in the end. That gives us a full 255 ways of lying to users about what >> the time is, so we're unlikely to run out. And it's easy enough to add >> a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods >> that get invented. >> >> > >My concern is that the registry update may come after a driver has already >been implemented, so that it may be hard to ensure that the smearing which >has been chosen is actually implemented. Well yes, but why in the name of all that is holy would anyone want to invent *new* ways to lie to users about the time? If we capture the existing ones as we write this, surely it's a good thing that there's a barrier to entry for adding more? >But the error bounds could be large or missing. I am trying to address use >cases where the host steps or slews the clock as well. The host is absolutely intended to be skewing the clock to keep it accurate as the frequency of the underlying oscillator changes, and the seq_count field will change every time the host does so. Do we need to handle steps differently? Or just let the guest deal with it? If an NTP server suddenly steps the time it reports, what does the client do?
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 28.06.24 14:15, David Woodhouse wrote: > On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: >> On 27.06.24 16:52, David Woodhouse wrote: >>> I already added a flags field, so this might look something like: >>> >>> /* >>> * Smearing flags. The UTC clock exposed through this structure >>> * is only ever true UTC, but a guest operating system may >>> * choose to offer a monotonic smeared clock to its users. This >>> * merely offers a hint about what kind of smearing to perform, >>> * for consistency with systems in the nearby environment. >>> */ >>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt >>> */ >>> >>> (UTC-SLS is probably a bad example but are there formal definitions for >>> anything else?) >> >> I think it could also be more generic, like flags for linear smearing, >> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative >> to the leap second start). That could also represent UTC-SLS, and >> noon-to-noon, and it would be well-defined. >> >> This should reduce the likelihood that the guest doesn't know the smearing >> variant. > > I'm wary of making it too generic. That would seem to encourage a > *proliferation* of false "UTC-like" clocks. > > It's bad enough that we do smearing at all, let alone that we don't > have a single definition of how to do it. > > I made the smearing hint a full uint8_t instead of using bits in flags, > in the end. That gives us a full 255 ways of lying to users about what > the time is, so we're unlikely to run out. And it's easy enough to add > a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods > that get invented. > > My concern is that the registry update may come after a driver has already been implemented, so that it may be hard to ensure that the smearing which has been chosen is actually implemented. > + /* > + * This field changes to another non-repeating value when the CPU > + * counter is disrupted, for example on live migration. > + */ > + uint64_t disruption_marker; The field could also change when the clock is stepped (leap seconds excepted), or when the clock frequency is slewed. >>> >>> I'm not sure. The concept of the disruption marker is that it tells the >>> guest to throw away any calibration of the counter that the guest has >>> done for *itself* (with NTP, other PTP devices, etc.). >>> >>> One mode for this device would be not to populate the clock fields at >>> all, but *only* to signal disruption when it occurs. So the guest can >>> abort transactions until it's resynced its clocks (to avoid incurring >>> fines if breaking databases, etc.). >>> >>> Exposing the host timekeeping through the structure means that the >>> migrated guest can keep working because it can trust the timekeeping >>> performed by the (new) host and exposed to it. >>> >>> If the counter is actually varying in frequency over time, and the host >>> is slewing the clock frequency that it reports, that *isn't* a step >>> change and doesn't mean that the guest should throw away any >>> calibration that it's been doing for itself. One hopes that the guest >>> would have detected the *same* frequency change, and be adapting for >>> itself. So I don't think that should indicate a disruption. >>> >>> I think the same is even true if the clock is stepped by the host. The >>> actual *counter* hasn't changed, so the guest is better off ignoring >>> the vacillating host and continuing to derive its idea of time from the >>> hardware counter itself, as calibrated against some external NTP/PTP >>> sources. Surely we actively *don't* to tell the guest to throw its own >>> calibrations away, in this case? >> >> In case the guest is also considering other time sources, it might indeed >> not be a good idea to mix host clock changes into the hardware counter >> disruption marker. >> >> But if the vmclock is the authoritative source of time, it can still be >> helpful to know about such changes, maybe through another marker. > > Could that be the existing seq_count field? > > Skewing the counter_period_frac_sec as the underlying oscillator speeds > up and slows down is perfectly normal and expected, and we already > expect the seq_count to change when that happens. > > Maybe step changes are different, but arguably if the time advertised > by the host steps *outside* the error bounds previously advertised, > that's just broken? But the error bounds could be large or missing. I am trying to address use cases where the host steps or slews the clock as well. > > Depending on how the clock information is fed, a change in seq_count > may even result in non-monotonicity. If the underlying oscillator has > sped up and the structure is updated accordingly, the time calculated > the moment *before* that update may appear later than the time > calculated immediately after it. > > It's up
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: > On 27.06.24 16:52, David Woodhouse wrote: > > I already added a flags field, so this might look something like: > > > > /* > > * Smearing flags. The UTC clock exposed through this structure > > * is only ever true UTC, but a guest operating system may > > * choose to offer a monotonic smeared clock to its users. This > > * merely offers a hint about what kind of smearing to perform, > > * for consistency with systems in the nearby environment. > > */ > > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt > > */ > > > > (UTC-SLS is probably a bad example but are there formal definitions for > > anything else?) > > I think it could also be more generic, like flags for linear smearing, > cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative > to the leap second start). That could also represent UTC-SLS, and > noon-to-noon, and it would be well-defined. > > This should reduce the likelihood that the guest doesn't know the smearing > variant. I'm wary of making it too generic. That would seem to encourage a *proliferation* of false "UTC-like" clocks. It's bad enough that we do smearing at all, let alone that we don't have a single definition of how to do it. I made the smearing hint a full uint8_t instead of using bits in flags, in the end. That gives us a full 255 ways of lying to users about what the time is, so we're unlikely to run out. And it's easy enough to add a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods that get invented. > > > > + /* > > > > + * This field changes to another non-repeating value when the > > > > CPU > > > > + * counter is disrupted, for example on live migration. > > > > + */ > > > > + uint64_t disruption_marker; > > > > > > The field could also change when the clock is stepped (leap seconds > > > excepted), or when the clock frequency is slewed. > > > > I'm not sure. The concept of the disruption marker is that it tells the > > guest to throw away any calibration of the counter that the guest has > > done for *itself* (with NTP, other PTP devices, etc.). > > > > One mode for this device would be not to populate the clock fields at > > all, but *only* to signal disruption when it occurs. So the guest can > > abort transactions until it's resynced its clocks (to avoid incurring > > fines if breaking databases, etc.). > > > > Exposing the host timekeeping through the structure means that the > > migrated guest can keep working because it can trust the timekeeping > > performed by the (new) host and exposed to it. > > > > If the counter is actually varying in frequency over time, and the host > > is slewing the clock frequency that it reports, that *isn't* a step > > change and doesn't mean that the guest should throw away any > > calibration that it's been doing for itself. One hopes that the guest > > would have detected the *same* frequency change, and be adapting for > > itself. So I don't think that should indicate a disruption. > > > > I think the same is even true if the clock is stepped by the host. The > > actual *counter* hasn't changed, so the guest is better off ignoring > > the vacillating host and continuing to derive its idea of time from the > > hardware counter itself, as calibrated against some external NTP/PTP > > sources. Surely we actively *don't* to tell the guest to throw its own > > calibrations away, in this case? > > In case the guest is also considering other time sources, it might indeed > not be a good idea to mix host clock changes into the hardware counter > disruption marker. > > But if the vmclock is the authoritative source of time, it can still be > helpful to know about such changes, maybe through another marker. Could that be the existing seq_count field? Skewing the counter_period_frac_sec as the underlying oscillator speeds up and slows down is perfectly normal and expected, and we already expect the seq_count to change when that happens. Maybe step changes are different, but arguably if the time advertised by the host steps *outside* the error bounds previously advertised, that's just broken? Depending on how the clock information is fed, a change in seq_count may even result in non-monotonicity. If the underlying oscillator has sped up and the structure is updated accordingly, the time calculated the moment *before* that update may appear later than the time calculated immediately after it. It's up to the guest operating system to feed that information into its own timekeeping system and skew towards correctness instead of stepping the time it reports to its users. smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: > > > > > /* > > * What time is exposed in the time_sec/time_frac_sec fields? > > */ > > uint8_t time_type; > > #define VMCLOCK_TIME_UNKNOWN0 /* Invalid / no time > > exposed */ > > #define VMCLOCK_TIME_UTC1 /* Since 1970-01-01 > > 00:00:00z */ > > #define VMCLOCK_TIME_TAI2 /* Since 1970-01-01 > > 00:00:00z */ > > #define VMCLOCK_TIME_MONOTONIC 3 /* Since undefined epoch */ > > > > /* Bit shift for counter_period_frac_sec and its error rate */ > > uint8_t counter_period_shift; > > > > /* > > * Unlike in NTP, this can indicate a leap second in the past. This > > * is needed to allow guests to derive an imprecise clock with > > * smeared leap seconds for themselves, as some modes of smearing > > * need the adjustments to continue even after the moment at which > > * the leap second should have occurred. > > */ > > int8_t leapsecond_direction; > > uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */ > > > > /* > > * Paired values of counter and UTC at a given point in time. > > */ > > uint64_t counter_value; > > uint64_t time_sec; /* Since 1970-01-01 00:00:00z */ > > Nitpick: The comment is not valid any more for TIME_MONOTONIC. Ah yes, I "moved" that comment up to the UTC/TAI time_type values, but neglected to actually delete it from here. Fixed; thanks. smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 27.06.24 18:03, David Woodhouse wrote: > > I've updated the tree at > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock > (but not yet the qemu one). > > I think I've taken into account all your comments apart from the one > about non-64-bit counters wrapping. I reduced the seq_count to 32 bit > to make room for a 32-bit flags field, added the time type > (UTC/TAI/MONOTONIC) and a smearing hint, with some straw man > definitions for smearing algorithms for which I could actually find > definitions. > > The structure now looks like this: > > > struct vmclock_abi { [...] > > /* >* What time is exposed in the time_sec/time_frac_sec fields? >*/ > uint8_t time_type; > #define VMCLOCK_TIME_UNKNOWN 0 /* Invalid / no time exposed */ > #define VMCLOCK_TIME_UTC 1 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_TAI 2 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_MONOTONIC3 /* Since undefined > epoch */ > > /* Bit shift for counter_period_frac_sec and its error rate */ > uint8_t counter_period_shift; > > /* >* Unlike in NTP, this can indicate a leap second in the past. This >* is needed to allow guests to derive an imprecise clock with >* smeared leap seconds for themselves, as some modes of smearing >* need the adjustments to continue even after the moment at which >* the leap second should have occurred. >*/ > int8_t leapsecond_direction; > uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */ > > /* >* Paired values of counter and UTC at a given point in time. >*/ > uint64_t counter_value; > uint64_t time_sec; /* Since 1970-01-01 00:00:00z */ Nitpick: The comment is not valid any more for TIME_MONOTONIC.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 27.06.24 16:52, David Woodhouse wrote: > On Thu, 2024-06-27 at 15:50 +0200, Peter Hilber wrote: >> On 25.06.24 21:01, David Woodhouse wrote: >>> From: David Woodhouse >>> >>> The vmclock "device" provides a shared memory region with precision clock >>> information. By using shared memory, it is safe across Live Migration. >>> >>> Like the KVM PTP clock, this can convert TSC-based cross timestamps into >>> KVM clock values. Unlike the KVM PTP clock, it does so only when such is >>> actually helpful. >>> >>> The memory region of the device is also exposed to userspace so it can be >>> read or memory mapped by application which need reliable notification of >>> clock disruptions. >>> >>> Signed-off-by: David Woodhouse >>> --- >>> >>> v2: >>> • Add gettimex64() support >>> • Convert TSC values to KVM clock when appropriate >>> • Require int128 support >>> • Add counter_period_shift >>> • Add timeout when seq_count is invalid >>> • Add flags field >>> • Better comments in vmclock ABI structure >>> • Explicitly forbid smearing (as clock rates would need to change) >> >> Leap second smearing information could still be conveyed through the >> vmclock_abi. AFAIU, to cover the popular smearing variants, it should be >> enough to indicate whether the driver should apply linear or cosine >> smearing, and the start time and end time. > > Yes. The clock information actually conveyed through the {counter, > time, rate} tuple should never be smeared, and should only ever be UTC. > > But we could provide a hint to the guest operating system about what > type of smearing to perform, *if* it chooses to offer a clock other > than the standard CLOCK_REALTIME to its users. > > I already added a flags field, so this might look something like: > > /* > * Smearing flags. The UTC clock exposed through this structure > * is only ever true UTC, but a guest operating system may > * choose to offer a monotonic smeared clock to its users. This > * merely offers a hint about what kind of smearing to perform, > * for consistency with systems in the nearby environment. > */ > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */ > > > (UTC-SLS is probably a bad example but are there formal definitions for > anything else?) > > I think it could also be more generic, like flags for linear smearing, cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative to the leap second start). That could also represent UTC-SLS, and noon-to-noon, and it would be well-defined. This should reduce the likelihood that the guest doesn't know the smearing variant. [...] >>> diff --git a/include/uapi/linux/vmclock.h b/include/uapi/linux/vmclock.h >>> new file mode 100644 >>> index ..cf0f22205e79 >>> --- /dev/null >>> +++ b/include/uapi/linux/vmclock.h >>> @@ -0,0 +1,138 @@ >>> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR >>> BSD-2-Clause) */ >>> + >>> +/* >>> + * This structure provides a vDSO-style clock to VM guests, exposing the >>> + * relationship (or lack thereof) between the CPU clock (TSC, timebase, >>> arch >>> + * counter, etc.) and real time. It is designed to address the problem of >>> + * live migration, which other clock enlightenments do not. >>> + * >>> + * When a guest is live migrated, this affects the clock in two ways. >>> + * >>> + * First, even between identical hosts the actual frequency of the >>> underlying >>> + * counter will change within the tolerances of its specification >>> (typically >>> + * ±50PPM, or 4 seconds a day). The frequency also varies over time on the >>> + * same host, but can be tracked by NTP as it generally varies slowly. With >>> + * live migration there is a step change in the frequency, with no warning. >>> + * >>> + * Second, there may be a step change in the value of the counter itself, >>> as >>> + * its accuracy is limited by the precision of the NTP synchronization on >>> the >>> + * source and destination hosts. >>> + * >>> + * So any calibration (NTP, PTP, etc.) which the guest has done on the >>> source >>> + * host before migration is invalid, and needs to be redone on the new >>> host. >>> + * >>> + * In its most basic mode, this structure provides only an indication to >>> the >>> + * guest that live migration has occurred. This allows the guest to know >>> that >>> + * its clock is invalid and take remedial action. For applications that >>> need >>> + * reliable accurate timestamps (e.g. distributed databases), the structure >>> + * can be mapped all the way to userspace. This allows the application to >>> see >>> + * directly for itself that the clock is disrupted and take appropriate >>> + * action, even when using a vDSO-style method to get the time instead of a >>> + * system call. >>> + * >>> + * In its more advanced mode. this structure can also be used to expose the >>> + * precise relationship of the CPU counter to
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
I've updated the tree at https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock (but not yet the qemu one). I think I've taken into account all your comments apart from the one about non-64-bit counters wrapping. I reduced the seq_count to 32 bit to make room for a 32-bit flags field, added the time type (UTC/TAI/MONOTONIC) and a smearing hint, with some straw man definitions for smearing algorithms for which I could actually find definitions. The structure now looks like this: struct vmclock_abi { uint32_t magic; #define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */ uint16_t size; /* Size of page containing this structure */ uint16_t version; /* 1 */ /* Sequence lock. Low bit means an update is in progress. */ uint32_t seq_count; uint32_t flags; /* Indicates that the tai_offset_sec field is valid */ #define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0) /* * Optionally used to notify guests of pending maintenance events. * A guest may wish to remove itself from service if an event is * coming up. Two flags indicate the rough imminence of the event. */ #define VMCLOCK_FLAG_DISRUPTION_SOON(1 << 1) /* About a day */ #define VMCLOCK_FLAG_DISRUPTION_IMMINENT(1 << 2) /* About an hour */ /* Indicates that the utc_time_maxerror_picosec field is valid */ #define VMCLOCK_FLAG_UTC_MAXERROR_VALID (1 << 3) /* Indicates counter_period_error_rate_frac_sec is valid */ #define VMCLOCK_FLAG_PERIOD_ERROR_VALID (1 << 4) /* * This field changes to another non-repeating value when the CPU * counter is disrupted, for example on live migration. This lets * the guest know that it should discard any calibration it has * performed of the counter against external sources (NTP/PTP/etc.). */ uint64_t disruption_marker; uint8_t clock_status; #define VMCLOCK_STATUS_UNKNOWN 0 #define VMCLOCK_STATUS_INITIALIZING 1 #define VMCLOCK_STATUS_SYNCHRONIZED 2 #define VMCLOCK_STATUS_FREERUNNING 3 #define VMCLOCK_STATUS_UNRELIABLE 4 uint8_t counter_id; #define VMCLOCK_COUNTER_INVALID 0 #define VMCLOCK_COUNTER_X86_TSC 1 #define VMCLOCK_COUNTER_ARM_VCNT2 #define VMCLOCK_COUNTER_X86_ART 3 /* * By providing the offset from UTC to TAI, the guest can know both * UTC and TAI reliably, whichever is indicated in the time_type * field. Valid if VMCLOCK_FLAG_TAI_OFFSET_VALID is set in flags. */ int16_t tai_offset_sec; /* * The time exposed through this device is never smeaared; if it * claims to be VMCLOCK_TIME_UTC then it MUST be UTC. This field * provides a hint to the guest operating system, such that *if* * the guest OS wants to provide its users with an alternative * clock which does not follow the POSIX CLOCK_REALTIME standard, * it may do so in a fashion consistent with the other systems * in the nearby environment. */ uint8_t leap_second_smearing_hint; /* Provide true UTC to users, unsmeared. */; #define VMCLOCK_SMEARING_NONE 0 /* * https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/ * From noon on the day before to noon on the day after, smear the * clock by a linear 1/86400s per second. */ #define VMCLOCK_SMEARING_LINEAR_86400 1 /* * draft-kuhn-leapsecond-00 * For the 1000s leading up to the leap second, smear the clock by * clock by a linear 1ms per second. */ #define VMCLOCK_SMEARING_UTC_SLS2 /* * What time is exposed in the time_sec/time_frac_sec fields? */ uint8_t time_type; #define VMCLOCK_TIME_UNKNOWN0 /* Invalid / no time exposed */ #define VMCLOCK_TIME_UTC1 /* Since 1970-01-01 00:00:00z */ #define VMCLOCK_TIME_TAI2 /* Since 1970-01-01 00:00:00z */ #define VMCLOCK_TIME_MONOTONIC 3 /* Since undefined epoch */ /* Bit shift for counter_period_frac_sec and its error rate */ uint8_t counter_period_shift; /* * Unlike in NTP, this can indicate a leap second in the past. This * is needed to allow guests to derive an imprecise clock with * smeared leap seconds for themselves, as some modes of smearing * need the adjustments to continue even after the moment at which * the leap second should have occurred. */ int8_t leapsecond_direction; uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */ /* * Paired values of counter and UTC at a given point in time. */ uint64_t
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Thu, 2024-06-27 at 15:50 +0200, Peter Hilber wrote: > On 25.06.24 21:01, David Woodhouse wrote: > > From: David Woodhouse > > > > The vmclock "device" provides a shared memory region with precision clock > > information. By using shared memory, it is safe across Live Migration. > > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > > actually helpful. > > > > The memory region of the device is also exposed to userspace so it can be > > read or memory mapped by application which need reliable notification of > > clock disruptions. > > > > Signed-off-by: David Woodhouse > > --- > > > > v2: > > • Add gettimex64() support > > • Convert TSC values to KVM clock when appropriate > > • Require int128 support > > • Add counter_period_shift > > • Add timeout when seq_count is invalid > > • Add flags field > > • Better comments in vmclock ABI structure > > • Explicitly forbid smearing (as clock rates would need to change) > > Leap second smearing information could still be conveyed through the > vmclock_abi. AFAIU, to cover the popular smearing variants, it should be > enough to indicate whether the driver should apply linear or cosine > smearing, and the start time and end time. Yes. The clock information actually conveyed through the {counter, time, rate} tuple should never be smeared, and should only ever be UTC. But we could provide a hint to the guest operating system about what type of smearing to perform, *if* it chooses to offer a clock other than the standard CLOCK_REALTIME to its users. I already added a flags field, so this might look something like: /* * Smearing flags. The UTC clock exposed through this structure * is only ever true UTC, but a guest operating system may * choose to offer a monotonic smeared clock to its users. This * merely offers a hint about what kind of smearing to perform, * for consistency with systems in the nearby environment. */ #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */ (UTC-SLS is probably a bad example but are there formal definitions for anything else?) > > But we > > drivers/ptp/Kconfig | 13 + > > drivers/ptp/Makefile | 1 + > > drivers/ptp/ptp_vmclock.c | 516 +++ > > include/uapi/linux/vmclock.h | 138 ++ > > 4 files changed, 668 insertions(+) > > create mode 100644 drivers/ptp/ptp_vmclock.c > > create mode 100644 include/uapi/linux/vmclock.h > > > > [...] > > > + > > +/* > > + * Multiply a 64-bit count by a 64-bit tick 'period' in units of seconds > > >> 64 > > + * and add the fractional second part of the reference time. > > + * > > + * The result is a 128-bit value, the top 64 bits of which are seconds, and > > + * the low 64 bits are (seconds >> 64). > > + * > > + * If __int128 isn't available, perform the calculation 32 bits at a time > > to > > + * avoid overflow. > > + */ > > +static inline uint64_t mul_u64_u64_shr_add_u64(uint64_t *res_hi, uint64_t > > delta, > > + uint64_t period, uint8_t > > shift, > > + uint64_t frac_sec) > > +{ > > + unsigned __int128 res = (unsigned __int128)delta * period; > > + > > + res >>= shift; > > + res += frac_sec; > > + *res_hi = res >> 64; > > + return (uint64_t)res; > > +} > > + > > +static int vmclock_get_crosststamp(struct vmclock_state *st, > > + struct ptp_system_timestamp *sts, > > + struct system_counterval_t > > *system_counter, > > + struct timespec64 *tspec) > > +{ > > + ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT); > > + struct system_time_snapshot systime_snapshot; > > + uint64_t cycle, delta, seq, frac_sec; > > + > > +#ifdef CONFIG_X86 > > + /* > > + * We'd expect the hypervisor to know this and to report the clock > > + * status as VMCLOCK_STATUS_UNRELIABLE. But be paranoid. > > + */ > > + if (check_tsc_unstable()) > > + return -EINVAL; > > +#endif > > + > > + while (1) { > > + seq = st->clk->seq_count & ~1ULL; > > + virt_rmb(); > > + > > + if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE) > > + return -EINVAL; > > + > > + /* > > + * When invoked for gettimex64(), fill in the pre/post > > system > > + * times. The simple case is when system time is based on > > the > > + * same counter as st->cs_id, in which case all three times > > + * will be derived from the *same* counter value. > > + * > > + * If the system isn't using the same counter, then
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 25.06.24 21:01, David Woodhouse wrote: > From: David Woodhouse > > The vmclock "device" provides a shared memory region with precision clock > information. By using shared memory, it is safe across Live Migration. > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > actually helpful. > > The memory region of the device is also exposed to userspace so it can be > read or memory mapped by application which need reliable notification of > clock disruptions. > > Signed-off-by: David Woodhouse > --- > > v2: > • Add gettimex64() support > • Convert TSC values to KVM clock when appropriate > • Require int128 support > • Add counter_period_shift > • Add timeout when seq_count is invalid > • Add flags field > • Better comments in vmclock ABI structure > • Explicitly forbid smearing (as clock rates would need to change) Leap second smearing information could still be conveyed through the vmclock_abi. AFAIU, to cover the popular smearing variants, it should be enough to indicate whether the driver should apply linear or cosine smearing, and the start time and end time. > > drivers/ptp/Kconfig | 13 + > drivers/ptp/Makefile | 1 + > drivers/ptp/ptp_vmclock.c| 516 +++ > include/uapi/linux/vmclock.h | 138 ++ > 4 files changed, 668 insertions(+) > create mode 100644 drivers/ptp/ptp_vmclock.c > create mode 100644 include/uapi/linux/vmclock.h > [...] > + > +/* > + * Multiply a 64-bit count by a 64-bit tick 'period' in units of seconds >> > 64 > + * and add the fractional second part of the reference time. > + * > + * The result is a 128-bit value, the top 64 bits of which are seconds, and > + * the low 64 bits are (seconds >> 64). > + * > + * If __int128 isn't available, perform the calculation 32 bits at a time to > + * avoid overflow. > + */ > +static inline uint64_t mul_u64_u64_shr_add_u64(uint64_t *res_hi, uint64_t > delta, > +uint64_t period, uint8_t shift, > +uint64_t frac_sec) > +{ > + unsigned __int128 res = (unsigned __int128)delta * period; > + > + res >>= shift; > + res += frac_sec; > + *res_hi = res >> 64; > + return (uint64_t)res; > +} > + > +static int vmclock_get_crosststamp(struct vmclock_state *st, > +struct ptp_system_timestamp *sts, > +struct system_counterval_t *system_counter, > +struct timespec64 *tspec) > +{ > + ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT); > + struct system_time_snapshot systime_snapshot; > + uint64_t cycle, delta, seq, frac_sec; > + > +#ifdef CONFIG_X86 > + /* > + * We'd expect the hypervisor to know this and to report the clock > + * status as VMCLOCK_STATUS_UNRELIABLE. But be paranoid. > + */ > + if (check_tsc_unstable()) > + return -EINVAL; > +#endif > + > + while (1) { > + seq = st->clk->seq_count & ~1ULL; > + virt_rmb(); > + > + if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE) > + return -EINVAL; > + > + /* > + * When invoked for gettimex64(), fill in the pre/post system > + * times. The simple case is when system time is based on the > + * same counter as st->cs_id, in which case all three times > + * will be derived from the *same* counter value. > + * > + * If the system isn't using the same counter, then the value > + * from ktime_get_snapshot() will still be used as pre_ts, and > + * ptp_read_system_postts() is called to populate postts after > + * calling get_cycles(). > + * > + * The conversion to timespec64 happens further down, outside > + * the seq_count loop. > + */ > + if (sts) { > + ktime_get_snapshot(_snapshot); > + if (systime_snapshot.cs_id == st->cs_id) { > + cycle = systime_snapshot.cycles; > + } else { > + cycle = get_cycles(); > + ptp_read_system_postts(sts); > + } > + } else > + cycle = get_cycles(); > + > + delta = cycle - st->clk->counter_value; AFAIU in the general case this needs to be masked for non 64-bit counters. > + > + frac_sec = mul_u64_u64_shr_add_u64(>tv_sec, delta, > + > st->clk->counter_period_frac_sec, > + > st->clk->counter_period_shift, > +st->clk->utc_time_frac_sec);
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Tue, Jun 25, 2024 at 11:34:24PM +0200, Thomas Gleixner wrote: > There is effort underway to expose PTP clocks to user space via > VDSO. That sounds interesting. Has anything been posted? Thanks, Richard
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Tue, 2024-06-25 at 15:22 -0700, John Stultz wrote: > On Tue, Jun 25, 2024 at 2:48 PM David Woodhouse wrote: > > On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote: > > > On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote: > > > > From: David Woodhouse > > > > > > > > The vmclock "device" provides a shared memory region with precision > > > > clock > > > > information. By using shared memory, it is safe across Live Migration. > > > > > > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > > > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > > > > actually helpful. > > > > > > > > The memory region of the device is also exposed to userspace so it can > > > > be > > > > read or memory mapped by application which need reliable notification of > > > > clock disruptions. > > > > > > There is effort underway to expose PTP clocks to user space via VDSO. > > > > Ooh, interesting. Got a reference to that please? > > > > > Can we please not expose an ad hoc interface for that? > > > > Absolutely. I'm explicitly trying to intercept the virtio-rtc > > specification here, to *avoid* having to do anything ad hoc. > > > > Note that this is a "vDSO-style" interface from hypervisor to guest via > > a shared memory region, not necessarily an actual vDSO. > > > > But yes, it *is* intended to be exposed to userspace, so that userspace > > can know the *accurate* time without a system call, and know that it > > hasn't been perturbed by live migration. > > Yea, I was going to raise a concern that just defining an mmaped > structure means it has to trust the guest logic is as expected. It's > good that it's versioned! :) Right. Although it's basically a pvclock, and we've had those for ages. The main difference here is that we add an indicator that tells the guest that it's been live migrated, so any additional NTP/PTP refinement that the *guest* has done of its oscillator, should now be discarded. It's also designed to be useful in "disruption-only" mode, where the pvclock information isn't actually populated, so *all* it does is tell guests that their clock is now hosed due to live migration. That part is why it needs to be mappable directly to userspace, so that userspace can not only get a timestamp but *also* know that it's actually valid. All without a system call. The critical use cases are financial systems where they incur massive fines if they submit mis-timestamped transactions, and distributed databases which rely on accurate timestamps (and error bounds) for eventual coherence. Live migration can screw those completely. I'm open to changing fairly much anything about the proposal as long as we can address those use cases (which the existing virtio-rtc and other KVM enlightenments do not). > I'd fret a bit about exposing this to userland. It feels very similar > to the old powerpc systemcfg implementation that similarly mapped just > kernel data out to userland and was difficult to maintain as changes > were made. Would including a code page like a proper vdso make sense > to make this more flexible of an UABI to maintain? I think the structure itself should be stable once we've bikeshedded it a bit. But there is certainly some potential for vDSO functions which help us expose it to the user... This structure exposes a 'disruption count' which is updated every time the TSC/counter is messed with by live migration. But what is userspace actually going to *compare* it with? It basically needs to compare it with the disruption count when the clock was last synchronized, so maybe the kernel could export *that* to vDSO too, then expose a simple vDSO function which reports whether the clock is valid? The 'invalid' code path could turn into an actual system call which makes the kernel (check for itself and) call ntp_clear() when the disruption occurs. Or maybe not just ntp_clear() but actually consume the pvclock rate information directly and apply the *new* calibration? That kind of thing would be great, and I've definitely tried to design the structure so that it *can* be made a first-class citizen within the kernel's timekeeping code and used like that. But I was going to start with a more modest proposal that it's "just a device", and applications which care about reliable time after LM would have to /dev/vmclock0 and mmap it and check for themselves. (Which would be assisted by things like the ClockBound library). smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Tue, Jun 25, 2024 at 2:48 PM David Woodhouse wrote: > On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote: > > On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote: > > > From: David Woodhouse > > > > > > The vmclock "device" provides a shared memory region with precision clock > > > information. By using shared memory, it is safe across Live Migration. > > > > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > > > actually helpful. > > > > > > The memory region of the device is also exposed to userspace so it can be > > > read or memory mapped by application which need reliable notification of > > > clock disruptions. > > > > There is effort underway to expose PTP clocks to user space via VDSO. > > Ooh, interesting. Got a reference to that please? > > > Can we please not expose an ad hoc interface for that? > > Absolutely. I'm explicitly trying to intercept the virtio-rtc > specification here, to *avoid* having to do anything ad hoc. > > Note that this is a "vDSO-style" interface from hypervisor to guest via > a shared memory region, not necessarily an actual vDSO. > > But yes, it *is* intended to be exposed to userspace, so that userspace > can know the *accurate* time without a system call, and know that it > hasn't been perturbed by live migration. Yea, I was going to raise a concern that just defining an mmaped structure means it has to trust the guest logic is as expected. It's good that it's versioned! :) I'd fret a bit about exposing this to userland. It feels very similar to the old powerpc systemcfg implementation that similarly mapped just kernel data out to userland and was difficult to maintain as changes were made. Would including a code page like a proper vdso make sense to make this more flexible of an UABI to maintain? thanks -john
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote: > On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote: > > From: David Woodhouse > > > > The vmclock "device" provides a shared memory region with precision clock > > information. By using shared memory, it is safe across Live Migration. > > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > > actually helpful. > > > > The memory region of the device is also exposed to userspace so it can be > > read or memory mapped by application which need reliable notification of > > clock disruptions. > > There is effort underway to expose PTP clocks to user space via VDSO. Ooh, interesting. Got a reference to that please? > Can we please not expose an ad hoc interface for that? Absolutely. I'm explicitly trying to intercept the virtio-rtc specification here, to *avoid* having to do anything ad hoc. Note that this is a "vDSO-style" interface from hypervisor to guest via a shared memory region, not necessarily an actual vDSO. But yes, it *is* intended to be exposed to userspace, so that userspace can know the *accurate* time without a system call, and know that it hasn't been perturbed by live migration. > As you might have heard the sad news, I'm not feeling up to the task to > dig deeper into this right now. Give me a couple of days to look at this > with working brain. I have not heard any news, although now I'm making inferences. Wishing you the best! smime.p7s Description: S/MIME cryptographic signature
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote: > From: David Woodhouse > > The vmclock "device" provides a shared memory region with precision clock > information. By using shared memory, it is safe across Live Migration. > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > actually helpful. > > The memory region of the device is also exposed to userspace so it can be > read or memory mapped by application which need reliable notification of > clock disruptions. There is effort underway to expose PTP clocks to user space via VDSO. Can we please not expose an ad hoc interface for that? As you might have heard the sad news, I'm not feeling up to the task to dig deeper into this right now. Give me a couple of days to look at this with working brain. Thanks, tglx