Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-06 Thread Peter Hilber
On 05.07.24 17:02, David Woodhouse wrote:
> On Fri, 2024-07-05 at 10:12 +0200, Peter Hilber wrote:
>> On 03.07.24 12:40, David Woodhouse wrote:

[...]

>>>  • Why is maxerror in picoseconds? It's the only use of that unit
> 
> Between us we now have picoseconds, nanoseconds, (seconds >> 64) and
> (seconds >> 64+n).
> 
> The power-of-two fractions seem to make a lot of sense for the counter
> period, because they mean we don't have to perform divisions.
> 
> Does it makes sense to harmonise on (seconds >> 64) for all of the
> fractional seconds? Again I don't have a strong opinion; I only want us
> to have a *reason* for any differences that exist.
> 

I don't have the expertise with fixed-point arithmetic to judge if this
would become unwieldy.

I selected ns for the virtio-rtc drafts so far because that didn't have any
impact on the precision with the Linux kernel driver message-based use
cases, but that would be different for SHM in my understanding.

So I would tend to retain ns for convenience for messages (where it doesn't
impact precision) but do not have any preference for SHM.

>>>  • Where do the clock_status values come from? Do they make sense?
>>>  • Are signed integers OK? (I think so!).
>>
>> Signed integers would need to be introduced to Virtio, which so far only
>> uses explicitly unsigned types: u8, le16 etc.
> 
> Perhaps. Although it would also be possible (if not ideal) to define
> that e.g. the tai_offset field is a 16-bit "unsigned" integer according
> to virtio, but to be interpreted as follows:
> 
> If the number is <= 32767 then the TAI offset is that value, but if the
> number is >= 32768 then the TAI offset is that value minus 65536.
> 
> Perhaps not pretty, but there isn't a *fundamental* dependency on
> virtio supporting signed integers as a primary type.
> 

Agreed.



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-05 Thread David Woodhouse
On Fri, 2024-07-05 at 10:12 +0200, Peter Hilber wrote:
> On 03.07.24 12:40, David Woodhouse wrote:
> 
> [...]
> 
> > 
> > 
> > This is what I currently have for 'struct vmclock_abi' that I'd like to
> > persuade you to adopt. I need to tweak it some more, for at least the
> > following reasons, as well as any more you can see:
> > 
> >  • size isn't big enough for 64KiB pages
> >  • Should be explicitly little-endian
> >  • Does it need esterror as well as maxerror?
> 
> I have no opinion about this. I can drop esterror if unwanted.

I also don't care. I'm just observing the inconsistency.

> >  • Why is maxerror in picoseconds? It's the only use of that unit

Between us we now have picoseconds, nanoseconds, (seconds >> 64) and
(seconds >> 64+n).

The power-of-two fractions seem to make a lot of sense for the counter
period, because they mean we don't have to perform divisions.

Does it makes sense to harmonise on (seconds >> 64) for all of the
fractional seconds? Again I don't have a strong opinion; I only want us
to have a *reason* for any differences that exist.

> >  • Where do the clock_status values come from? Do they make sense?
> >  • Are signed integers OK? (I think so!).
> 
> Signed integers would need to be introduced to Virtio, which so far only
> uses explicitly unsigned types: u8, le16 etc.

Perhaps. Although it would also be possible (if not ideal) to define
that e.g. the tai_offset field is a 16-bit "unsigned" integer according
to virtio, but to be interpreted as follows:

If the number is <= 32767 then the TAI offset is that value, but if the
number is >= 32768 then the TAI offset is that value minus 65536.

Perhaps not pretty, but there isn't a *fundamental* dependency on
virtio supporting signed integers as a primary type.



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-05 Thread Peter Hilber
On 03.07.24 12:40, David Woodhouse wrote:

[...]

> 
> 
> This is what I currently have for 'struct vmclock_abi' that I'd like to
> persuade you to adopt. I need to tweak it some more, for at least the
> following reasons, as well as any more you can see:
> 
>  • size isn't big enough for 64KiB pages
>  • Should be explicitly little-endian
>  • Does it need esterror as well as maxerror?

I have no opinion about this. I can drop esterror if unwanted.

>  • Why is maxerror in picoseconds? It's the only use of that unit
>  • Where do the clock_status values come from? Do they make sense?
>  • Are signed integers OK? (I think so!).

Signed integers would need to be introduced to Virtio, which so far only
uses explicitly unsigned types: u8, le16 etc.



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-03 Thread David Woodhouse
On Wed, 2024-07-03 at 11:56 +0200, Peter Hilber wrote:
> On 02.07.24 20:40, David Woodhouse wrote:
> > On 2 July 2024 19:12:00 BST, Peter Hilber  
> > wrote:
> > > On 02.07.24 18:39, David Woodhouse wrote:
> > > > To clarify then, the main types are
> > > > 
> > > >  VIRTIO_RTC_CLOCK_UTC == 0
> > > >  VIRTIO_RTC_CLOCK_TAI == 1
> > > >  VIRTIO_RTC_CLOCK_MONOTONIC == 2
> > > >  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
> > > > 
> > > > And the subtypes are *only* for the case of
> > > > VIRTIO_RTC_CLOCK_SMEARED_UTC. They include
> > > > 
> > > >  VIRTIO_RTC_SUBTYPE_STRICT
> > > >  VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */
> > > >  VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 
> > > >  VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */
> > > > 
> > > > Is that what we just agreed on?
> > > > 
> > > > 
> > > 
> > > This is a misunderstanding. My idea was that the main types are
> > > 
> > > >  VIRTIO_RTC_CLOCK_UTC == 0
> > > >  VIRTIO_RTC_CLOCK_TAI == 1
> > > >  VIRTIO_RTC_CLOCK_MONOTONIC == 2
> > > >  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
> > > 
> > > VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4
> > > 
> > > The subtypes would be (1st for clocks other than
> > > VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for
> > > VIRTIO_RTC_CLOCK_SMEARED_UTC):
> > > 
> > > #define VIRTIO_RTC_SUBTYPE_STRICT 0
> > > #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1
> > > #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2
> > > 
> > 
> > Thanks. I really do think that from the guest point of view there's
> > really no distinction between "maybe smeared" and "undefined
> > smearing", and have a preference for using the latter form, which
> > is the key difference there?
> > 
> > Again though, not a hill for me to die on.
> 
> I have no issue with staying with "undefined smearing", so would you agree
> to something like
> 
> VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4
> 
> (or another name if you prefer)?

Well, the point of contention was really whether that was a *type* or a
*subtype*.

Either way, it's a "precision clock" telling its consumer that the
device *itself* doesn't really know what time is being exposed. Which
seems like a bizarre thing to support.

But I think I've constructed an argument which persuades me to your
point of view that *if* we permit it, it should be a primary type...

A clock can *either* be UTC, *or* it can be monotonic. The whole point
of smearing is to produce a monotonic clock, of course.

VIRTIO_RTC_CLOCK_UTC is UTC. It is not monotonic.

VIRTIO_RTC_CLOCK_SMEARED is, presumably, monotonic (and I think we
should explicitly require that to be true in virtio-rtc).


But VIRTIO_RTC_CLOCK_MAYBE_SMEARED is the worst of both worlds. It is
neither known to be correct UTC, *nor* is it known to be monotonic. So
(again, if we permit it at all) I think it probably does make sense for
that to be a primary type.


This is what I currently have for 'struct vmclock_abi' that I'd like to
persuade you to adopt. I need to tweak it some more, for at least the
following reasons, as well as any more you can see:

 • size isn't big enough for 64KiB pages
 • Should be explicitly little-endian
 • Does it need esterror as well as maxerror?
 • Why is maxerror in picoseconds? It's the only use of that unit
 • Where do the clock_status values come from? Do they make sense?
 • Are signed integers OK? (I think so!).

 
/*
 * This structure provides a vDSO-style clock to VM guests, exposing the
 * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
 * counter, etc.) and real time. It is designed to address the problem of
 * live migration, which other clock enlightenments do not.
 *
 * When a guest is live migrated, this affects the clock in two ways.
 *
 * First, even between identical hosts the actual frequency of the underlying
 * counter will change within the tolerances of its specification (typically
 * ±50PPM, or 4 seconds a day). This frequency also varies over time on the
 * same host, but can be tracked by NTP as it generally varies slowly. With
 * live migration there is a step change in the frequency, with no warning.
 *
 * Second, there may be a step change in the value of the counter itself, as
 * its accuracy is limited by the precision of the NTP synchronization on the
 * source and destination hosts.
 *
 * So any calibration (NTP, PTP, etc.) which the guest has done on the source
 * host before migration is invalid, and needs to be redone on the new host.
 *
 * In its most basic mode, this structure provides only an indication to the
 * guest that live migration has occurred. This allows the guest to know that
 * its clock is invalid and take remedial action. For applications that need
 * reliable accurate timestamps (e.g. distributed databases), the structure
 * can be mapped all the way to userspace. This allows the application to see
 * directly for itself that the clock is disrupted and take appropriate
 * action, even when using a vDSO-style method to get the time instead of a
 * 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-03 Thread Peter Hilber
On 02.07.24 20:40, David Woodhouse wrote:
> On 2 July 2024 19:12:00 BST, Peter Hilber  
> wrote:
>> On 02.07.24 18:39, David Woodhouse wrote:
>>> To clarify then, the main types are
>>>
>>>  VIRTIO_RTC_CLOCK_UTC == 0
>>>  VIRTIO_RTC_CLOCK_TAI == 1
>>>  VIRTIO_RTC_CLOCK_MONOTONIC == 2
>>>  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
>>>
>>> And the subtypes are *only* for the case of
>>> VIRTIO_RTC_CLOCK_SMEARED_UTC. They include
>>>
>>>  VIRTIO_RTC_SUBTYPE_STRICT
>>>  VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */
>>>  VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 
>>>  VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */
>>>
>>> Is that what we just agreed on?
>>>
>>>
>>
>> This is a misunderstanding. My idea was that the main types are
>>
>>>  VIRTIO_RTC_CLOCK_UTC == 0
>>>  VIRTIO_RTC_CLOCK_TAI == 1
>>>  VIRTIO_RTC_CLOCK_MONOTONIC == 2
>>>  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
>>
>> VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4
>>
>> The subtypes would be (1st for clocks other than
>> VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for
>> VIRTIO_RTC_CLOCK_SMEARED_UTC):
>>
>> #define VIRTIO_RTC_SUBTYPE_STRICT 0
>> #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1
>> #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2
>>
> 
> Thanks. I really do think that from the guest point of view there's really no 
> distinction between "maybe smeared" and "undefined smearing", and have a 
> preference for using the latter form, which is the key difference there?
> 
> Again though, not a hill for me to die on.

I have no issue with staying with "undefined smearing", so would you agree
to something like

VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4

(or another name if you prefer)?



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-02 Thread David Woodhouse
On 2 July 2024 19:12:00 BST, Peter Hilber  wrote:
>On 02.07.24 18:39, David Woodhouse wrote:
>> To clarify then, the main types are
>> 
>>  VIRTIO_RTC_CLOCK_UTC == 0
>>  VIRTIO_RTC_CLOCK_TAI == 1
>>  VIRTIO_RTC_CLOCK_MONOTONIC == 2
>>  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
>> 
>> And the subtypes are *only* for the case of
>> VIRTIO_RTC_CLOCK_SMEARED_UTC. They include
>> 
>>  VIRTIO_RTC_SUBTYPE_STRICT
>>  VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */
>>  VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 
>>  VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */
>> 
>> Is that what we just agreed on?
>> 
>> 
>
>This is a misunderstanding. My idea was that the main types are
>
>>  VIRTIO_RTC_CLOCK_UTC == 0
>>  VIRTIO_RTC_CLOCK_TAI == 1
>>  VIRTIO_RTC_CLOCK_MONOTONIC == 2
>>  VIRTIO_RTC_CLOCK_SMEARED_UTC == 3
>
>VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4
>
>The subtypes would be (1st for clocks other than
>VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for
>VIRTIO_RTC_CLOCK_SMEARED_UTC):
>
>#define VIRTIO_RTC_SUBTYPE_STRICT 0
>#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1
>#define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2
>

Thanks. I really do think that from the guest point of view there's really no 
distinction between "maybe smeared" and "undefined smearing", and have a 
preference for using the latter form, which is the key difference there?

Again though, not a hill for me to die on.



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-02 Thread Peter Hilber
On 02.07.24 18:39, David Woodhouse wrote:
> On Tue, 2024-07-02 at 17:03 +0200, Peter Hilber wrote:
>>> On 01.07.24 10:57, David Woodhouse wrote:
> If my proposed memory structure is subsumed into the virtio-rtc
> proposal we'd literally use the same names, but for the time being I'll
> update mine to:
>>>
>>> Do you intend vmclock and virtio-rtc to be ABI compatible?
> 
> I intend you to incorporate a shared memory structure like the vmclock
> one into the virtio-rtc standard :)
> 
> Because precision clocks in a VM are fundamentally nonsensical without
> a way for the guest to know when live migration has screwed them up.
> 
> So yes, so facilitate that, I'm trying to align things so that the
> numeric values of fields like the time_type and smearing hint are
> *precisely* the same as the VIRTIO_RTC_xxx values.
> 
> Time pressure *may* mean I have to ship an ACPI-based device with a
> preliminary version of the structure before I've finished persuading
> you, and before we've completely finalized the structure. I am hoping
> to avoid that.
> 
> (In fact, my time pressure only applies to a version of the structure
> which carries the disruption_marker field; the actual clock calibration
> information doesn't have to be present in the interim implementation.)
> 
> 
>>>  FYI, I see a
>>> potential problem in that Virtio does avoid the use of signed integers so
>>> far. I did not check carefully if there might be other problems, yet.
> 
> Hm, you use an unsigned integer to convey the tai_offset. I suppose at
> +37 and with a plan to stop doing leap seconds in the next decade,
> we're unlikely to get back below zero?
> 

I think so.

> The other signed integer I had was for the leap second direction, but I
> think I'm happy to drop that and just adopt your uint8_t leap field
> with VIRTIO_RTC_LEAP_{PRE_POS,PRE_NEG,etc.}.
> 
> 
> 
> 
> 
>
> /*
>  * What time is exposed in the time_sec/time_frac_sec fields?
>  */
> uint8_t time_type;
> #define VMCLOCK_TIME_UTC0   /* Since 1970-01-01 
> 00:00:00z */
> #define VMCLOCK_TIME_TAI1   /* Since 1970-01-01 
> 00:00:00z */
> #define VMCLOCK_TIME_MONOTONIC  2   /* Since undefined epoch 
> */
> #define VMCLOCK_TIME_INVALID3   /* virtio-rtc uses this 
> for smeared UTC */
>
>
> I can then use your smearing subtype values as the 'hint' field in the
> shared memory structure. You currently have:
>
> +\begin{lstlisting}
> +#define VIRTIO_RTC_SUBTYPE_STRICT 0
> +#define VIRTIO_RTC_SUBTYPE_SMEAR 1
> +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2
> +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3
> +\end{lstlisting}
>
>>>
>>> I agree with the above part of your proposal.
>>>
> I can certainly ensure that 'noon linear' has the same value. I don't
> think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though:
>
>
> +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by
> +   smearing time in the vicinity of the leap second, in a not
> +   precisely defined manner. This avoids clock steps due to UTC
> +   leap seconds.
>
> ...
>
> +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC
> +   standard w.r.t.\ leap second introduction in an unspecified
> way
> +   (leap seconds may, or may not, be smeared).
>
> To the client, both of those just mean "for a day or so around a leap
> second event, you can't trust this device to know what the time is".
> There isn't any point in separating "does lie to you" from "might lie
> to you", surely? The guest can't do anything useful with that
> distinction. Let's drop SMEAR and keep only LEAP_UNSPECIFIED?
>>>
>>> As for VIRTIO_RTC_SUBTYPE_SMEAR, I think this could be dropped indeed
>>> (resp., UTC_SLS may be added).
>>>
>>> But VIRTIO_RTC_CLOCK_SMEARED_UTC is an assurance that there will be no
>>> steps (in particular, steps backwards, which some clients might not like)
>>> due to leap seconds, while LEAP_UNSPECIFIED provides no such guarantee.
>>>
>>> So I think this might be better handled by adding, alongside
>>>
> #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3
>>>
>>> #define VIRTIO_RTC_CLOCK_LEAP_UNSPECIFIED_UTC 4
>>>
>>> (or any better name, like VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC).
>>>
>
> And if you *really* want to parameterise it, I think that's a bad idea
> and it encourages the proliferation of different time "standards", but
> I'd probably just suck it up and do whatever you do because that's not
> strictly within the remit of my live-migration part.
>>>
>>> I think the above proposal to have subtypes for
>>> VIRTIO_RTC_CLOCK_SMEARED_UTC should work.
> 
> To clarify then, the main types are
> 
>  VIRTIO_RTC_CLOCK_UTC == 0
>  VIRTIO_RTC_CLOCK_TAI == 1
>  VIRTIO_RTC_CLOCK_MONOTONIC == 2
> 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-02 Thread David Woodhouse
On Tue, 2024-07-02 at 17:03 +0200, Peter Hilber wrote:
> > On 01.07.24 10:57, David Woodhouse wrote:
> > > > If my proposed memory structure is subsumed into the virtio-rtc
> > > > proposal we'd literally use the same names, but for the time being I'll
> > > > update mine to:
> > 
> > Do you intend vmclock and virtio-rtc to be ABI compatible?

I intend you to incorporate a shared memory structure like the vmclock
one into the virtio-rtc standard :)

Because precision clocks in a VM are fundamentally nonsensical without
a way for the guest to know when live migration has screwed them up.

So yes, so facilitate that, I'm trying to align things so that the
numeric values of fields like the time_type and smearing hint are
*precisely* the same as the VIRTIO_RTC_xxx values.

Time pressure *may* mean I have to ship an ACPI-based device with a
preliminary version of the structure before I've finished persuading
you, and before we've completely finalized the structure. I am hoping
to avoid that.

(In fact, my time pressure only applies to a version of the structure
which carries the disruption_marker field; the actual clock calibration
information doesn't have to be present in the interim implementation.)


> >  FYI, I see a
> > potential problem in that Virtio does avoid the use of signed integers so
> > far. I did not check carefully if there might be other problems, yet.

Hm, you use an unsigned integer to convey the tai_offset. I suppose at
+37 and with a plan to stop doing leap seconds in the next decade,
we're unlikely to get back below zero?

The other signed integer I had was for the leap second direction, but I
think I'm happy to drop that and just adopt your uint8_t leap field
with VIRTIO_RTC_LEAP_{PRE_POS,PRE_NEG,etc.}.





> > > > 
> > > > /*
> > > >  * What time is exposed in the time_sec/time_frac_sec fields?
> > > >  */
> > > > uint8_t time_type;
> > > > #define VMCLOCK_TIME_UTC0   /* Since 1970-01-01 
> > > > 00:00:00z */
> > > > #define VMCLOCK_TIME_TAI1   /* Since 1970-01-01 
> > > > 00:00:00z */
> > > > #define VMCLOCK_TIME_MONOTONIC  2   /* Since undefined 
> > > > epoch */
> > > > #define VMCLOCK_TIME_INVALID3   /* virtio-rtc uses this 
> > > > for smeared UTC */
> > > > 
> > > > 
> > > > I can then use your smearing subtype values as the 'hint' field in the
> > > > shared memory structure. You currently have:
> > > > 
> > > > +\begin{lstlisting}
> > > > +#define VIRTIO_RTC_SUBTYPE_STRICT 0
> > > > +#define VIRTIO_RTC_SUBTYPE_SMEAR 1
> > > > +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2
> > > > +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3
> > > > +\end{lstlisting}
> > > > 
> > 
> > I agree with the above part of your proposal.
> > 
> > > > I can certainly ensure that 'noon linear' has the same value. I don't
> > > > think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though:
> > > > 
> > > > 
> > > > +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by
> > > > +   smearing time in the vicinity of the leap second, in a not
> > > > +   precisely defined manner. This avoids clock steps due to UTC
> > > > +   leap seconds.
> > > > 
> > > > ...
> > > > 
> > > > +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC
> > > > +   standard w.r.t.\ leap second introduction in an unspecified
> > > > way
> > > > +   (leap seconds may, or may not, be smeared).
> > > > 
> > > > To the client, both of those just mean "for a day or so around a leap
> > > > second event, you can't trust this device to know what the time is".
> > > > There isn't any point in separating "does lie to you" from "might lie
> > > > to you", surely? The guest can't do anything useful with that
> > > > distinction. Let's drop SMEAR and keep only LEAP_UNSPECIFIED?
> > 
> > As for VIRTIO_RTC_SUBTYPE_SMEAR, I think this could be dropped indeed
> > (resp., UTC_SLS may be added).
> > 
> > But VIRTIO_RTC_CLOCK_SMEARED_UTC is an assurance that there will be no
> > steps (in particular, steps backwards, which some clients might not like)
> > due to leap seconds, while LEAP_UNSPECIFIED provides no such guarantee.
> > 
> > So I think this might be better handled by adding, alongside
> > 
> > > > #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3
> > 
> > #define VIRTIO_RTC_CLOCK_LEAP_UNSPECIFIED_UTC 4
> > 
> > (or any better name, like VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC).
> > 
> > > > 
> > > > And if you *really* want to parameterise it, I think that's a bad idea
> > > > and it encourages the proliferation of different time "standards", but
> > > > I'd probably just suck it up and do whatever you do because that's not
> > > > strictly within the remit of my live-migration part.
> > 
> > I think the above proposal to have subtypes for
> > VIRTIO_RTC_CLOCK_SMEARED_UTC should work.

To clarify then, the main types are

 VIRTIO_RTC_CLOCK_UTC == 0
 VIRTIO_RTC_CLOCK_TAI == 1
 VIRTIO_RTC_CLOCK_MONOTONIC == 2
 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-02 Thread Peter Hilber
On 01.07.24 10:57, David Woodhouse wrote:
> On Fri, 2024-06-28 at 22:27 +0100, David Woodhouse wrote:
>> On 28 June 2024 17:38:15 BST, Peter Hilber  
>> wrote:
>>> On 28.06.24 14:15, David Woodhouse wrote:
 On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
> On 27.06.24 16:52, David Woodhouse wrote:
>> I already added a flags field, so this might look something like:
>>
>>     /*
>>  * Smearing flags. The UTC clock exposed through this structure
>>  * is only ever true UTC, but a guest operating system may
>>  * choose to offer a monotonic smeared clock to its users. This
>>  * merely offers a hint about what kind of smearing to perform,
>>  * for consistency with systems in the nearby environment.
>>  */
>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* 
>> draft-kuhn-leapsecond-00.txt */
>>
>> (UTC-SLS is probably a bad example but are there formal definitions for
>> anything else?)
>
> I think it could also be more generic, like flags for linear smearing,
> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
> to the leap second start). That could also represent UTC-SLS, and
> noon-to-noon, and it would be well-defined.
>
> This should reduce the likelihood that the guest doesn't know the smearing
> variant.

 I'm wary of making it too generic. That would seem to encourage a
 *proliferation* of false "UTC-like" clocks.

 It's bad enough that we do smearing at all, let alone that we don't
 have a single definition of how to do it.

 I made the smearing hint a full uint8_t instead of using bits in flags,
 in the end. That gives us a full 255 ways of lying to users about what
 the time is, so we're unlikely to run out. And it's easy enough to add
 a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
 that get invented.


>>>
>>> My concern is that the registry update may come after a driver has already
>>> been implemented, so that it may be hard to ensure that the smearing which
>>> has been chosen is actually implemented.
>>
>> Well yes, but why in the name of all that is holy would anyone want
>> to invent *new* ways to lie to users about the time? If we capture
>> the existing ones as we write this, surely it's a good thing that
>> there's a barrier to entry for adding more?
> 
> Ultimately though, this isn't the hill for me to die on. I'm pushing on
> that topic because I want to avoid the proliferation of *ambiguity*. If
> we have a precision clock, we should *know* what the time is.
> 
> So how about this proposal. I line up the fields in the proposed shared
> memory structure to match your virtio-rtc proposal, using 'subtype' as
> you proposed. But, instead of the 'subtype' being valid only for
> VIRTIO_RTC_CLOCK_UTC, we define a new top-level type for *smeared* UTC.
> 
> So, you have:
> 
> +\begin{lstlisting}
> +#define VIRTIO_RTC_CLOCK_UTC 0
> +#define VIRTIO_RTC_CLOCK_TAI 1
> +#define VIRTIO_RTC_CLOCK_MONO 2
> +\end{lstlisting}
> 
> I propose that you add
> 
> #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3
> 
> If my proposed memory structure is subsumed into the virtio-rtc
> proposal we'd literally use the same names, but for the time being I'll
> update mine to:

Do you intend vmclock and virtio-rtc to be ABI compatible? FYI, I see a
potential problem in that Virtio does avoid the use of signed integers so
far. I did not check carefully if there might be other problems, yet.

> 
>   /*
>* What time is exposed in the time_sec/time_frac_sec fields?
>*/
>   uint8_t time_type;
> #define VMCLOCK_TIME_UTC  0   /* Since 1970-01-01 00:00:00z */
> #define VMCLOCK_TIME_TAI  1   /* Since 1970-01-01 00:00:00z */
> #define VMCLOCK_TIME_MONOTONIC2   /* Since undefined 
> epoch */
> #define VMCLOCK_TIME_INVALID  3   /* virtio-rtc uses this for 
> smeared UTC */
> 
> 
> I can then use your smearing subtype values as the 'hint' field in the
> shared memory structure. You currently have:
> 
> +\begin{lstlisting}
> +#define VIRTIO_RTC_SUBTYPE_STRICT 0
> +#define VIRTIO_RTC_SUBTYPE_SMEAR 1
> +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2
> +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3
> +\end{lstlisting}
> 

I agree with the above part of your proposal.

> I can certainly ensure that 'noon linear' has the same value. I don't
> think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though:
> 
> 
> +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by
> + smearing time in the vicinity of the leap second, in a not
> + precisely defined manner. This avoids clock steps due to UTC
> + leap seconds.
> 
> ...
> 
> +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC
> + standard w.r.t.\ leap second introduction in an unspecified
> way
> + (leap seconds 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-07-01 Thread David Woodhouse
On Fri, 2024-06-28 at 22:27 +0100, David Woodhouse wrote:
> On 28 June 2024 17:38:15 BST, Peter Hilber  
> wrote:
> > On 28.06.24 14:15, David Woodhouse wrote:
> > > On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
> > > > On 27.06.24 16:52, David Woodhouse wrote:
> > > > > I already added a flags field, so this might look something like:
> > > > > 
> > > > >     /*
> > > > >  * Smearing flags. The UTC clock exposed through this 
> > > > > structure
> > > > >  * is only ever true UTC, but a guest operating system may
> > > > >  * choose to offer a monotonic smeared clock to its users. 
> > > > > This
> > > > >  * merely offers a hint about what kind of smearing to 
> > > > > perform,
> > > > >  * for consistency with systems in the nearby environment.
> > > > >  */
> > > > > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* 
> > > > > draft-kuhn-leapsecond-00.txt */
> > > > > 
> > > > > (UTC-SLS is probably a bad example but are there formal definitions 
> > > > > for
> > > > > anything else?)
> > > > 
> > > > I think it could also be more generic, like flags for linear smearing,
> > > > cosine smearing(?), and smear_start_sec and smear_end_sec fields 
> > > > (relative
> > > > to the leap second start). That could also represent UTC-SLS, and
> > > > noon-to-noon, and it would be well-defined.
> > > > 
> > > > This should reduce the likelihood that the guest doesn't know the 
> > > > smearing
> > > > variant.
> > > 
> > > I'm wary of making it too generic. That would seem to encourage a
> > > *proliferation* of false "UTC-like" clocks.
> > > 
> > > It's bad enough that we do smearing at all, let alone that we don't
> > > have a single definition of how to do it.
> > > 
> > > I made the smearing hint a full uint8_t instead of using bits in flags,
> > > in the end. That gives us a full 255 ways of lying to users about what
> > > the time is, so we're unlikely to run out. And it's easy enough to add
> > > a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
> > > that get invented.
> > > 
> > > 
> > 
> > My concern is that the registry update may come after a driver has already
> > been implemented, so that it may be hard to ensure that the smearing which
> > has been chosen is actually implemented.
> 
> Well yes, but why in the name of all that is holy would anyone want
> to invent *new* ways to lie to users about the time? If we capture
> the existing ones as we write this, surely it's a good thing that
> there's a barrier to entry for adding more?

Ultimately though, this isn't the hill for me to die on. I'm pushing on
that topic because I want to avoid the proliferation of *ambiguity*. If
we have a precision clock, we should *know* what the time is.

So how about this proposal. I line up the fields in the proposed shared
memory structure to match your virtio-rtc proposal, using 'subtype' as
you proposed. But, instead of the 'subtype' being valid only for
VIRTIO_RTC_CLOCK_UTC, we define a new top-level type for *smeared* UTC.

So, you have:

+\begin{lstlisting}
+#define VIRTIO_RTC_CLOCK_UTC 0
+#define VIRTIO_RTC_CLOCK_TAI 1
+#define VIRTIO_RTC_CLOCK_MONO 2
+\end{lstlisting}

I propose that you add

#define VIRTIO_RTC_CLOCK_SMEARED_UTC 3

If my proposed memory structure is subsumed into the virtio-rtc
proposal we'd literally use the same names, but for the time being I'll
update mine to:

/*
 * What time is exposed in the time_sec/time_frac_sec fields?
 */
uint8_t time_type;
#define VMCLOCK_TIME_UTC0   /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_TAI1   /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_MONOTONIC  2   /* Since undefined epoch */
#define VMCLOCK_TIME_INVALID3   /* virtio-rtc uses this for 
smeared UTC */


I can then use your smearing subtype values as the 'hint' field in the
shared memory structure. You currently have:

+\begin{lstlisting}
+#define VIRTIO_RTC_SUBTYPE_STRICT 0
+#define VIRTIO_RTC_SUBTYPE_SMEAR 1
+#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2
+#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3
+\end{lstlisting}

I can certainly ensure that 'noon linear' has the same value. I don't
think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though:


+\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by
+   smearing time in the vicinity of the leap second, in a not
+   precisely defined manner. This avoids clock steps due to UTC
+   leap seconds.

...

+\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC
+   standard w.r.t.\ leap second introduction in an unspecified
way
+   (leap seconds may, or may not, be smeared).

To the client, both of those just mean "for a day or so around a leap
second event, you can't trust this device to know what the time is".
There isn't any point in separating "does lie to you" from "might lie
to you", surely? The guest can't do 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-28 Thread David Woodhouse
On 28 June 2024 17:38:15 BST, Peter Hilber  wrote:
>On 28.06.24 14:15, David Woodhouse wrote:
>> On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
>>> On 27.06.24 16:52, David Woodhouse wrote:
 I already added a flags field, so this might look something like:

     /*
  * Smearing flags. The UTC clock exposed through this structure
  * is only ever true UTC, but a guest operating system may
  * choose to offer a monotonic smeared clock to its users. This
  * merely offers a hint about what kind of smearing to perform,
  * for consistency with systems in the nearby environment.
  */
 #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt 
 */

 (UTC-SLS is probably a bad example but are there formal definitions for
 anything else?)
>>>
>>> I think it could also be more generic, like flags for linear smearing,
>>> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
>>> to the leap second start). That could also represent UTC-SLS, and
>>> noon-to-noon, and it would be well-defined.
>>>
>>> This should reduce the likelihood that the guest doesn't know the smearing
>>> variant.
>> 
>> I'm wary of making it too generic. That would seem to encourage a
>> *proliferation* of false "UTC-like" clocks.
>> 
>> It's bad enough that we do smearing at all, let alone that we don't
>> have a single definition of how to do it.
>> 
>> I made the smearing hint a full uint8_t instead of using bits in flags,
>> in the end. That gives us a full 255 ways of lying to users about what
>> the time is, so we're unlikely to run out. And it's easy enough to add
>> a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
>> that get invented.
>> 
>> 
>
>My concern is that the registry update may come after a driver has already
>been implemented, so that it may be hard to ensure that the smearing which
>has been chosen is actually implemented.

Well yes, but why in the name of all that is holy would anyone want to invent 
*new* ways to lie to users about the time? If we capture the existing ones as 
we write this, surely it's a good thing that there's a barrier to entry for 
adding more?


>But the error bounds could be large or missing. I am trying to address use
>cases where the host steps or slews the clock as well.

The host is absolutely intended to be skewing the clock to keep it accurate as 
the frequency of the underlying oscillator changes, and the seq_count field 
will change every time the host does so.

Do we need to handle steps differently? Or just let the guest deal with it?

If an NTP server suddenly steps the time it reports, what does the client do?




Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-28 Thread Peter Hilber
On 28.06.24 14:15, David Woodhouse wrote:
> On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
>> On 27.06.24 16:52, David Woodhouse wrote:
>>> I already added a flags field, so this might look something like:
>>>
>>>     /*
>>>  * Smearing flags. The UTC clock exposed through this structure
>>>  * is only ever true UTC, but a guest operating system may
>>>  * choose to offer a monotonic smeared clock to its users. This
>>>  * merely offers a hint about what kind of smearing to perform,
>>>  * for consistency with systems in the nearby environment.
>>>  */
>>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt 
>>> */
>>>
>>> (UTC-SLS is probably a bad example but are there formal definitions for
>>> anything else?)
>>
>> I think it could also be more generic, like flags for linear smearing,
>> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
>> to the leap second start). That could also represent UTC-SLS, and
>> noon-to-noon, and it would be well-defined.
>>
>> This should reduce the likelihood that the guest doesn't know the smearing
>> variant.
> 
> I'm wary of making it too generic. That would seem to encourage a
> *proliferation* of false "UTC-like" clocks.
> 
> It's bad enough that we do smearing at all, let alone that we don't
> have a single definition of how to do it.
> 
> I made the smearing hint a full uint8_t instead of using bits in flags,
> in the end. That gives us a full 255 ways of lying to users about what
> the time is, so we're unlikely to run out. And it's easy enough to add
> a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
> that get invented.
> 
> 

My concern is that the registry update may come after a driver has already
been implemented, so that it may be hard to ensure that the smearing which
has been chosen is actually implemented.

> +   /*
> +    * This field changes to another non-repeating value when the CPU
> +    * counter is disrupted, for example on live migration.
> +    */
> +   uint64_t disruption_marker;

 The field could also change when the clock is stepped (leap seconds
 excepted), or when the clock frequency is slewed.
>>>
>>> I'm not sure. The concept of the disruption marker is that it tells the
>>> guest to throw away any calibration of the counter that the guest has
>>> done for *itself* (with NTP, other PTP devices, etc.).
>>>
>>> One mode for this device would be not to populate the clock fields at
>>> all, but *only* to signal disruption when it occurs. So the guest can
>>> abort transactions until it's resynced its clocks (to avoid incurring
>>> fines if breaking databases, etc.).
>>>
>>> Exposing the host timekeeping through the structure means that the
>>> migrated guest can keep working because it can trust the timekeeping
>>> performed by the (new) host and exposed to it.
>>>
>>> If the counter is actually varying in frequency over time, and the host
>>> is slewing the clock frequency that it reports, that *isn't* a step
>>> change and doesn't mean that the guest should throw away any
>>> calibration that it's been doing for itself. One hopes that the guest
>>> would have detected the *same* frequency change, and be adapting for
>>> itself. So I don't think that should indicate a disruption.
>>>
>>> I think the same is even true if the clock is stepped by the host. The
>>> actual *counter* hasn't changed, so the guest is better off ignoring
>>> the vacillating host and continuing to derive its idea of time from the
>>> hardware counter itself, as calibrated against some external NTP/PTP
>>> sources. Surely we actively *don't* to tell the guest to throw its own
>>> calibrations away, in this case?
>>
>> In case the guest is also considering other time sources, it might indeed
>> not be a good idea to mix host clock changes into the hardware counter
>> disruption marker.
>>
>> But if the vmclock is the authoritative source of time, it can still be
>> helpful to know about such changes, maybe through another marker.
> 
> Could that be the existing seq_count field?
> 
> Skewing the counter_period_frac_sec as the underlying oscillator speeds
> up and slows down is perfectly normal and expected, and we already
> expect the seq_count to change when that happens.
> 
> Maybe step changes are different, but arguably if the time advertised
> by the host steps *outside* the error bounds previously advertised,
> that's just broken?

But the error bounds could be large or missing. I am trying to address use
cases where the host steps or slews the clock as well.

> 
> Depending on how the clock information is fed, a change in seq_count
> may even result in non-monotonicity. If the underlying oscillator has
> sped up and the structure is updated accordingly, the time calculated
> the moment *before* that update may appear later than the time
> calculated immediately after it.
> 
> It's up 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-28 Thread David Woodhouse
On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
> On 27.06.24 16:52, David Woodhouse wrote:
> > I already added a flags field, so this might look something like:
> > 
> >     /*
> >  * Smearing flags. The UTC clock exposed through this structure
> >  * is only ever true UTC, but a guest operating system may
> >  * choose to offer a monotonic smeared clock to its users. This
> >  * merely offers a hint about what kind of smearing to perform,
> >  * for consistency with systems in the nearby environment.
> >  */
> > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt 
> > */
> > 
> > (UTC-SLS is probably a bad example but are there formal definitions for
> > anything else?)
> 
> I think it could also be more generic, like flags for linear smearing,
> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
> to the leap second start). That could also represent UTC-SLS, and
> noon-to-noon, and it would be well-defined.
> 
> This should reduce the likelihood that the guest doesn't know the smearing
> variant.

I'm wary of making it too generic. That would seem to encourage a
*proliferation* of false "UTC-like" clocks.

It's bad enough that we do smearing at all, let alone that we don't
have a single definition of how to do it.

I made the smearing hint a full uint8_t instead of using bits in flags,
in the end. That gives us a full 255 ways of lying to users about what
the time is, so we're unlikely to run out. And it's easy enough to add
a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods
that get invented.


> > > > +   /*
> > > > +    * This field changes to another non-repeating value when the 
> > > > CPU
> > > > +    * counter is disrupted, for example on live migration.
> > > > +    */
> > > > +   uint64_t disruption_marker;
> > > 
> > > The field could also change when the clock is stepped (leap seconds
> > > excepted), or when the clock frequency is slewed.
> > 
> > I'm not sure. The concept of the disruption marker is that it tells the
> > guest to throw away any calibration of the counter that the guest has
> > done for *itself* (with NTP, other PTP devices, etc.).
> > 
> > One mode for this device would be not to populate the clock fields at
> > all, but *only* to signal disruption when it occurs. So the guest can
> > abort transactions until it's resynced its clocks (to avoid incurring
> > fines if breaking databases, etc.).
> > 
> > Exposing the host timekeeping through the structure means that the
> > migrated guest can keep working because it can trust the timekeeping
> > performed by the (new) host and exposed to it.
> > 
> > If the counter is actually varying in frequency over time, and the host
> > is slewing the clock frequency that it reports, that *isn't* a step
> > change and doesn't mean that the guest should throw away any
> > calibration that it's been doing for itself. One hopes that the guest
> > would have detected the *same* frequency change, and be adapting for
> > itself. So I don't think that should indicate a disruption.
> > 
> > I think the same is even true if the clock is stepped by the host. The
> > actual *counter* hasn't changed, so the guest is better off ignoring
> > the vacillating host and continuing to derive its idea of time from the
> > hardware counter itself, as calibrated against some external NTP/PTP
> > sources. Surely we actively *don't* to tell the guest to throw its own
> > calibrations away, in this case?
> 
> In case the guest is also considering other time sources, it might indeed
> not be a good idea to mix host clock changes into the hardware counter
> disruption marker.
> 
> But if the vmclock is the authoritative source of time, it can still be
> helpful to know about such changes, maybe through another marker.

Could that be the existing seq_count field?

Skewing the counter_period_frac_sec as the underlying oscillator speeds
up and slows down is perfectly normal and expected, and we already
expect the seq_count to change when that happens.

Maybe step changes are different, but arguably if the time advertised
by the host steps *outside* the error bounds previously advertised,
that's just broken?

Depending on how the clock information is fed, a change in seq_count
may even result in non-monotonicity. If the underlying oscillator has
sped up and the structure is updated accordingly, the time calculated
the moment *before* that update may appear later than the time
calculated immediately after it.

It's up to the guest operating system to feed that information into its
own timekeeping system and skew towards correctness instead of stepping
the time it reports to its users.



smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-28 Thread David Woodhouse
On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote:
> 
> > 
> > /*
> >  * What time is exposed in the time_sec/time_frac_sec fields?
> >  */
> > uint8_t time_type;
> > #define VMCLOCK_TIME_UNKNOWN0   /* Invalid / no time 
> > exposed */
> > #define VMCLOCK_TIME_UTC1   /* Since 1970-01-01 
> > 00:00:00z */
> > #define VMCLOCK_TIME_TAI2   /* Since 1970-01-01 
> > 00:00:00z */
> > #define VMCLOCK_TIME_MONOTONIC  3   /* Since undefined epoch */
> > 
> > /* Bit shift for counter_period_frac_sec and its error rate */
> > uint8_t counter_period_shift;
> > 
> > /*
> >  * Unlike in NTP, this can indicate a leap second in the past. This
> >  * is needed to allow guests to derive an imprecise clock with
> >  * smeared leap seconds for themselves, as some modes of smearing
> >  * need the adjustments to continue even after the moment at which
> >  * the leap second should have occurred.
> >  */
> > int8_t leapsecond_direction;
> > uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */
> > 
> > /*
> >  * Paired values of counter and UTC at a given point in time.
> >  */
> > uint64_t counter_value;
> > uint64_t time_sec; /* Since 1970-01-01 00:00:00z */
> 
> Nitpick: The comment is not valid any more for TIME_MONOTONIC.

Ah yes, I "moved" that comment up to the UTC/TAI time_type values, but
neglected to actually delete it from here. Fixed; thanks.


smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-28 Thread Peter Hilber
On 27.06.24 18:03, David Woodhouse wrote:
> 
> I've updated the tree at
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock
> (but not yet the qemu one).
> 
> I think I've taken into account all your comments apart from the one
> about non-64-bit counters wrapping. I reduced the seq_count to 32 bit
> to make room for a 32-bit flags field, added the time type
> (UTC/TAI/MONOTONIC) and a smearing hint, with some straw man
> definitions for smearing algorithms for which I could actually find
> definitions.
> 
> The structure now looks like this:
> 
> 
> struct vmclock_abi {

[...]

> 
>   /*
>* What time is exposed in the time_sec/time_frac_sec fields?
>*/
>   uint8_t time_type;
> #define VMCLOCK_TIME_UNKNOWN  0   /* Invalid / no time exposed */
> #define VMCLOCK_TIME_UTC  1   /* Since 1970-01-01 00:00:00z */
> #define VMCLOCK_TIME_TAI  2   /* Since 1970-01-01 00:00:00z */
> #define VMCLOCK_TIME_MONOTONIC3   /* Since undefined 
> epoch */
> 
>   /* Bit shift for counter_period_frac_sec and its error rate */
>   uint8_t counter_period_shift;
> 
>   /*
>* Unlike in NTP, this can indicate a leap second in the past. This
>* is needed to allow guests to derive an imprecise clock with
>* smeared leap seconds for themselves, as some modes of smearing
>* need the adjustments to continue even after the moment at which
>* the leap second should have occurred.
>*/
>   int8_t leapsecond_direction;
>   uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */
> 
>   /*
>* Paired values of counter and UTC at a given point in time.
>*/
>   uint64_t counter_value;
>   uint64_t time_sec; /* Since 1970-01-01 00:00:00z */

Nitpick: The comment is not valid any more for TIME_MONOTONIC.



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-28 Thread Peter Hilber
On 27.06.24 16:52, David Woodhouse wrote:
> On Thu, 2024-06-27 at 15:50 +0200, Peter Hilber wrote:
>> On 25.06.24 21:01, David Woodhouse wrote:
>>> From: David Woodhouse 
>>>
>>> The vmclock "device" provides a shared memory region with precision clock
>>> information. By using shared memory, it is safe across Live Migration.
>>>
>>> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
>>> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
>>> actually helpful.
>>>
>>> The memory region of the device is also exposed to userspace so it can be
>>> read or memory mapped by application which need reliable notification of
>>> clock disruptions.
>>>
>>> Signed-off-by: David Woodhouse 
>>> ---
>>>
>>> v2: 
>>>  • Add gettimex64() support
>>>  • Convert TSC values to KVM clock when appropriate
>>>  • Require int128 support
>>>  • Add counter_period_shift 
>>>  • Add timeout when seq_count is invalid
>>>  • Add flags field
>>>  • Better comments in vmclock ABI structure
>>>  • Explicitly forbid smearing (as clock rates would need to change)
>>
>> Leap second smearing information could still be conveyed through the
>> vmclock_abi. AFAIU, to cover the popular smearing variants, it should be
>> enough to indicate whether the driver should apply linear or cosine
>> smearing, and the start time and end time.
> 
> Yes. The clock information actually conveyed through the {counter,
> time, rate} tuple should never be smeared, and should only ever be UTC.
> 
> But we could provide a hint to the guest operating system about what
> type of smearing to perform, *if* it chooses to offer a clock other
> than the standard CLOCK_REALTIME to its users.
> 
> I already added a flags field, so this might look something like:
> 
> /*
>  * Smearing flags. The UTC clock exposed through this structure
>  * is only ever true UTC, but a guest operating system may
>  * choose to offer a monotonic smeared clock to its users. This
>  * merely offers a hint about what kind of smearing to perform,
>  * for consistency with systems in the nearby environment.
>  */
> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */
> 
> 
> (UTC-SLS is probably a bad example but are there formal definitions for
> anything else?)
> 
> 

I think it could also be more generic, like flags for linear smearing,
cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative
to the leap second start). That could also represent UTC-SLS, and
noon-to-noon, and it would be well-defined.

This should reduce the likelihood that the guest doesn't know the smearing
variant.


[...]

>>> diff --git a/include/uapi/linux/vmclock.h b/include/uapi/linux/vmclock.h
>>> new file mode 100644
>>> index ..cf0f22205e79
>>> --- /dev/null
>>> +++ b/include/uapi/linux/vmclock.h
>>> @@ -0,0 +1,138 @@
>>> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR 
>>> BSD-2-Clause) */
>>> +
>>> +/*
>>> + * This structure provides a vDSO-style clock to VM guests, exposing the
>>> + * relationship (or lack thereof) between the CPU clock (TSC, timebase, 
>>> arch
>>> + * counter, etc.) and real time. It is designed to address the problem of
>>> + * live migration, which other clock enlightenments do not.
>>> + *
>>> + * When a guest is live migrated, this affects the clock in two ways.
>>> + *
>>> + * First, even between identical hosts the actual frequency of the 
>>> underlying
>>> + * counter will change within the tolerances of its specification 
>>> (typically
>>> + * ±50PPM, or 4 seconds a day). The frequency also varies over time on the
>>> + * same host, but can be tracked by NTP as it generally varies slowly. With
>>> + * live migration there is a step change in the frequency, with no warning.
>>> + *
>>> + * Second, there may be a step change in the value of the counter itself, 
>>> as
>>> + * its accuracy is limited by the precision of the NTP synchronization on 
>>> the
>>> + * source and destination hosts.
>>> + *
>>> + * So any calibration (NTP, PTP, etc.) which the guest has done on the 
>>> source
>>> + * host before migration is invalid, and needs to be redone on the new 
>>> host.
>>> + *
>>> + * In its most basic mode, this structure provides only an indication to 
>>> the
>>> + * guest that live migration has occurred. This allows the guest to know 
>>> that
>>> + * its clock is invalid and take remedial action. For applications that 
>>> need
>>> + * reliable accurate timestamps (e.g. distributed databases), the structure
>>> + * can be mapped all the way to userspace. This allows the application to 
>>> see
>>> + * directly for itself that the clock is disrupted and take appropriate
>>> + * action, even when using a vDSO-style method to get the time instead of a
>>> + * system call.
>>> + *
>>> + * In its more advanced mode. this structure can also be used to expose the
>>> + * precise relationship of the CPU counter to 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-27 Thread David Woodhouse

I've updated the tree at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock
(but not yet the qemu one).

I think I've taken into account all your comments apart from the one
about non-64-bit counters wrapping. I reduced the seq_count to 32 bit
to make room for a 32-bit flags field, added the time type
(UTC/TAI/MONOTONIC) and a smearing hint, with some straw man
definitions for smearing algorithms for which I could actually find
definitions.

The structure now looks like this:


struct vmclock_abi {
uint32_t magic;
#define VMCLOCK_MAGIC   0x4b4c4356 /* "VCLK" */
uint16_t size;  /* Size of page containing this structure */
uint16_t version;   /* 1 */

/* Sequence lock. Low bit means an update is in progress. */
uint32_t seq_count;

uint32_t flags;
/* Indicates that the tai_offset_sec field is valid */
#define VMCLOCK_FLAG_TAI_OFFSET_VALID   (1 << 0)
/*
 * Optionally used to notify guests of pending maintenance events.
 * A guest may wish to remove itself from service if an event is
 * coming up. Two flags indicate the rough imminence of the event.
 */
#define VMCLOCK_FLAG_DISRUPTION_SOON(1 << 1) /* About a day */
#define VMCLOCK_FLAG_DISRUPTION_IMMINENT(1 << 2) /* About an hour */
/* Indicates that the utc_time_maxerror_picosec field is valid */
#define VMCLOCK_FLAG_UTC_MAXERROR_VALID (1 << 3)
/* Indicates counter_period_error_rate_frac_sec is valid */
#define VMCLOCK_FLAG_PERIOD_ERROR_VALID (1 << 4)

/*
 * This field changes to another non-repeating value when the CPU
 * counter is disrupted, for example on live migration. This lets
 * the guest know that it should discard any calibration it has
 * performed of the counter against external sources (NTP/PTP/etc.).
 */
uint64_t disruption_marker;

uint8_t clock_status;
#define VMCLOCK_STATUS_UNKNOWN  0
#define VMCLOCK_STATUS_INITIALIZING 1
#define VMCLOCK_STATUS_SYNCHRONIZED 2
#define VMCLOCK_STATUS_FREERUNNING  3
#define VMCLOCK_STATUS_UNRELIABLE   4

uint8_t counter_id;
#define VMCLOCK_COUNTER_INVALID 0
#define VMCLOCK_COUNTER_X86_TSC 1
#define VMCLOCK_COUNTER_ARM_VCNT2
#define VMCLOCK_COUNTER_X86_ART 3

/*
 * By providing the offset from UTC to TAI, the guest can know both
 * UTC and TAI reliably, whichever is indicated in the time_type
 * field. Valid if VMCLOCK_FLAG_TAI_OFFSET_VALID is set in flags.
 */
int16_t tai_offset_sec;

/*
 * The time exposed through this device is never smeaared; if it
 * claims to be VMCLOCK_TIME_UTC then it MUST be UTC. This field
 * provides a hint to the guest operating system, such that *if*
 * the guest OS wants to provide its users with an alternative
 * clock which does not follow the POSIX CLOCK_REALTIME standard,
 * it may do so in a fashion consistent with the other systems
 * in the nearby environment.
 */
uint8_t leap_second_smearing_hint;
/* Provide true UTC to users, unsmeared. */;
#define VMCLOCK_SMEARING_NONE   0
/*
 * 
https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/
 * From noon on the day before to noon on the day after, smear the
 * clock by a linear 1/86400s per second.
*/
#define VMCLOCK_SMEARING_LINEAR_86400   1
/*
 * draft-kuhn-leapsecond-00
 * For the 1000s leading up to the leap second, smear the clock by
 * clock by a linear 1ms per second.
 */
#define VMCLOCK_SMEARING_UTC_SLS2

/*
 * What time is exposed in the time_sec/time_frac_sec fields?
 */
uint8_t time_type;
#define VMCLOCK_TIME_UNKNOWN0   /* Invalid / no time exposed */
#define VMCLOCK_TIME_UTC1   /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_TAI2   /* Since 1970-01-01 00:00:00z */
#define VMCLOCK_TIME_MONOTONIC  3   /* Since undefined epoch */

/* Bit shift for counter_period_frac_sec and its error rate */
uint8_t counter_period_shift;

/*
 * Unlike in NTP, this can indicate a leap second in the past. This
 * is needed to allow guests to derive an imprecise clock with
 * smeared leap seconds for themselves, as some modes of smearing
 * need the adjustments to continue even after the moment at which
 * the leap second should have occurred.
 */
int8_t leapsecond_direction;
uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */

/*
 * Paired values of counter and UTC at a given point in time.
 */
uint64_t 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-27 Thread David Woodhouse
On Thu, 2024-06-27 at 15:50 +0200, Peter Hilber wrote:
> On 25.06.24 21:01, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > The vmclock "device" provides a shared memory region with precision clock
> > information. By using shared memory, it is safe across Live Migration.
> > 
> > Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> > KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> > actually helpful.
> > 
> > The memory region of the device is also exposed to userspace so it can be
> > read or memory mapped by application which need reliable notification of
> > clock disruptions.
> > 
> > Signed-off-by: David Woodhouse 
> > ---
> > 
> > v2: 
> >  • Add gettimex64() support
> >  • Convert TSC values to KVM clock when appropriate
> >  • Require int128 support
> >  • Add counter_period_shift 
> >  • Add timeout when seq_count is invalid
> >  • Add flags field
> >  • Better comments in vmclock ABI structure
> >  • Explicitly forbid smearing (as clock rates would need to change)
> 
> Leap second smearing information could still be conveyed through the
> vmclock_abi. AFAIU, to cover the popular smearing variants, it should be
> enough to indicate whether the driver should apply linear or cosine
> smearing, and the start time and end time.

Yes. The clock information actually conveyed through the {counter,
time, rate} tuple should never be smeared, and should only ever be UTC.

But we could provide a hint to the guest operating system about what
type of smearing to perform, *if* it chooses to offer a clock other
than the standard CLOCK_REALTIME to its users.

I already added a flags field, so this might look something like:

/*
 * Smearing flags. The UTC clock exposed through this structure
 * is only ever true UTC, but a guest operating system may
 * choose to offer a monotonic smeared clock to its users. This
 * merely offers a hint about what kind of smearing to perform,
 * for consistency with systems in the nearby environment.
 */
#define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */


(UTC-SLS is probably a bad example but are there formal definitions for
anything else?)


> > But we 
> >  drivers/ptp/Kconfig  |  13 +
> >  drivers/ptp/Makefile |   1 +
> >  drivers/ptp/ptp_vmclock.c    | 516 +++
> >  include/uapi/linux/vmclock.h | 138 ++
> >  4 files changed, 668 insertions(+)
> >  create mode 100644 drivers/ptp/ptp_vmclock.c
> >  create mode 100644 include/uapi/linux/vmclock.h
> > 
> 
> [...]
> 
> > +
> > +/*
> > + * Multiply a 64-bit count by a 64-bit tick 'period' in units of seconds 
> > >> 64
> > + * and add the fractional second part of the reference time.
> > + *
> > + * The result is a 128-bit value, the top 64 bits of which are seconds, and
> > + * the low 64 bits are (seconds >> 64).
> > + *
> > + * If __int128 isn't available, perform the calculation 32 bits at a time 
> > to
> > + * avoid overflow.
> > + */
> > +static inline uint64_t mul_u64_u64_shr_add_u64(uint64_t *res_hi, uint64_t 
> > delta,
> > +  uint64_t period, uint8_t 
> > shift,
> > +  uint64_t frac_sec)
> > +{
> > +   unsigned __int128 res = (unsigned __int128)delta * period;
> > +
> > +   res >>= shift;
> > +   res += frac_sec;
> > +   *res_hi = res >> 64;
> > +   return (uint64_t)res;
> > +}
> > +
> > +static int vmclock_get_crosststamp(struct vmclock_state *st,
> > +  struct ptp_system_timestamp *sts,
> > +  struct system_counterval_t 
> > *system_counter,
> > +  struct timespec64 *tspec)
> > +{
> > +   ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT);
> > +   struct system_time_snapshot systime_snapshot;
> > +   uint64_t cycle, delta, seq, frac_sec;
> > +
> > +#ifdef CONFIG_X86
> > +   /*
> > +    * We'd expect the hypervisor to know this and to report the clock
> > +    * status as VMCLOCK_STATUS_UNRELIABLE. But be paranoid.
> > +    */
> > +   if (check_tsc_unstable())
> > +   return -EINVAL;
> > +#endif
> > +
> > +   while (1) {
> > +   seq = st->clk->seq_count & ~1ULL;
> > +   virt_rmb();
> > +
> > +   if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE)
> > +   return -EINVAL;
> > +
> > +   /*
> > +    * When invoked for gettimex64(), fill in the pre/post 
> > system
> > +    * times. The simple case is when system time is based on 
> > the
> > +    * same counter as st->cs_id, in which case all three times
> > +    * will be derived from the *same* counter value.
> > +    *
> > +    * If the system isn't using the same counter, then 

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-27 Thread Peter Hilber
On 25.06.24 21:01, David Woodhouse wrote:
> From: David Woodhouse 
> 
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
> 
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
> 
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by application which need reliable notification of
> clock disruptions.
> 
> Signed-off-by: David Woodhouse 
> ---
> 
> v2: 
>  • Add gettimex64() support
>  • Convert TSC values to KVM clock when appropriate
>  • Require int128 support
>  • Add counter_period_shift 
>  • Add timeout when seq_count is invalid
>  • Add flags field
>  • Better comments in vmclock ABI structure
>  • Explicitly forbid smearing (as clock rates would need to change)

Leap second smearing information could still be conveyed through the
vmclock_abi. AFAIU, to cover the popular smearing variants, it should be
enough to indicate whether the driver should apply linear or cosine
smearing, and the start time and end time.

> 
>  drivers/ptp/Kconfig  |  13 +
>  drivers/ptp/Makefile |   1 +
>  drivers/ptp/ptp_vmclock.c| 516 +++
>  include/uapi/linux/vmclock.h | 138 ++
>  4 files changed, 668 insertions(+)
>  create mode 100644 drivers/ptp/ptp_vmclock.c
>  create mode 100644 include/uapi/linux/vmclock.h
> 

[...]

> +
> +/*
> + * Multiply a 64-bit count by a 64-bit tick 'period' in units of seconds >> 
> 64
> + * and add the fractional second part of the reference time.
> + *
> + * The result is a 128-bit value, the top 64 bits of which are seconds, and
> + * the low 64 bits are (seconds >> 64).
> + *
> + * If __int128 isn't available, perform the calculation 32 bits at a time to
> + * avoid overflow.
> + */
> +static inline uint64_t mul_u64_u64_shr_add_u64(uint64_t *res_hi, uint64_t 
> delta,
> +uint64_t period, uint8_t shift,
> +uint64_t frac_sec)
> +{
> + unsigned __int128 res = (unsigned __int128)delta * period;
> +
> + res >>= shift;
> + res += frac_sec;
> + *res_hi = res >> 64;
> + return (uint64_t)res;
> +}
> +
> +static int vmclock_get_crosststamp(struct vmclock_state *st,
> +struct ptp_system_timestamp *sts,
> +struct system_counterval_t *system_counter,
> +struct timespec64 *tspec)
> +{
> + ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT);
> + struct system_time_snapshot systime_snapshot;
> + uint64_t cycle, delta, seq, frac_sec;
> +
> +#ifdef CONFIG_X86
> + /*
> +  * We'd expect the hypervisor to know this and to report the clock
> +  * status as VMCLOCK_STATUS_UNRELIABLE. But be paranoid.
> +  */
> + if (check_tsc_unstable())
> + return -EINVAL;
> +#endif
> +
> + while (1) {
> + seq = st->clk->seq_count & ~1ULL;
> + virt_rmb();
> +
> + if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE)
> + return -EINVAL;
> +
> + /*
> +  * When invoked for gettimex64(), fill in the pre/post system
> +  * times. The simple case is when system time is based on the
> +  * same counter as st->cs_id, in which case all three times
> +  * will be derived from the *same* counter value.
> +  *
> +  * If the system isn't using the same counter, then the value
> +  * from ktime_get_snapshot() will still be used as pre_ts, and
> +  * ptp_read_system_postts() is called to populate postts after
> +  * calling get_cycles().
> +  *
> +  * The conversion to timespec64 happens further down, outside
> +  * the seq_count loop.
> +  */
> + if (sts) {
> + ktime_get_snapshot(_snapshot);
> + if (systime_snapshot.cs_id == st->cs_id) {
> + cycle = systime_snapshot.cycles;
> + } else {
> + cycle = get_cycles();
> + ptp_read_system_postts(sts);
> + }
> + } else
> + cycle = get_cycles();
> +
> + delta = cycle - st->clk->counter_value;

AFAIU in the general case this needs to be masked for non 64-bit counters.

> +
> + frac_sec = mul_u64_u64_shr_add_u64(>tv_sec, delta,
> +
> st->clk->counter_period_frac_sec,
> +
> st->clk->counter_period_shift,
> +st->clk->utc_time_frac_sec);

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-26 Thread Richard Cochran
On Tue, Jun 25, 2024 at 11:34:24PM +0200, Thomas Gleixner wrote:

> There is effort underway to expose PTP clocks to user space via
> VDSO.

That sounds interesting.  Has anything been posted?

Thanks,
Richard




Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-26 Thread David Woodhouse
On Tue, 2024-06-25 at 15:22 -0700, John Stultz wrote:
> On Tue, Jun 25, 2024 at 2:48 PM David Woodhouse  wrote:
> > On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote:
> > > On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote:
> > > > From: David Woodhouse 
> > > > 
> > > > The vmclock "device" provides a shared memory region with precision 
> > > > clock
> > > > information. By using shared memory, it is safe across Live Migration.
> > > > 
> > > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> > > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> > > > actually helpful.
> > > > 
> > > > The memory region of the device is also exposed to userspace so it can 
> > > > be
> > > > read or memory mapped by application which need reliable notification of
> > > > clock disruptions.
> > > 
> > > There is effort underway to expose PTP clocks to user space via VDSO.
> > 
> > Ooh, interesting. Got a reference to that please?
> > 
> > >  Can we please not expose an ad hoc interface for that?
> > 
> > Absolutely. I'm explicitly trying to intercept the virtio-rtc
> > specification here, to *avoid* having to do anything ad hoc.
> > 
> > Note that this is a "vDSO-style" interface from hypervisor to guest via
> > a shared memory region, not necessarily an actual vDSO.
> > 
> > But yes, it *is* intended to be exposed to userspace, so that userspace
> > can know the *accurate* time without a system call, and know that it
> > hasn't been perturbed by live migration.
> 
> Yea, I was going to raise a concern that just defining an mmaped
> structure means it has to trust the guest logic is as expected. It's
> good that it's versioned! :)

Right. Although it's basically a pvclock, and we've had those for ages.

The main difference here is that we add an indicator that tells the
guest that it's been live migrated, so any additional NTP/PTP
refinement that the *guest* has done of its oscillator, should now be
discarded.

It's also designed to be useful in "disruption-only" mode, where the
pvclock information isn't actually populated, so *all* it does is tell
guests that their clock is now hosed due to live migration.

That part is why it needs to be mappable directly to userspace, so that
userspace can not only get a timestamp but *also* know that it's
actually valid. All without a system call.

The critical use cases are financial systems where they incur massive
fines if they submit mis-timestamped transactions, and distributed
databases which rely on accurate timestamps (and error bounds) for
eventual coherence. Live migration can screw those completely.

I'm open to changing fairly much anything about the proposal as long as
we can address those use cases (which the existing virtio-rtc and other
KVM enlightenments do not).

> I'd fret a bit about exposing this to userland. It feels very similar
> to the old powerpc systemcfg implementation that similarly mapped just
> kernel data out to userland and was difficult to maintain as changes
> were made. Would including a code page like a proper vdso make sense
> to make this more flexible of an UABI to maintain?

I think the structure itself should be stable once we've bikeshedded it
a bit. But there is certainly some potential for vDSO functions which
help us expose it to the user...

This structure exposes a 'disruption count' which is updated every time
the TSC/counter is messed with by live migration. But what is userspace
actually going to *compare* it with?

It basically needs to compare it with the disruption count when the
clock was last synchronized, so maybe the kernel could export *that* to
vDSO too, then expose a simple vDSO function which reports whether the
clock is valid?

The 'invalid' code path could turn into an actual system call which
makes the kernel (check for itself and) call ntp_clear() when the
disruption occurs. Or maybe not just ntp_clear() but actually consume
the pvclock rate information directly and apply the *new* calibration?

That kind of thing would be great, and I've definitely tried to design
the structure so that it *can* be made a first-class citizen within the
kernel's timekeeping code and used like that.

But I was going to start with a more modest proposal that it's "just a
device", and applications which care about reliable time after LM would
have to /dev/vmclock0 and mmap it and check for themselves. (Which
would be assisted by things like the ClockBound library).




smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-25 Thread John Stultz
On Tue, Jun 25, 2024 at 2:48 PM David Woodhouse  wrote:
> On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote:
> > On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote:
> > > From: David Woodhouse 
> > >
> > > The vmclock "device" provides a shared memory region with precision clock
> > > information. By using shared memory, it is safe across Live Migration.
> > >
> > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> > > KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> > > actually helpful.
> > >
> > > The memory region of the device is also exposed to userspace so it can be
> > > read or memory mapped by application which need reliable notification of
> > > clock disruptions.
> >
> > There is effort underway to expose PTP clocks to user space via VDSO.
>
> Ooh, interesting. Got a reference to that please?
>
> >  Can we please not expose an ad hoc interface for that?
>
> Absolutely. I'm explicitly trying to intercept the virtio-rtc
> specification here, to *avoid* having to do anything ad hoc.
>
> Note that this is a "vDSO-style" interface from hypervisor to guest via
> a shared memory region, not necessarily an actual vDSO.
>
> But yes, it *is* intended to be exposed to userspace, so that userspace
> can know the *accurate* time without a system call, and know that it
> hasn't been perturbed by live migration.

Yea, I was going to raise a concern that just defining an mmaped
structure means it has to trust the guest logic is as expected. It's
good that it's versioned! :)

I'd fret a bit about exposing this to userland. It feels very similar
to the old powerpc systemcfg implementation that similarly mapped just
kernel data out to userland and was difficult to maintain as changes
were made. Would including a code page like a proper vdso make sense
to make this more flexible of an UABI to maintain?

thanks
-john



Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-25 Thread David Woodhouse
On Tue, 2024-06-25 at 23:34 +0200, Thomas Gleixner wrote:
> On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > The vmclock "device" provides a shared memory region with precision clock
> > information. By using shared memory, it is safe across Live Migration.
> > 
> > Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> > KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> > actually helpful.
> > 
> > The memory region of the device is also exposed to userspace so it can be
> > read or memory mapped by application which need reliable notification of
> > clock disruptions.
> 
> There is effort underway to expose PTP clocks to user space via VDSO.

Ooh, interesting. Got a reference to that please?

>  Can we please not expose an ad hoc interface for that?

Absolutely. I'm explicitly trying to intercept the virtio-rtc
specification here, to *avoid* having to do anything ad hoc.

Note that this is a "vDSO-style" interface from hypervisor to guest via
a shared memory region, not necessarily an actual vDSO.

But yes, it *is* intended to be exposed to userspace, so that userspace
can know the *accurate* time without a system call, and know that it
hasn't been perturbed by live migration.

> As you might have heard the sad news, I'm not feeling up to the task to
> dig deeper into this right now. Give me a couple of days to look at this
> with working brain.

I have not heard any news, although now I'm making inferences.

Wishing you the best!


smime.p7s
Description: S/MIME cryptographic signature


Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-25 Thread Thomas Gleixner
On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote:
> From: David Woodhouse 
>
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
>
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
>
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by application which need reliable notification of
> clock disruptions.

There is effort underway to expose PTP clocks to user space via
VDSO. Can we please not expose an ad hoc interface for that?

As you might have heard the sad news, I'm not feeling up to the task to
dig deeper into this right now. Give me a couple of days to look at this
with working brain.

Thanks,

tglx