Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On 16.07.24 14:32, David Woodhouse wrote: > On 16 July 2024 12:54:52 BST, Peter Hilber > wrote: >> On 11.07.24 09:50, David Woodhouse wrote: >>> On Thu, 2024-07-11 at 09:25 +0200, Peter Hilber wrote: >>>> >>>> IMHO this phrasing is better, since it directly refers to the state of the >>>> structure. >>> >>> Thanks. I'll update it. >>> >>>> AFAIU if there would be abnormal delays in store buffers, causing some >>>> driver to still see the old clock for some time, the monotonicity could be >>>> violated: >>>> >>>> 1. device writes new, much slower clock to store buffer >>>> 2. some time passes >>>> 3. driver reads old, much faster clock >>>> 4. device writes store buffer to cache >>>> 5. driver reads new, much slower clock >>>> >>>> But I hope such delays do not occur. >>> >>> For the case of the hypervisor←→guest interface this should be handled >>> by the use of memory barriers and the seqcount lock. >>> >>> The guest driver reads the seqcount, performs a read memory barrier, >>> then reads the contents of the structure. Then performs *another* read >>> memory barrier, and checks the seqcount hasn't changed: >>> https://git.infradead.org/?p=users/dwmw2/linux.git;a=blob;f=drivers/ptp/ptp_vmclock.c;hb=vmclock#l351 >>> >>> The converse happens with write barriers on the hypervisor side: >>> https://git.infradead.org/?p=users/dwmw2/qemu.git;a=blob;f=hw/acpi/vmclock.c;hb=vmclock#l68 >> >> My point is that, looking at the above steps 1. - 5.: >> >> 3. read HW counter, smp_rmb, read seqcount >> 4. store seqcount, smp_wmb, stores, smp_wmb, store seqcount become effective >> 5. read seqcount, smp_rmb, read HW counter >> >> AFAIU this would still be a theoretical problem suggesting the use of >> stronger barriers. > > This seems like a bug on the guest side. The HW counter needs to be read > *within* the (paired, matching) seqcount reads, not before or after. > > There would be paired reads: 1. device writes new, much slower clock to store buffer 2. some time passes 3. read seqcount, smp_rmb, ..., read HW counter, smp_rmb, read seqcount 4. store seqcount, smp_wmb, stores, smp_wmb, store seqcount all become effective only now 5. read seqcount, smp_rmb, read HW counter, ..., smp_rmb, read seqcount I just omitted the parts which do not necessarily need to happen close to 4. for the monotonicity to be violated. My point is that 1. could become visible to other cores long after it happened on the local core (during 4.).
Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On 08.07.24 11:27, David Woodhouse wrote: > + > + /* > + * Time according to time_type field above. > + */ > + uint64_t time_sec; /* Seconds since time_type epoch */ > + uint64_t time_frac_sec; /* (seconds >> 64) */ > + uint64_t time_esterror_picosec; /* (± picoseconds) */ > + uint64_t time_maxerror_picosec; /* (± picoseconds) */ Is this unsigned or signed?
Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On 11.07.24 09:50, David Woodhouse wrote: > On Thu, 2024-07-11 at 09:25 +0200, Peter Hilber wrote: >> >> IMHO this phrasing is better, since it directly refers to the state of the >> structure. > > Thanks. I'll update it. > >> AFAIU if there would be abnormal delays in store buffers, causing some >> driver to still see the old clock for some time, the monotonicity could be >> violated: >> >> 1. device writes new, much slower clock to store buffer >> 2. some time passes >> 3. driver reads old, much faster clock >> 4. device writes store buffer to cache >> 5. driver reads new, much slower clock >> >> But I hope such delays do not occur. > > For the case of the hypervisor←→guest interface this should be handled > by the use of memory barriers and the seqcount lock. > > The guest driver reads the seqcount, performs a read memory barrier, > then reads the contents of the structure. Then performs *another* read > memory barrier, and checks the seqcount hasn't changed: > https://git.infradead.org/?p=users/dwmw2/linux.git;a=blob;f=drivers/ptp/ptp_vmclock.c;hb=vmclock#l351 > > The converse happens with write barriers on the hypervisor side: > https://git.infradead.org/?p=users/dwmw2/qemu.git;a=blob;f=hw/acpi/vmclock.c;hb=vmclock#l68 My point is that, looking at the above steps 1. - 5.: 3. read HW counter, smp_rmb, read seqcount 4. store seqcount, smp_wmb, stores, smp_wmb, store seqcount become effective 5. read seqcount, smp_rmb, read HW counter AFAIU this would still be a theoretical problem suggesting the use of stronger barriers. > > Do we need to think harder about the ordering across a real PCI bus? It > isn't entirely unreasonable for this to be implemented in hardware if > we eventually add a counter_id value for a bus-visible counter like the > Intel Always Running Timer (ART). I'm also OK with saying that device > implementations may only provide the shared memory structure if they > can ensure memory ordering. Sounds good to me. This statement would then also address the above.
Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On 10.07.24 18:01, David Woodhouse wrote: > On Wed, 2024-07-10 at 15:07 +0200, Peter Hilber wrote: >> On 08.07.24 11:27, David Woodhouse wrote: >>> From: David Woodhouse >>> >>> The vmclock "device" provides a shared memory region with precision clock >>> information. By using shared memory, it is safe across Live Migration. >>> >>> Like the KVM PTP clock, this can convert TSC-based cross timestamps into >>> KVM clock values. Unlike the KVM PTP clock, it does so only when such is >>> actually helpful. >>> >>> The memory region of the device is also exposed to userspace so it can be >>> read or memory mapped by application which need reliable notification of >>> clock disruptions. >>> >>> Signed-off-by: David Woodhouse >> >> [...] >> >>> + >>> +struct vmclock_abi { >>> + /* CONSTANT FIELDS */ >>> + uint32_t magic; >>> +#define VMCLOCK_MAGIC 0x4b4c4356 /* "VCLK" */ >>> + uint32_t size; /* Size of region containing this structure >>> */ >>> + uint16_t version; /* 1 */ >>> + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except >>> INVALID */ >>> +#define VMCLOCK_COUNTER_ARM_VCNT 0 >>> +#define VMCLOCK_COUNTER_X86_TSC1 >>> +#define VMCLOCK_COUNTER_INVALID0xff >>> + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */ >>> +#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 >>> 00:00:00z */ >>> +#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 >>> 00:00:00z */ >>> +#define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined >>> epoch */ >>> +#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */ >>> +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */ >>> + >>> + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */ >>> + uint32_t seq_count; /* Low bit means an update is in progress */ >>> + /* >>> + * This field changes to another non-repeating value when the CPU >>> + * counter is disrupted, for example on live migration. This lets >>> + * the guest know that it should discard any calibration it has >>> + * performed of the counter against external sources (NTP/PTP/etc.). >>> + */ >>> + uint64_t disruption_marker; >>> + uint64_t flags; >>> + /* Indicates that the tai_offset_sec field is valid */ >>> +#define VMCLOCK_FLAG_TAI_OFFSET_VALID (1 << 0) >>> + /* >>> + * Optionally used to notify guests of pending maintenance events. >>> + * A guest which provides latency-sensitive services may wish to >>> + * remove itself from service if an event is coming up. Two flags >>> + * indicate the approximate imminence of the event. >>> + */ >>> +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */ >>> +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */ >>> +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3) >>> +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4) >>> +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5) >>> +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6) >>> + /* >>> + * Even regardless of leap seconds, the time presented through this >>> + * mechanism may not be strictly monotonic. If the counter slows >>> down >>> + * and the host adapts to this discovery, the time calculated from >>> + * the value of the counter immediately after an update to this >>> + * structure, may appear to be *earlier* than a calculation just >>> + * before the update (while the counter was believed to be running >>> + * faster than it now is). A guest operating system will typically >>> + * *skew* its own system clock back towards the reference clock >>> + * exposed here, rather than following this clock directly. If, >>> + * however, this structure is being populated from such a system >>> + * clock which is already handled in such a fashion and the results >>> + * *are* guaranteed to be monotonic, such monotonicity can be >>> + * advertised by setting this bit. >>> +
Re: [RFC PATCH v4] ptp: Add vDSO-style vmclock support
On 08.07.24 11:27, David Woodhouse wrote: > From: David Woodhouse > > The vmclock "device" provides a shared memory region with precision clock > information. By using shared memory, it is safe across Live Migration. > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > actually helpful. > > The memory region of the device is also exposed to userspace so it can be > read or memory mapped by application which need reliable notification of > clock disruptions. > > Signed-off-by: David Woodhouse [...] > + > +struct vmclock_abi { > + /* CONSTANT FIELDS */ > + uint32_t magic; > +#define VMCLOCK_MAGIC0x4b4c4356 /* "VCLK" */ > + uint32_t size; /* Size of region containing this structure */ > + uint16_t version; /* 1 */ > + uint8_t counter_id; /* Matches VIRTIO_RTC_COUNTER_xxx except INVALID */ > +#define VMCLOCK_COUNTER_ARM_VCNT 0 > +#define VMCLOCK_COUNTER_X86_TSC 1 > +#define VMCLOCK_COUNTER_INVALID 0xff > + uint8_t time_type; /* Matches VIRTIO_RTC_TYPE_xxx */ > +#define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 > 00:00:00z */ > +#define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 > 00:00:00z */ > +#define VMCLOCK_TIME_MONOTONIC 2 /* Since > undefined epoch */ > +#define VMCLOCK_TIME_INVALID_SMEARED 3 /* Not supported */ > +#define VMCLOCK_TIME_INVALID_MAYBE_SMEARED 4 /* Not supported */ > + > + /* NON-CONSTANT FIELDS PROTECTED BY SEQCOUNT LOCK */ > + uint32_t seq_count; /* Low bit means an update is in progress */ > + /* > + * This field changes to another non-repeating value when the CPU > + * counter is disrupted, for example on live migration. This lets > + * the guest know that it should discard any calibration it has > + * performed of the counter against external sources (NTP/PTP/etc.). > + */ > + uint64_t disruption_marker; > + uint64_t flags; > + /* Indicates that the tai_offset_sec field is valid */ > +#define VMCLOCK_FLAG_TAI_OFFSET_VALID(1 << 0) > + /* > + * Optionally used to notify guests of pending maintenance events. > + * A guest which provides latency-sensitive services may wish to > + * remove itself from service if an event is coming up. Two flags > + * indicate the approximate imminence of the event. > + */ > +#define VMCLOCK_FLAG_DISRUPTION_SOON (1 << 1) /* About a day */ > +#define VMCLOCK_FLAG_DISRUPTION_IMMINENT (1 << 2) /* About an hour */ > +#define VMCLOCK_FLAG_PERIOD_ESTERROR_VALID (1 << 3) > +#define VMCLOCK_FLAG_PERIOD_MAXERROR_VALID (1 << 4) > +#define VMCLOCK_FLAG_TIME_ESTERROR_VALID (1 << 5) > +#define VMCLOCK_FLAG_TIME_MAXERROR_VALID (1 << 6) > + /* > + * Even regardless of leap seconds, the time presented through this > + * mechanism may not be strictly monotonic. If the counter slows down > + * and the host adapts to this discovery, the time calculated from > + * the value of the counter immediately after an update to this > + * structure, may appear to be *earlier* than a calculation just > + * before the update (while the counter was believed to be running > + * faster than it now is). A guest operating system will typically > + * *skew* its own system clock back towards the reference clock > + * exposed here, rather than following this clock directly. If, > + * however, this structure is being populated from such a system > + * clock which is already handled in such a fashion and the results > + * *are* guaranteed to be monotonic, such monotonicity can be > + * advertised by setting this bit. > + */ I wonder if this might be difficult to define in a standard. Is there a need to define device and driver behavior in more detail? What would happen if e.g. the device first decides how to update the clock, but is then slow to update the SHM? > +#define VMCLOCK_FLAG_TIME_MONOTONIC (1 << 7) > + > + uint8_t pad[2]; > + uint8_t clock_status; > +#define VMCLOCK_STATUS_UNKNOWN 0 > +#define VMCLOCK_STATUS_INITIALIZING 1 > +#define VMCLOCK_STATUS_SYNCHRONIZED 2 > +#define VMCLOCK_STATUS_FREERUNNING 3 > +#define VMCLOCK_STATUS_UNRELIABLE4 > + > + /* > + * The time exposed through this device is never smeared. This field > + * corresponds to the 'subtype' field in virtio-rtc, which indicates > + * the smearing method. However in this case it provides a *hint* to > + * the guest operating system, such that *if* the guest OS wants to > + * provide its users with an alternative clock which does not follow > + * the POSIX CLOCK_REALTIME standard, it may do so in a fashion > + * consistent with the other systems in the nearby environment. AFA
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 05.07.24 17:02, David Woodhouse wrote: > On Fri, 2024-07-05 at 10:12 +0200, Peter Hilber wrote: >> On 03.07.24 12:40, David Woodhouse wrote: [...] >>> • Why is maxerror in picoseconds? It's the only use of that unit > > Between us we now have picoseconds, nanoseconds, (seconds >> 64) and > (seconds >> 64+n). > > The power-of-two fractions seem to make a lot of sense for the counter > period, because they mean we don't have to perform divisions. > > Does it makes sense to harmonise on (seconds >> 64) for all of the > fractional seconds? Again I don't have a strong opinion; I only want us > to have a *reason* for any differences that exist. > I don't have the expertise with fixed-point arithmetic to judge if this would become unwieldy. I selected ns for the virtio-rtc drafts so far because that didn't have any impact on the precision with the Linux kernel driver message-based use cases, but that would be different for SHM in my understanding. So I would tend to retain ns for convenience for messages (where it doesn't impact precision) but do not have any preference for SHM. >>> • Where do the clock_status values come from? Do they make sense? >>> • Are signed integers OK? (I think so!). >> >> Signed integers would need to be introduced to Virtio, which so far only >> uses explicitly unsigned types: u8, le16 etc. > > Perhaps. Although it would also be possible (if not ideal) to define > that e.g. the tai_offset field is a 16-bit "unsigned" integer according > to virtio, but to be interpreted as follows: > > If the number is <= 32767 then the TAI offset is that value, but if the > number is >= 32768 then the TAI offset is that value minus 65536. > > Perhaps not pretty, but there isn't a *fundamental* dependency on > virtio supporting signed integers as a primary type. > Agreed.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 03.07.24 12:40, David Woodhouse wrote: [...] > > > This is what I currently have for 'struct vmclock_abi' that I'd like to > persuade you to adopt. I need to tweak it some more, for at least the > following reasons, as well as any more you can see: > > • size isn't big enough for 64KiB pages > • Should be explicitly little-endian > • Does it need esterror as well as maxerror? I have no opinion about this. I can drop esterror if unwanted. > • Why is maxerror in picoseconds? It's the only use of that unit > • Where do the clock_status values come from? Do they make sense? > • Are signed integers OK? (I think so!). Signed integers would need to be introduced to Virtio, which so far only uses explicitly unsigned types: u8, le16 etc.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 02.07.24 20:40, David Woodhouse wrote: > On 2 July 2024 19:12:00 BST, Peter Hilber > wrote: >> On 02.07.24 18:39, David Woodhouse wrote: >>> To clarify then, the main types are >>> >>> VIRTIO_RTC_CLOCK_UTC == 0 >>> VIRTIO_RTC_CLOCK_TAI == 1 >>> VIRTIO_RTC_CLOCK_MONOTONIC == 2 >>> VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 >>> >>> And the subtypes are *only* for the case of >>> VIRTIO_RTC_CLOCK_SMEARED_UTC. They include >>> >>> VIRTIO_RTC_SUBTYPE_STRICT >>> VIRTIO_RTC_SUBTYPE_UNDEFINED /* or whatever you want to call it */ >>> VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR >>> VIRTIO_RTC_SUBTYPE_UTC_SLS /* if it's worth doing this one */ >>> >>> Is that what we just agreed on? >>> >>> >> >> This is a misunderstanding. My idea was that the main types are >> >>> VIRTIO_RTC_CLOCK_UTC == 0 >>> VIRTIO_RTC_CLOCK_TAI == 1 >>> VIRTIO_RTC_CLOCK_MONOTONIC == 2 >>> VIRTIO_RTC_CLOCK_SMEARED_UTC == 3 >> >> VIRTIO_RTC_CLOCK_MAYBE_SMEARED_UTC == 4 >> >> The subtypes would be (1st for clocks other than >> VIRTIO_RTC_CLOCK_SMEARED_UTC, 2nd to last for >> VIRTIO_RTC_CLOCK_SMEARED_UTC): >> >> #define VIRTIO_RTC_SUBTYPE_STRICT 0 >> #define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 1 >> #define VIRTIO_RTC_SUBTYPE_SMEAR_UTC_SLS 2 >> > > Thanks. I really do think that from the guest point of view there's really no > distinction between "maybe smeared" and "undefined smearing", and have a > preference for using the latter form, which is the key difference there? > > Again though, not a hill for me to die on. I have no issue with staying with "undefined smearing", so would you agree to something like VIRTIO_RTC_CLOCK_SMEAR_UNDEFINED_UTC == 4 (or another name if you prefer)?
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 02.07.24 18:39, David Woodhouse wrote: > On Tue, 2024-07-02 at 17:03 +0200, Peter Hilber wrote: >>> On 01.07.24 10:57, David Woodhouse wrote: >>>>> If my proposed memory structure is subsumed into the virtio-rtc >>>>> proposal we'd literally use the same names, but for the time being I'll >>>>> update mine to: >>> >>> Do you intend vmclock and virtio-rtc to be ABI compatible? > > I intend you to incorporate a shared memory structure like the vmclock > one into the virtio-rtc standard :) > > Because precision clocks in a VM are fundamentally nonsensical without > a way for the guest to know when live migration has screwed them up. > > So yes, so facilitate that, I'm trying to align things so that the > numeric values of fields like the time_type and smearing hint are > *precisely* the same as the VIRTIO_RTC_xxx values. > > Time pressure *may* mean I have to ship an ACPI-based device with a > preliminary version of the structure before I've finished persuading > you, and before we've completely finalized the structure. I am hoping > to avoid that. > > (In fact, my time pressure only applies to a version of the structure > which carries the disruption_marker field; the actual clock calibration > information doesn't have to be present in the interim implementation.) > > >>> FYI, I see a >>> potential problem in that Virtio does avoid the use of signed integers so >>> far. I did not check carefully if there might be other problems, yet. > > Hm, you use an unsigned integer to convey the tai_offset. I suppose at > +37 and with a plan to stop doing leap seconds in the next decade, > we're unlikely to get back below zero? > I think so. > The other signed integer I had was for the leap second direction, but I > think I'm happy to drop that and just adopt your uint8_t leap field > with VIRTIO_RTC_LEAP_{PRE_POS,PRE_NEG,etc.}. > > > > > >>>>> >>>>> /* >>>>> * What time is exposed in the time_sec/time_frac_sec fields? >>>>> */ >>>>> uint8_t time_type; >>>>> #define VMCLOCK_TIME_UTC0 /* Since 1970-01-01 >>>>> 00:00:00z */ >>>>> #define VMCLOCK_TIME_TAI1 /* Since 1970-01-01 >>>>> 00:00:00z */ >>>>> #define VMCLOCK_TIME_MONOTONIC 2 /* Since undefined epoch >>>>> */ >>>>> #define VMCLOCK_TIME_INVALID3 /* virtio-rtc uses this >>>>> for smeared UTC */ >>>>> >>>>> >>>>> I can then use your smearing subtype values as the 'hint' field in the >>>>> shared memory structure. You currently have: >>>>> >>>>> +\begin{lstlisting} >>>>> +#define VIRTIO_RTC_SUBTYPE_STRICT 0 >>>>> +#define VIRTIO_RTC_SUBTYPE_SMEAR 1 >>>>> +#define VIRTIO_RTC_SUBTYPE_SMEAR_NOON_LINEAR 2 >>>>> +#define VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED 3 >>>>> +\end{lstlisting} >>>>> >>> >>> I agree with the above part of your proposal. >>> >>>>> I can certainly ensure that 'noon linear' has the same value. I don't >>>>> think you need both 'SMEAR' and 'LEAP_UNSPECIFIED' though: >>>>> >>>>> >>>>> +\item VIRTIO_RTC_SUBTYPE_SMEAR deviates from the UTC standard by >>>>> + smearing time in the vicinity of the leap second, in a not >>>>> + precisely defined manner. This avoids clock steps due to UTC >>>>> + leap seconds. >>>>> >>>>> ... >>>>> >>>>> +\item VIRTIO_RTC_SUBTYPE_LEAP_UNSPECIFIED may deviate from the UTC >>>>> + standard w.r.t.\ leap second introduction in an unspecified >>>>> way >>>>> + (leap seconds may, or may not, be smeared). >>>>> >>>>> To the client, both of those just mean "for a day or so around a leap >>>>> second event, you can't trust this device to know what the time is". >>>>> There isn't any point in separating "does lie to you" from "might lie >>>>> to you", surely? The guest can't do anything useful with that >>>>> distinction. Let's drop SMEAR and keep only LEAP_UNSPECIFIED? >>> >>> As for VIRTIO_RTC_SUBTYPE_SMEAR, I think this c
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 01.07.24 10:57, David Woodhouse wrote: > On Fri, 2024-06-28 at 22:27 +0100, David Woodhouse wrote: >> On 28 June 2024 17:38:15 BST, Peter Hilber >> wrote: >>> On 28.06.24 14:15, David Woodhouse wrote: >>>> On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: >>>>> On 27.06.24 16:52, David Woodhouse wrote: >>>>>> I already added a flags field, so this might look something like: >>>>>> >>>>>> /* >>>>>> * Smearing flags. The UTC clock exposed through this structure >>>>>> * is only ever true UTC, but a guest operating system may >>>>>> * choose to offer a monotonic smeared clock to its users. This >>>>>> * merely offers a hint about what kind of smearing to perform, >>>>>> * for consistency with systems in the nearby environment. >>>>>> */ >>>>>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* >>>>>> draft-kuhn-leapsecond-00.txt */ >>>>>> >>>>>> (UTC-SLS is probably a bad example but are there formal definitions for >>>>>> anything else?) >>>>> >>>>> I think it could also be more generic, like flags for linear smearing, >>>>> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative >>>>> to the leap second start). That could also represent UTC-SLS, and >>>>> noon-to-noon, and it would be well-defined. >>>>> >>>>> This should reduce the likelihood that the guest doesn't know the smearing >>>>> variant. >>>> >>>> I'm wary of making it too generic. That would seem to encourage a >>>> *proliferation* of false "UTC-like" clocks. >>>> >>>> It's bad enough that we do smearing at all, let alone that we don't >>>> have a single definition of how to do it. >>>> >>>> I made the smearing hint a full uint8_t instead of using bits in flags, >>>> in the end. That gives us a full 255 ways of lying to users about what >>>> the time is, so we're unlikely to run out. And it's easy enough to add >>>> a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods >>>> that get invented. >>>> >>>> >>> >>> My concern is that the registry update may come after a driver has already >>> been implemented, so that it may be hard to ensure that the smearing which >>> has been chosen is actually implemented. >> >> Well yes, but why in the name of all that is holy would anyone want >> to invent *new* ways to lie to users about the time? If we capture >> the existing ones as we write this, surely it's a good thing that >> there's a barrier to entry for adding more? > > Ultimately though, this isn't the hill for me to die on. I'm pushing on > that topic because I want to avoid the proliferation of *ambiguity*. If > we have a precision clock, we should *know* what the time is. > > So how about this proposal. I line up the fields in the proposed shared > memory structure to match your virtio-rtc proposal, using 'subtype' as > you proposed. But, instead of the 'subtype' being valid only for > VIRTIO_RTC_CLOCK_UTC, we define a new top-level type for *smeared* UTC. > > So, you have: > > +\begin{lstlisting} > +#define VIRTIO_RTC_CLOCK_UTC 0 > +#define VIRTIO_RTC_CLOCK_TAI 1 > +#define VIRTIO_RTC_CLOCK_MONO 2 > +\end{lstlisting} > > I propose that you add > > #define VIRTIO_RTC_CLOCK_SMEARED_UTC 3 > > If my proposed memory structure is subsumed into the virtio-rtc > proposal we'd literally use the same names, but for the time being I'll > update mine to: Do you intend vmclock and virtio-rtc to be ABI compatible? FYI, I see a potential problem in that Virtio does avoid the use of signed integers so far. I did not check carefully if there might be other problems, yet. > > /* >* What time is exposed in the time_sec/time_frac_sec fields? >*/ > uint8_t time_type; > #define VMCLOCK_TIME_UTC 0 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_TAI 1 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_MONOTONIC2 /* Since undefined > epoch */ > #define VMCLOCK_TIME_INVALID 3 /* virtio-rtc uses this for > smeared UTC */ > > > I can then use your smearing subtype values as the &
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 28.06.24 14:15, David Woodhouse wrote: > On Fri, 2024-06-28 at 13:33 +0200, Peter Hilber wrote: >> On 27.06.24 16:52, David Woodhouse wrote: >>> I already added a flags field, so this might look something like: >>> >>> /* >>> * Smearing flags. The UTC clock exposed through this structure >>> * is only ever true UTC, but a guest operating system may >>> * choose to offer a monotonic smeared clock to its users. This >>> * merely offers a hint about what kind of smearing to perform, >>> * for consistency with systems in the nearby environment. >>> */ >>> #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt >>> */ >>> >>> (UTC-SLS is probably a bad example but are there formal definitions for >>> anything else?) >> >> I think it could also be more generic, like flags for linear smearing, >> cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative >> to the leap second start). That could also represent UTC-SLS, and >> noon-to-noon, and it would be well-defined. >> >> This should reduce the likelihood that the guest doesn't know the smearing >> variant. > > I'm wary of making it too generic. That would seem to encourage a > *proliferation* of false "UTC-like" clocks. > > It's bad enough that we do smearing at all, let alone that we don't > have a single definition of how to do it. > > I made the smearing hint a full uint8_t instead of using bits in flags, > in the end. That gives us a full 255 ways of lying to users about what > the time is, so we're unlikely to run out. And it's easy enough to add > a new VMCLOCK_SMEARING_XXX type to the 'registry' for any new methods > that get invented. > > My concern is that the registry update may come after a driver has already been implemented, so that it may be hard to ensure that the smearing which has been chosen is actually implemented. >>>>> + /* >>>>> + * This field changes to another non-repeating value when the CPU >>>>> + * counter is disrupted, for example on live migration. >>>>> + */ >>>>> + uint64_t disruption_marker; >>>> >>>> The field could also change when the clock is stepped (leap seconds >>>> excepted), or when the clock frequency is slewed. >>> >>> I'm not sure. The concept of the disruption marker is that it tells the >>> guest to throw away any calibration of the counter that the guest has >>> done for *itself* (with NTP, other PTP devices, etc.). >>> >>> One mode for this device would be not to populate the clock fields at >>> all, but *only* to signal disruption when it occurs. So the guest can >>> abort transactions until it's resynced its clocks (to avoid incurring >>> fines if breaking databases, etc.). >>> >>> Exposing the host timekeeping through the structure means that the >>> migrated guest can keep working because it can trust the timekeeping >>> performed by the (new) host and exposed to it. >>> >>> If the counter is actually varying in frequency over time, and the host >>> is slewing the clock frequency that it reports, that *isn't* a step >>> change and doesn't mean that the guest should throw away any >>> calibration that it's been doing for itself. One hopes that the guest >>> would have detected the *same* frequency change, and be adapting for >>> itself. So I don't think that should indicate a disruption. >>> >>> I think the same is even true if the clock is stepped by the host. The >>> actual *counter* hasn't changed, so the guest is better off ignoring >>> the vacillating host and continuing to derive its idea of time from the >>> hardware counter itself, as calibrated against some external NTP/PTP >>> sources. Surely we actively *don't* to tell the guest to throw its own >>> calibrations away, in this case? >> >> In case the guest is also considering other time sources, it might indeed >> not be a good idea to mix host clock changes into the hardware counter >> disruption marker. >> >> But if the vmclock is the authoritative source of time, it can still be >> helpful to know about such changes, maybe through another marker. > > Could that be the existing seq_count field? > > Skewing the counter_period_frac_sec as the underlying oscillator speeds > up and slows dow
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 27.06.24 18:03, David Woodhouse wrote: > > I've updated the tree at > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/vmclock > (but not yet the qemu one). > > I think I've taken into account all your comments apart from the one > about non-64-bit counters wrapping. I reduced the seq_count to 32 bit > to make room for a 32-bit flags field, added the time type > (UTC/TAI/MONOTONIC) and a smearing hint, with some straw man > definitions for smearing algorithms for which I could actually find > definitions. > > The structure now looks like this: > > > struct vmclock_abi { [...] > > /* >* What time is exposed in the time_sec/time_frac_sec fields? >*/ > uint8_t time_type; > #define VMCLOCK_TIME_UNKNOWN 0 /* Invalid / no time exposed */ > #define VMCLOCK_TIME_UTC 1 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_TAI 2 /* Since 1970-01-01 00:00:00z */ > #define VMCLOCK_TIME_MONOTONIC3 /* Since undefined > epoch */ > > /* Bit shift for counter_period_frac_sec and its error rate */ > uint8_t counter_period_shift; > > /* >* Unlike in NTP, this can indicate a leap second in the past. This >* is needed to allow guests to derive an imprecise clock with >* smeared leap seconds for themselves, as some modes of smearing >* need the adjustments to continue even after the moment at which >* the leap second should have occurred. >*/ > int8_t leapsecond_direction; > uint64_t leapsecond_tai_sec; /* Since 1970-01-01 00:00:00z */ > > /* >* Paired values of counter and UTC at a given point in time. >*/ > uint64_t counter_value; > uint64_t time_sec; /* Since 1970-01-01 00:00:00z */ Nitpick: The comment is not valid any more for TIME_MONOTONIC.
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 27.06.24 16:52, David Woodhouse wrote: > On Thu, 2024-06-27 at 15:50 +0200, Peter Hilber wrote: >> On 25.06.24 21:01, David Woodhouse wrote: >>> From: David Woodhouse >>> >>> The vmclock "device" provides a shared memory region with precision clock >>> information. By using shared memory, it is safe across Live Migration. >>> >>> Like the KVM PTP clock, this can convert TSC-based cross timestamps into >>> KVM clock values. Unlike the KVM PTP clock, it does so only when such is >>> actually helpful. >>> >>> The memory region of the device is also exposed to userspace so it can be >>> read or memory mapped by application which need reliable notification of >>> clock disruptions. >>> >>> Signed-off-by: David Woodhouse >>> --- >>> >>> v2: >>> • Add gettimex64() support >>> • Convert TSC values to KVM clock when appropriate >>> • Require int128 support >>> • Add counter_period_shift >>> • Add timeout when seq_count is invalid >>> • Add flags field >>> • Better comments in vmclock ABI structure >>> • Explicitly forbid smearing (as clock rates would need to change) >> >> Leap second smearing information could still be conveyed through the >> vmclock_abi. AFAIU, to cover the popular smearing variants, it should be >> enough to indicate whether the driver should apply linear or cosine >> smearing, and the start time and end time. > > Yes. The clock information actually conveyed through the {counter, > time, rate} tuple should never be smeared, and should only ever be UTC. > > But we could provide a hint to the guest operating system about what > type of smearing to perform, *if* it chooses to offer a clock other > than the standard CLOCK_REALTIME to its users. > > I already added a flags field, so this might look something like: > > /* > * Smearing flags. The UTC clock exposed through this structure > * is only ever true UTC, but a guest operating system may > * choose to offer a monotonic smeared clock to its users. This > * merely offers a hint about what kind of smearing to perform, > * for consistency with systems in the nearby environment. > */ > #define VMCLOCK_FLAGS_SMEAR_UTC_SLS (1<<5) /* draft-kuhn-leapsecond-00.txt */ > > > (UTC-SLS is probably a bad example but are there formal definitions for > anything else?) > > I think it could also be more generic, like flags for linear smearing, cosine smearing(?), and smear_start_sec and smear_end_sec fields (relative to the leap second start). That could also represent UTC-SLS, and noon-to-noon, and it would be well-defined. This should reduce the likelihood that the guest doesn't know the smearing variant. [...] >>> diff --git a/include/uapi/linux/vmclock.h b/include/uapi/linux/vmclock.h >>> new file mode 100644 >>> index ..cf0f22205e79 >>> --- /dev/null >>> +++ b/include/uapi/linux/vmclock.h >>> @@ -0,0 +1,138 @@ >>> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR >>> BSD-2-Clause) */ >>> + >>> +/* >>> + * This structure provides a vDSO-style clock to VM guests, exposing the >>> + * relationship (or lack thereof) between the CPU clock (TSC, timebase, >>> arch >>> + * counter, etc.) and real time. It is designed to address the problem of >>> + * live migration, which other clock enlightenments do not. >>> + * >>> + * When a guest is live migrated, this affects the clock in two ways. >>> + * >>> + * First, even between identical hosts the actual frequency of the >>> underlying >>> + * counter will change within the tolerances of its specification >>> (typically >>> + * ±50PPM, or 4 seconds a day). The frequency also varies over time on the >>> + * same host, but can be tracked by NTP as it generally varies slowly. With >>> + * live migration there is a step change in the frequency, with no warning. >>> + * >>> + * Second, there may be a step change in the value of the counter itself, >>> as >>> + * its accuracy is limited by the precision of the NTP synchronization on >>> the >>> + * source and destination hosts. >>> + * >>> + * So any calibration (NTP, PTP, etc.) which the guest has done on the >>> source >>> + * host before migration is invalid, and needs to be redone on the new >>> host. >>> + * >>> + * In its most basic mode, this structure pro
Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support
On 25.06.24 21:01, David Woodhouse wrote: > From: David Woodhouse > > The vmclock "device" provides a shared memory region with precision clock > information. By using shared memory, it is safe across Live Migration. > > Like the KVM PTP clock, this can convert TSC-based cross timestamps into > KVM clock values. Unlike the KVM PTP clock, it does so only when such is > actually helpful. > > The memory region of the device is also exposed to userspace so it can be > read or memory mapped by application which need reliable notification of > clock disruptions. > > Signed-off-by: David Woodhouse > --- > > v2: > • Add gettimex64() support > • Convert TSC values to KVM clock when appropriate > • Require int128 support > • Add counter_period_shift > • Add timeout when seq_count is invalid > • Add flags field > • Better comments in vmclock ABI structure > • Explicitly forbid smearing (as clock rates would need to change) Leap second smearing information could still be conveyed through the vmclock_abi. AFAIU, to cover the popular smearing variants, it should be enough to indicate whether the driver should apply linear or cosine smearing, and the start time and end time. > > drivers/ptp/Kconfig | 13 + > drivers/ptp/Makefile | 1 + > drivers/ptp/ptp_vmclock.c| 516 +++ > include/uapi/linux/vmclock.h | 138 ++ > 4 files changed, 668 insertions(+) > create mode 100644 drivers/ptp/ptp_vmclock.c > create mode 100644 include/uapi/linux/vmclock.h > [...] > + > +/* > + * Multiply a 64-bit count by a 64-bit tick 'period' in units of seconds >> > 64 > + * and add the fractional second part of the reference time. > + * > + * The result is a 128-bit value, the top 64 bits of which are seconds, and > + * the low 64 bits are (seconds >> 64). > + * > + * If __int128 isn't available, perform the calculation 32 bits at a time to > + * avoid overflow. > + */ > +static inline uint64_t mul_u64_u64_shr_add_u64(uint64_t *res_hi, uint64_t > delta, > +uint64_t period, uint8_t shift, > +uint64_t frac_sec) > +{ > + unsigned __int128 res = (unsigned __int128)delta * period; > + > + res >>= shift; > + res += frac_sec; > + *res_hi = res >> 64; > + return (uint64_t)res; > +} > + > +static int vmclock_get_crosststamp(struct vmclock_state *st, > +struct ptp_system_timestamp *sts, > +struct system_counterval_t *system_counter, > +struct timespec64 *tspec) > +{ > + ktime_t deadline = ktime_add(ktime_get(), VMCLOCK_MAX_WAIT); > + struct system_time_snapshot systime_snapshot; > + uint64_t cycle, delta, seq, frac_sec; > + > +#ifdef CONFIG_X86 > + /* > + * We'd expect the hypervisor to know this and to report the clock > + * status as VMCLOCK_STATUS_UNRELIABLE. But be paranoid. > + */ > + if (check_tsc_unstable()) > + return -EINVAL; > +#endif > + > + while (1) { > + seq = st->clk->seq_count & ~1ULL; > + virt_rmb(); > + > + if (st->clk->clock_status == VMCLOCK_STATUS_UNRELIABLE) > + return -EINVAL; > + > + /* > + * When invoked for gettimex64(), fill in the pre/post system > + * times. The simple case is when system time is based on the > + * same counter as st->cs_id, in which case all three times > + * will be derived from the *same* counter value. > + * > + * If the system isn't using the same counter, then the value > + * from ktime_get_snapshot() will still be used as pre_ts, and > + * ptp_read_system_postts() is called to populate postts after > + * calling get_cycles(). > + * > + * The conversion to timespec64 happens further down, outside > + * the seq_count loop. > + */ > + if (sts) { > + ktime_get_snapshot(&systime_snapshot); > + if (systime_snapshot.cs_id == st->cs_id) { > + cycle = systime_snapshot.cycles; > + } else { > + cycle = get_cycles(); > + ptp_read_system_postts(sts); > + } > + } else > + cycle = get_cycles(); > + > + delta = cycle - st->clk->counter_value; AFAIU in the general case this needs to be masked for non 64-bit counters. > + > + frac_sec = mul_u64_u64_shr_add_u64(&tspec->tv_sec, delta, > + > st->clk->counter_period_frac_sec, > + > st->clk->counter_period_shift, > +st->clk->utc_ti
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 21.06.24 10:45, David Woodhouse wrote: > On Thu, 2024-06-20 at 17:19 +0100, David Woodhouse wrote: >> >>> + + /* Counter frequency, and error margin. Units of (second >> 64) */ + uint64_t counter_period_frac_sec; >>> >>> AFAIU this might limit the precision in case of high counter frequencies. >>> Could the unit be aligned to the expected frequency band of counters? >> >> This field indicates the period of a single tick, in units of 1>>64 of >> a second. That's about 5.4e-20 seconds, or 54 zeptoseconds? >> >> Can you walk me through a calculation where you believe that level of >> precision is insufficient? >> >> I guess the precision matters if the structure isn't updated for a long >> period of time, and the delta between the current counter and the >> snapshot is high? That's a *lot* of 54 zeptosecondses? But you really >> would need a *lot* of them before you care? And if nobody's been >> calibrating your counter for that long, surely you have bigger worries? >> >> Am I missing something there? > > Hm, that was a bit rushed at the end of the day; let's take a better look... > > Let's take a hypothetical example of a 100GHz counter. That's two > orders of magnitude more than today's Arm arch counter. > > The period of such a counter would be 10 picoseconds. > > (Let's ignore the question of how far light actually travels in that > time and how *realistic* that example is, for the moment.) > > It turns out that at that rate, there *are* a lot of 54 zeptosecondses > of precision loss in the day. It could be half a millisecond a day, or > 20µs an hour. > > That particular example of 10 picoseconds is 184467440.7370955 > (seconds>>64) which could be truncated to 184467440 — losing about 4PPB > (a third of a millisecond a day; 14µs an hour). > > So yeah, I suppose a 'shift' field could make sense. It's easy enough > to consume on the guest side as it doesn't really perturb the 128-bit > multiplication very much; especially if we don't let it be negative. > > And implementations *can* just set it to zero. It hurts nobody. > > Or were you thinking of just using a fixed shift like (seconds>>80) > instead? The 'shift' field should be fine.
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
Changing virtio-dev address to the new one. The discussion might also be relevant for virtio-comment, but it is discouraged to forward it to both. On 15.06.24 10:40, David Woodhouse wrote: > On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >> RFC v3 updates >> -- >> >> This series implements a driver for a virtio-rtc device conforming to spec >> RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to >> the PTP clock driver already present before. >> >> This patch series depends on the patch series "treewide: Use clocksource id >> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined >> series on top of mainline. >> >> Overview >> >> >> This patch series adds the virtio_rtc module, and related bugfixes. The >> virtio_rtc module implements a driver compatible with the proposed Virtio >> RTC device specification [1]. The Virtio RTC (Real Time Clock) device >> provides information about current time. The device can provide different >> clocks, e.g. for the UTC or TAI time standards, or for physical time >> elapsed since some past epoch. The driver can read the clocks with simple >> or more accurate methods. Optionally, the driver can set an alarm. >> >> The series first fixes some bugs in the get_device_system_crosststamp() >> interpolation code, which is required for reliable virtio_rtc operation. >> Then, add the virtio_rtc implementation. >> >> For the Virtio RTC device, there is currently a proprietary implementation, >> which has been used for provisional testing. > > As discussed before, I don't think it makes sense to design a new high- > precision virtual clock which only gets it right *most* of the time. We > absolutely need to address the issue of live migration. > > When live migration occurs, the guest's time precision suffers in two > ways. > > First, even when migrating to a supposedly identical host the precise > rate of the underlying counter (TSC, arch counter, etc.) can change > within the tolerances (e.g. ±50PPM) of the hardware. Unlike the natural > changes that NTP normally manages to track, this is a *step* change, > potentially from -50PPM to +50PPM from one host to the next. > > Second, the accuracy of the counter as preserved across migration is > limited by the accuracy of each host's NTP synchronization. So there is > also a step change in the value of the counter itself. > > At the moment of migration, the guest's timekeeping should be > considered invalid. Any previous cross-timestamps are meaningless, and > blindly using the previously-known relationship between the counter and > real time can lead to problems such as corruption in distributed > databases, fines for mis-timestamped transactions, and other general > unhappiness. > > We obviously can't get a new timestamp from the virtio_rtc device every > time an application wants to know the time reliably. We don't even want > *system* calls for that, which is why we have it in a vDSO. > > We can take the same approach to warning the guest about clock > disruption due to live migration. A shared memory region from the > virtual clock device even be mapped all the way to userspace, for those > applications which need precise and *reliable* time to check a > 'disruption' marker in it, and do whatever is appropriate (e.g. abort > transactions and wait for time to resync) when it happens. > > We can do better than just letting the guest know about disruption, of > course. We can put the actual counter→realtime relationship into the > memory region too. As well as being able to provide a PTP driver with > this, the applications which care about *reliable* timestamps can mmap > the page directly and use it, vDSO-style, to have accurate timestamps > even from the first cycle after migration. > > When disruption is signalled, the guest needs to throw away any > *additional* refinement that it's done with NTP/PTP/PPS/etc. and revert > to knowing nothing more than what the hypervisor advertises here. > > Here's a first attempt at defining such a memory structure. For now > I've done it as a "vmclock" ACPI device based loosely on vmgenid, but I > think it makes most sense for this to be part of the virtio_rtc spec. > Ultimately it doesn't matter *how* the guest finds the memory region. This looks sensible to me. I also think it would be possible to adapt this for the virtio-rtc spec. The proposal also supports some other use cases which are not in the virtio-rtc RFC spec v4 [2], notably vDSO-style clock reading, and others such as indication of a past leap second. Comp
Re: [RFC PATCH v3 6/7] virtio_rtc: Add Arm Generic Timer cross-timestamping
Changing virtio-dev address to the new one. On 15.06.24 09:50, David Woodhouse wrote: > On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >> >> +int viortc_hw_xtstamp_params(u16 *hw_counter, enum clocksource_ids *cs_id) >> +{ >> + *hw_counter = VIRTIO_RTC_COUNTER_ARM_VIRT; > > Hm, but what if it isn't? I think you need to put this in > drivers/clocksource/arm_arch_timer.c where it can do something like > kvm_arch_ptp_get_crosststamp() does to decide: > > if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) > ptp_counter = KVM_PTP_VIRT_COUNTER; > else > ptp_counter = KVM_PTP_PHYS_COUNTER; > > >> + *cs_id = CSID_ARM_ARCH_COUNTER; >> + >> + return 0; >> +} > I had such a check in v1, but Marc Zyngier didn't like the distinction [1, 2] - maybe also exposing the timer through a generic helper was not desired. Quoting from [2]: >> This was the rationale to come up with the physical/virtual counter >> distinction for the Virtio RTC device. Looking at extensions such as >> FEAT_ECV, where the CNTPCT_EL0 value can depend on the EL, or FEAT_NV*, >> it might be a bit simplistic. > > Not just simplistic. It doesn't make sense. For this to work, you'd > need to know the global offset that KVM applies to the global counter, > plus the *virtualised* CNTPOFF/CNTVOFF values that the guest can > change at any time on a *per-CPU* basis. None of that is available > outside of KVM, nor would it make any sense anyway. > >> Does this physical/virtual counter distinction sound like a good idea? >> Otherwise I would drop the arch_timer_counter_get_type() in the next >> iteration. > > My take on this is that only the global counter value makes any sense. > That value is already available from the host as the virtual counter, > because we guarantee that CNTVOFF is 0 when outside of the guest > (well, technically, outside of the vcpu_load/vcpu_put section). So I put the assumption that the virtual counter will always be the right choice for current Linux kernels, from the Virtio device POV. Thanks for the comment, Peter [1] https://lore.kernel.org/lkml/20230630171052.985577-1-peter.hil...@opensynergy.com/T/#ma8d596de1cbc8f4a78a18b2aa995db18423494a7 [2] https://lore.kernel.org/lkml/20230630171052.985577-1-peter.hil...@opensynergy.com/T/#m65fa1d715933360498c4e33d7225e4220215a9d6
Re: [RFC PATCH v3 5/7] virtio_rtc: Add PTP clocks
On 15.06.24 10:01, David Woodhouse wrote: > On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >> >> + ret = viortc_hw_xtstamp_params(&hw_counter, &cs_id); >> + if (ret) >> + return ret; >> + >> + ktime_get_snapshot(&history_begin); >> + if (history_begin.cs_id != cs_id) >> + return -EOPNOTSUPP; > > I think you have to call ktime_get_snapshot() anyway to get a snapshot > from before your crosststamp? But I still don't much like the fact that > you need to use it to work out which cs_id is being used. The actual cs_id check is in get_device_system_crosststamp(), where it was added recently [1]. So this additional check is just verifying that the history_begin is usable. > > Shouldn't get_device_system_crosststamp() pass that to its get_time_fn > as a hint? This is unneeded in this case, since get_device_system_crosststamp() does the check already (but the driver is free to pass it through the get_time_fn parameter ctx). > > On x86, you are likely to find that history_begin.cs_id is the KVM > clock, so this will return -EOPNOTSUPP and userspace will have to fall > back to PTP_SYS_OFFSET. I note the KVM PTP clock actually *converts* a > TSC-based crosststamp to kvmclock µs for itself, so that it can report > *cs_id = CSID_X86_KVM_CLK. Not sure how I feel about that though. I'm > inclined to suggest that it shouldn't, as anyone who wants accurate > timekeeping shouldn't be using the KVM clock anyway. > > But we should at least be relatively consistent about it. ATM, the driver does indeed not have TSC support (for cross-timestamping) enabled at all, so would always use fallback. If *not* using the KVM clock, I think TSC can just be enabled by adding architecture-specific code similar to virtio_rtc_arm.c. I am not familiar with the KVM clock, but maybe it would be sufficient to allow CSID_X86_KVM_CLK as well? Thanks for the comments, Peter [1] https://git.kernel.org/torvalds/c/4b7f521229ef
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
While the virtio-comment list is not available, now also CC'ing Parav, which may be interested in this virtio-rtc spec related discussion thread. On 14.03.24 15:19, David Woodhouse wrote: > On 14 March 2024 11:13:37 CET, Peter Hilber > wrote: >>> To a certain extent, as long as the virtio-rtc device is designed to expose >>> time precisely and unambiguously, it's less important if the Linux kernel >>> *today* can use that. Although of course we should strive for that. Let's >>> be...well, *unambiguous*, I suppose... that we've changed topics to discuss >>> that though. >>> >> >> As Virtio is extensible (unlike hardware), my approach is to mostly specify >> only what also has a PoC user and a use case. > > If we get memory-mapped (X, Y, Z, ±x, ±y) I'll have a user and a use case on > day one. Otherwise, as I said in my first response, I can go do that as a > separate device and decide that virtio_rtc doesn't meet our needs (especially > for maintaining accuracy over LM). We plan to add - leap second indication, - UTC-to-TAI offset, - clock smearing indication (including the noon-to-noon linear smearing variant which seems to be somewhat popular), and - clock accuracy indication to the initial spec and to the PoC implementation. However, due to resource restrictions, we cannot ourselves add the memory-mapped clock to the initial spec. Everyone is very welcome to contribute the memory-mapped clock to the spec, and I think it might then still make it to the initial version. > > My main concern for virto_rtc is that we avoid *ambiguity*. Yes, I get that > it's extensible but we don't want a v1.0 of the spec, implemented by various > hypervisors, which still leaves guests not knowing what the actual time is. > That would not be good. And even UTC without a leap second indicator has that > problem. Agreed. That should be addressed by the above changes. Best regards, Peter
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
Now CC'ing the previous commenters to the virtio-rtc spec draft, since this discussion is mostly about the spec, and the Virtio mailing lists still seem to be in a migration hiatus... On 13.03.24 19:18, David Woodhouse wrote: > On 13 March 2024 17:50:48 GMT, Peter Hilber > wrote: >> On 13.03.24 13:45, David Woodhouse wrote: >>> Surely the whole point of this effort is to provide guests with precise >>> and *unambiguous* knowledge of what the time is? >> >> I would say, a fundamental point of this effort is to enable such >> implementations, and to detect if a device is promising to support this. >> >> Where we might differ is as to whether the Virtio clock *for every >> implementation* has to be *continuously* accurate w.r.t. a time standard, >> or whether *for some implementations* it could be enough that all guests in >> the local system have the same, precise local notion of time, which might >> be off from the actual time standard. > > That makes sense, but remember I don't just want {X, Y, Z} but *also* the > error bounds of ±deltaY and ±deltaZ too. > > So your example just boils down to "I'm calling it UTC, and it's really > precise, but we make no promises about its *accuracy*". And that's fine. > >> Also, cf. ptp_kvm, which AFAIU doesn't address leap seconds at all... > > KVM is not an exemplar of good time practices. > Not in *any* respect :) > >> With your described use case the UTC_SMEARED clock should of course not be >> used. The UTC_SMEARED clock would get a distinct name through udev, like >> /dev/ptp_virtio_utc_smeared, so the incompatibility could at least be >> detected. > > As long as it's clear to all concerned that this is fundamentally not usable > as an accurate time source, and is only for the local-sync case you > described, sure. > >>> Using UTC is bad enough, because for a UTC timestamp in the middle of a >>> leap second the guest can't know know *which* occurrence of that leap >>> second it is, so it might be wrong by a second. To resolve that >>> ambiguity needs a leap indicator and/or tai_offset field. >> >> I agree that virtio-rtc should communicate this. The question is, what >> exactly, and for which clock read request? > > Are we now conflating software architecture (and Linux in particular) with > "hardware" design? > > To a certain extent, as long as the virtio-rtc device is designed to expose > time precisely and unambiguously, it's less important if the Linux kernel > *today* can use that. Although of course we should strive for that. Let's > be...well, *unambiguous*, I suppose... that we've changed topics to discuss > that though. > As Virtio is extensible (unlike hardware), my approach is to mostly specify only what also has a PoC user and a use case. >> As for PTP clocks: >> >> - It doesn't fit into the ioctl PTP_SYS_OFFSET_PRECISE2. >> >> - The clock_adjtime(2) tai_offset and return value could be set (if >> upstream will accept this). Would this help? As discussed, user space >> would need to interpret this (and currently no dynamic POSIX clock sets >> this). > > Hm, maybe? > > >>>> I think I can add a SHOULD requirement which vaguely refers to vCPU 0, or >>>> boot vCPU. But the Virtio device is not necessarily hosted by a hypervisor, >>>> so the device might not even know which vCPUs there are. E.g. there is even >>>> interest to make virtio-rtc work as part of the virtio-net device (which >>>> might be implemented in hardware). >>> >>> Sure, but those implementations aren't going to offer the TSC pairing >>> at all, are they? >>> >> >> They could offer an Intel ART pairing (some physical PTP NICs are already >> doing this, look for the convert_art_to_tsc() users). > > Right, but isn't that software's problem? The time pairing is defined against > the ART in that case. My point was that such a device would then not necessarily have an idea what vCPU 0 is. But let's just say that this will be phrased as a SHOULD best-effort requirement anyway. Thanks for the comments, Peter
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 13.03.24 21:12, Andrew Lunn wrote: >> As long as it doesn't behave differently from the other RTC, I'm fine >> with this. This is important because I don't want to carry any special >> infrastructure for this driver or to have to special case this driver >> later on because it is incompatible with some evolution of the >> subsystem. > > Maybe deliberately throw away all the sub-second accuracy when > accessing the time via the RTC API? That helps to make it look like an > RTC. And when doing the rounding, add a constant offset of 10ms to > emulate the virtual i2c bus it is hanging off, like most RTCs? > > Andrew The truncating to whole seconds is already done. As to the offset, I do not understand how this would help. I can read out my CMOS RTC in ~50 us. Thanks for the comment, Peter
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 13.03.24 15:06, David Woodhouse wrote: > On Wed, 2024-03-13 at 13:58 +0100, Alexandre Belloni wrote: >> The TSC or whatever CPU counter/clock that is used to keep the system >> time is not an RTC, I don't get why it has to be exposed as such to the >> guests. PTP is fine and precise, RTC is not. > > Ah, I see. But the point of the virtio_rtc is not really to expose that > CPU counter. The point is to report the wallclock time, just like an > actual RTC. The real difference is the *precision*. > > The virtio_rtc device has a facility to *also* expose the counter, > because that's what we actually need to gain that precision... > > Applications don't read the RTC every time they want to know what the > time is. These days, they don't even make a system call; it's done > entirely in userspace mode. The kernel exposes some shared memory, > essentially saying "the counter was X at time Y, and runs at Z Hz". > Then applications just read the CPU counter and do some arithmetic. > > As we require more and more precision in the calibration, it becomes > important to get *paired* readings of the CPU counter and the wallclock > time at precisely the same moment. If the guest has to read one and > then the other, potentially taking interrupts, getting preempted and > suffering steal/SMI time in the middle, that introduces an error which > is increasingly significant as we increasingly care about precision. > > Peter's proposal exposes the pairs of {X,Y} and leaves *all* the guest > kernels having to repeat readings over time and perform the calibration > as the underlying hardware oscillator frequency (Z) drifts with > temperature. I'm trying to get him to let the hypervisor expose the > calibrated frequency Z too. Along with *error* bounds for ±δX and ±δZ. > Which aside from reducing the duplication of effort, will *also* fix > the problem of live migration where *all* those things suffer a step > change and leave the guest with an inaccurate clock but not knowing it. I am already convinced that this would work significantly better than the {X,Y} pair (but would be a bit more effort to implement): 1. when accessed by user space, obviously 2. when backing the PTP clock, it saves CPU time and makes non-paired reads more precise. I would just prefer to try upstreaming the {X,Y} pairing first. I think the {X,Y,Z...} pairing could be discussed and developed in parallel. Thanks for the comments, Peter
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 13.03.24 13:45, David Woodhouse wrote: > On Wed, 2024-03-13 at 10:45 +0100, Peter Hilber wrote: >> On 12.03.24 18:15, David Woodhouse wrote: >>> On Mon, 2024-03-11 at 19:24 +0100, Peter Hilber wrote: >>>> On 08.03.24 13:33, David Woodhouse wrote: >>>>> On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: >>>>>> On 07.03.24 15:02, David Woodhouse wrote: >>>>>>> Hm, should we allow UTC? If you tell me the time in UTC, then >>>>>>> (sometimes) I still don't actually know what the time is, because some >>>>>>> UTC seconds occur twice. UTC only makes sense if you provide the TAI >>>>>>> offset, surely? Should the virtio_rtc specification make it mandatory >>>>>>> to provide such? >>>>>>> >>>>>>> Otherwise you're just designing it to allow crappy hypervisors to >>>>>>> expose incomplete information. >>>>>>> >>>>>> >>>>>> Hi David, >>>>>> >>>>>> (adding virtio-comm...@lists.oasis-open.org for spec discussion), >>>>>> >>>>>> thank you for your insightful comments. I think I take a broadly similar >>>>>> view. The reason why the current spec and driver is like this is that I >>>>>> took a pragmatic approach at first and only included features which work >>>>>> out-of-the-box for the current Linux ecosystem. >>>>>> >>>>>> The current virtio_rtc features work similar to ptp_kvm, and therefore >>>>>> can work out-of-the-box with time sync daemons such as chrony. >>>>>> >>>>>> As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock >>>>>> as well, I am afraid that >>>>>> >>>>>> - in some (embedded) scenarios, the TAI clock may not be available >>>>>> >>>>>> - crappy hypervisors will pass off the UTC clock as the TAI clock. >>>>>> >>>>>> For the same reasons, I am also not sure about adding a *mandatory* TAI >>>>>> offset to each readout. I don't know user-space software which would >>>>>> leverage this already (at least not through the PTP clock interface). >>>>>> And why would such software not go straight for the TAI clock instead? >>>>>> >>>>>> How about adding a requirement to the spec that the virtio-rtc device >>>>>> SHOULD expose the TAI clock whenever it is available - would this >>>>>> address your concerns? >>>>> >>>>> I think that would be too easy for implementors to miss, or decide not >>>>> to obey. Or to get *wrong*, by exposing a TAI clock but actually >>>>> putting UTC in it. >>>>> >>>>> I think I prefer to mandate the tai_offset field with the UTC clock. >>>>> Crappy implementations will just set it to zero, but at least that >>>>> gives a clear signal to the guests that it's *their* problem to >>>>> resolve. >>>> >>>> To me there are some open questions regarding how this would work. Is there >>>> a use case for this with the v3 clock reading methods, or would it be >>>> enough to address this with the Virtio timekeeper? >>>> >>>> Looking at clock_adjtime(2), the tai_offset could be exposed, but probably >>>> best alongside some additional information about leap seconds. I am not >>>> aware about any user-space user. In addition, leap second smearing should >>>> also be addressed. >>>> >>> >>> Is there even a standard yet for leap-smearing? Will it be linear over >>> 1000 seconds like UTC-SLS? Or semi-raised-cosine over 24 hours, which I >>> think is what Google does? Meta does something different again, don't >>> they? >>> >>> Exposing UTC as the only clock reference is bad enough; when leap >>> seconds happen there's a whole second during which you don't *know* >>> which second it is. It seems odd to me, for a precision clock to be >>> deliberately ambiguous about what the time is! >> >> Just to be clear, the device can perfectly expose only a TAI reference >> clock (or both UTC and TAI), the spec is just completely open about this, >> as it tries to work for diverse use cases. > > As long as the guest *knows* what it's
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 13.03.24 12:18, Alexandre Belloni wrote: > On 13/03/2024 10:45:54+0100, Peter Hilber wrote: >>> Exposing UTC as the only clock reference is bad enough; when leap >>> seconds happen there's a whole second during which you don't *know* >>> which second it is. It seems odd to me, for a precision clock to be >>> deliberately ambiguous about what the time is! >> >> Just to be clear, the device can perfectly expose only a TAI reference >> clock (or both UTC and TAI), the spec is just completely open about this, >> as it tries to work for diverse use cases. >> >>> >>> But if the virtio-rtc clock is defined as UTC and then expose something >>> *different* in it, that's even worse. You potentially end up providing >>> inaccurate time for a whole *day* leading up to the leap second. >>> >>> I think you're right that leap second smearing should be addressed. At >>> the very least, by making it clear that the virtio-rtc clock which >>> advertises UTC shall be used *only* for UTC, never UTC-SLS or any other >>> yet-to-be-defined variant. >>> >> >> Agreed. >> >>> Please make it explicit that any hypervisor which wants to advertise a >>> smeared clock shall define a new type which specifies the precise >>> smearing algorithm and cannot be conflated with the one you're defining >>> here. >>> >> >> I will add a requirement that the UTC clock can never have smeared/smoothed >> leap seconds. >> >> I think that not every vendor would bother to first add a definition of a >> smearing algorithm. Also, I think in some cases knowing the precise >> smearing algorithm might not be important (when having the same time as the >> hypervisor is enough and accuracy w.r.t. actual time is less important). >> >> So maybe I should add a VIRTIO_RTC_CLOCK_UTC_SMEARED clock type, which for >> now could catch every UTC-like clock which smears/smoothes leap seconds, >> where the vendor cannot be bothered to add the smearing algorithm to spec >> and implementations. >> > > I still don't know anything about virtio but under Linux, an RTC is > always UTC (or localtime when dual booting but let's not care) and never > accounts for leap seconds. Having an RTC and RTC driver behaving > differently would be super inconvenient. Why don't you leave this to > userspace? > > I guess I'm still questioning whether this is the correct interface to > expose the host system time instead of an actual RTC. > virtio_rtc only registers RTC class devices for virtio_rtc clock type UTC, so adding an UTC_SMEARED clock type would avoid leap seconds for the RTC class. But I understand that there are more concerns and I will re-consider. From my POV CLOCK_{REALTIME,MONOTONIC}_ALARM support is very important however. So the only alternative to me seems to be adding an alternative alarmtimer backend (and time synchronization through user space). Thanks for the comment, Peter
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 12.03.24 18:15, David Woodhouse wrote: > On Mon, 2024-03-11 at 19:24 +0100, Peter Hilber wrote: >> On 08.03.24 13:33, David Woodhouse wrote: >>> On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: >>>> On 07.03.24 15:02, David Woodhouse wrote: >>>>> Hm, should we allow UTC? If you tell me the time in UTC, then >>>>> (sometimes) I still don't actually know what the time is, because some >>>>> UTC seconds occur twice. UTC only makes sense if you provide the TAI >>>>> offset, surely? Should the virtio_rtc specification make it mandatory >>>>> to provide such? >>>>> >>>>> Otherwise you're just designing it to allow crappy hypervisors to >>>>> expose incomplete information. >>>>> >>>> >>>> Hi David, >>>> >>>> (adding virtio-comm...@lists.oasis-open.org for spec discussion), >>>> >>>> thank you for your insightful comments. I think I take a broadly similar >>>> view. The reason why the current spec and driver is like this is that I >>>> took a pragmatic approach at first and only included features which work >>>> out-of-the-box for the current Linux ecosystem. >>>> >>>> The current virtio_rtc features work similar to ptp_kvm, and therefore >>>> can work out-of-the-box with time sync daemons such as chrony. >>>> >>>> As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock >>>> as well, I am afraid that >>>> >>>> - in some (embedded) scenarios, the TAI clock may not be available >>>> >>>> - crappy hypervisors will pass off the UTC clock as the TAI clock. >>>> >>>> For the same reasons, I am also not sure about adding a *mandatory* TAI >>>> offset to each readout. I don't know user-space software which would >>>> leverage this already (at least not through the PTP clock interface). >>>> And why would such software not go straight for the TAI clock instead? >>>> >>>> How about adding a requirement to the spec that the virtio-rtc device >>>> SHOULD expose the TAI clock whenever it is available - would this >>>> address your concerns? >>> >>> I think that would be too easy for implementors to miss, or decide not >>> to obey. Or to get *wrong*, by exposing a TAI clock but actually >>> putting UTC in it. >>> >>> I think I prefer to mandate the tai_offset field with the UTC clock. >>> Crappy implementations will just set it to zero, but at least that >>> gives a clear signal to the guests that it's *their* problem to >>> resolve. >> >> To me there are some open questions regarding how this would work. Is there >> a use case for this with the v3 clock reading methods, or would it be >> enough to address this with the Virtio timekeeper? >> >> Looking at clock_adjtime(2), the tai_offset could be exposed, but probably >> best alongside some additional information about leap seconds. I am not >> aware about any user-space user. In addition, leap second smearing should >> also be addressed. >> > > Is there even a standard yet for leap-smearing? Will it be linear over > 1000 seconds like UTC-SLS? Or semi-raised-cosine over 24 hours, which I > think is what Google does? Meta does something different again, don't > they? > > Exposing UTC as the only clock reference is bad enough; when leap > seconds happen there's a whole second during which you don't *know* > which second it is. It seems odd to me, for a precision clock to be > deliberately ambiguous about what the time is! Just to be clear, the device can perfectly expose only a TAI reference clock (or both UTC and TAI), the spec is just completely open about this, as it tries to work for diverse use cases. > > But if the virtio-rtc clock is defined as UTC and then expose something > *different* in it, that's even worse. You potentially end up providing > inaccurate time for a whole *day* leading up to the leap second. > > I think you're right that leap second smearing should be addressed. At > the very least, by making it clear that the virtio-rtc clock which > advertises UTC shall be used *only* for UTC, never UTC-SLS or any other > yet-to-be-defined variant. > Agreed. > Please make it explicit that any hypervisor which wants to advertise a > smeared clock shall define a new type which specifies the precise > smearing algorithm and cannot be conflated with the one you're defining > here. >
Re: [RFC PATCH v3 7/7] virtio_rtc: Add RTC class driver
On 11.03.24 20:46, Alexandre Belloni wrote: > On 11/03/2024 19:28:50+0100, Peter Hilber wrote: >>>> Perhaps this might be handled by the driver also setting a virtio-rtc >>>> monotonic alarm (which uses a clock similar to CLOCK_BOOTTIME_ALARM). >>>> The >>>> virtio-rtc monotonic alarm would just be used to wake up in case it was >>>> a >>>> CLOCK_BOOTTIME_ALARM alarm. >>>> >>>> Otherwise, the behavior should not differ from other RTC class drivers. >>>> >>> >>> What I don't quite get is how this is actually related to RTCs. This >>> would be a super imprecise mechanism to get the current time and date >>> from the host to the guest which is what I think your are trying to do, >>> especially since this is not supporting UIE. >>> The host system clock may come from reading the RTC at some point in >>> time but more likely from another source so is it really the best >>> synchronization mechanism? >> >> Hello, >> >> thank you for your comments. >> >> The main motivation to have the RTC class driver is the RTC alarm >> (discussed below). >> >> As for synchronization, virtio_rtc also offers a PTP clock [1] which will >> be more precise, but which needs a user space daemon. As for RTC-based >> initial synchronization, my idea was to propose, in a second step, an >> optional op for rtc_class_ops, which would read the clock with nanosecond >> precision. This optional op could then be used in rtc_hctosys(), so there >> would be no need for UIE waiting. > > This would be a clear NAK, rtc_hctosys should use UIE to have proper > synchronisation. It currently does a very bad job reading the RTC and it > is a pity it has been mandated by systemd as useerspace is definitively > better placed to set the system time. I'm still very tempted delaying > everyone's boot by one second and make rtc_hctosys precise for all the > supported HW and not just a single driver. > OK. I plan to add a PPS feature to virtio_rtc so that it can support UIE. AFAIU this is not required for the initial driver version. Thanks for the comments, Peter [...]
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 08.03.24 13:33, David Woodhouse wrote: > On Fri, 2024-03-08 at 11:32 +0100, Peter Hilber wrote: >> On 07.03.24 15:02, David Woodhouse wrote: >>> On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >>>> RFC v3 updates >>>> -- >>>> >>>> This series implements a driver for a virtio-rtc device conforming to >>>> spec >>>> RFC v3 [1]. It now includes an RTC class driver with alarm, in >>>> addition to >>>> the PTP clock driver already present before. >>>> >>>> This patch series depends on the patch series "treewide: Use >>>> clocksource id >>>> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined >>>> series on top of mainline. >>>> >>>> Overview >>>> >>>> >>>> This patch series adds the virtio_rtc module, and related bugfixes. >>>> The >>>> virtio_rtc module implements a driver compatible with the proposed >>>> Virtio >>>> RTC device specification [1]. The Virtio RTC (Real Time Clock) device >>>> provides information about current time. The device can provide >>>> different >>>> clocks, e.g. for the UTC or TAI time standards, or for physical time >>>> elapsed since some past epoch. >>> >>> Hm, should we allow UTC? If you tell me the time in UTC, then >>> (sometimes) I still don't actually know what the time is, because some >>> UTC seconds occur twice. UTC only makes sense if you provide the TAI >>> offset, surely? Should the virtio_rtc specification make it mandatory >>> to provide such? >>> >>> Otherwise you're just designing it to allow crappy hypervisors to >>> expose incomplete information. >>> >> >> Hi David, >> >> (adding virtio-comm...@lists.oasis-open.org for spec discussion), >> >> thank you for your insightful comments. I think I take a broadly similar >> view. The reason why the current spec and driver is like this is that I >> took a pragmatic approach at first and only included features which work >> out-of-the-box for the current Linux ecosystem. >> >> The current virtio_rtc features work similar to ptp_kvm, and therefore >> can >> work out-of-the-box with time sync daemons such as chrony. >> >> As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock >> as >> well, I am afraid that >> >> - in some (embedded) scenarios, the TAI clock may not be available >> >> - crappy hypervisors will pass off the UTC clock as the TAI clock. >> >> For the same reasons, I am also not sure about adding a *mandatory* TAI >> offset to each readout. I don't know user-space software which would >> leverage this already (at least not through the PTP clock interface). >> And >> why would such software not go straight for the TAI clock instead? >> >> How about adding a requirement to the spec that the virtio-rtc device >> SHOULD expose the TAI clock whenever it is available - would this >> address >> your concerns? > > I think that would be too easy for implementors to miss, or decide not > to obey. Or to get *wrong*, by exposing a TAI clock but actually > putting UTC in it. > > I think I prefer to mandate the tai_offset field with the UTC clock. > Crappy implementations will just set it to zero, but at least that > gives a clear signal to the guests that it's *their* problem to > resolve. To me there are some open questions regarding how this would work. Is there a use case for this with the v3 clock reading methods, or would it be enough to address this with the Virtio timekeeper? Looking at clock_adjtime(2), the tai_offset could be exposed, but probably best alongside some additional information about leap seconds. I am not aware about any user-space user. In addition, leap second smearing should also be addressed. > > > > >>>> PTP clock interface >>>> --- >>>> >>>> virtio_rtc exposes clocks as PTP clocks to userspace, similar to >>>> ptp_kvm. >>>> If both the Virtio RTC device and this driver have special support for >>>> the >>>> current clocksource, time synchronization programs can use >>>> cross-timestamping using ioctl PTP_SYS_OFFSET_PRECISE2 aka >>>> PTP_SYS_OFFSET_PRECISE. Similar to ptp_kvm, system time >>>> synchronization >>>> with single-digit ns precision is possible with a quiescent reference >>>
Re: [RFC PATCH v3 7/7] virtio_rtc: Add RTC class driver
On 08.03.24 18:03, Alexandre Belloni wrote: > Hello, > > I'll start by saying that I'm sorry, I have a very very high level > knowledge about what virtio is. > > On 18/12/2023 08:38:45+0100, Peter Hilber wrote: >> Expose the virtio-rtc UTC clock as an RTC clock to userspace, if it is >> present. Support RTC alarm if the virtio-rtc alarm feature is present. >> The >> virtio-rtc device signals an alarm by marking an alarmq buffer as used. >> >> Peculiarities >> - >> >> A virtio-rtc clock is a bit special for an RTC clock in that >> >> - the clock may step (also backwards) autonomously at any time and >> >> - the device, and its notification mechanism, will be reset during boot >> or >> resume from sleep. >> >> The virtio-rtc device avoids that the driver might miss an alarm. The >> device signals an alarm whenever the clock has reached or passed the >> alarm >> time, and also when the device is reset (on boot or resume from sleep), >> if >> the alarm time is in the past. >> >> Open Issue >> -- >> >> The CLOCK_BOOTTIME_ALARM will use the RTC clock to wake up from sleep, >> and >> implicitly assumes that no RTC clock steps will occur during sleep. The >> RTC >> class driver does not know whether the current alarm is a real-time >> alarm >> or a boot-time alarm. >> >> Perhaps this might be handled by the driver also setting a virtio-rtc >> monotonic alarm (which uses a clock similar to CLOCK_BOOTTIME_ALARM). >> The >> virtio-rtc monotonic alarm would just be used to wake up in case it was >> a >> CLOCK_BOOTTIME_ALARM alarm. >> >> Otherwise, the behavior should not differ from other RTC class drivers. >> > > What I don't quite get is how this is actually related to RTCs. This > would be a super imprecise mechanism to get the current time and date > from the host to the guest which is what I think your are trying to do, > especially since this is not supporting UIE. > The host system clock may come from reading the RTC at some point in > time but more likely from another source so is it really the best > synchronization mechanism? Hello, thank you for your comments. The main motivation to have the RTC class driver is the RTC alarm (discussed below). As for synchronization, virtio_rtc also offers a PTP clock [1] which will be more precise, but which needs a user space daemon. As for RTC-based initial synchronization, my idea was to propose, in a second step, an optional op for rtc_class_ops, which would read the clock with nanosecond precision. This optional op could then be used in rtc_hctosys(), so there would be no need for UIE waiting. [1] https://lore.kernel.org/all/20231218073849.35294-6-peter.hil...@opensynergy.com/ > > The other thing is that I don't quite get the point of the RTC alarm > versus a regular timer in this context. RTC alarms allow to resume from suspend and poweroff (esp. also through alarmtimers), which is of interest in embedded virtualization. In my understanding RTC is ATM the only way to do this. (I was indeed thinking about adding an alternate alarmtimer backend for CLOCK_BOOTTIME_ALARM, which should deal with the CLOCK_REALTIME_ALARM vs CLOCK_BOOTTIME_ALARM issue which is described in the commit message.) > > > [...] > >> +static const struct rtc_class_ops viortc_class_with_alarm_ops = { >> +.read_time = viortc_class_read_time, >> +.read_alarm = viortc_class_read_alarm, >> +.set_alarm = viortc_class_set_alarm, >> +.alarm_irq_enable = viortc_class_alarm_irq_enable, >> +}; >> + >> +static const struct rtc_class_ops viortc_class_no_alarm_ops = { >> +.read_time = viortc_class_read_time, >> +}; >> + > > [...] > >> +/** >> +/** >> + * viortc_class_init() - init RTC class wrapper and device >> + * @viortc: device data >> + * @vio_clk_id: virtio_rtc clock id >> + * @have_alarm: expose alarm ops >> + * @parent_dev: virtio device >> + * >> + * Context: Process context. >> + * Return: RTC class wrapper on success, ERR_PTR otherwise. >> + */ >> +struct viortc_class *viortc_class_init(struct viortc_dev *viortc, >> + u16 vio_clk_id, bool have_alarm, >> + struct device *parent_dev) >> +{ >> +struct viortc_class *viortc_class; >> +struct rtc_device *rtc; >> + >> +viortc_class = >> +devm_kzalloc(parent_dev, sizeof(*viortc_class), >> GFP_KERNEL); >> +if (!viortc_class) >> +return ERR_PTR(-ENOMEM); >> + >> +viortc_class->viortc = viortc; >> + >> +rtc = devm_rtc_allocate_device(parent_dev); >> +if (IS_ERR(rtc)) >> +return ERR_PTR(PTR_ERR(rtc)); >> + >> +viortc_class->rtc = rtc; >> + >> +clear_bit(RTC_FEATURE_UPDATE_INTERRUPT, rtc->features); >> + >> +rtc->ops = have_alarm ? &viortc_class_with_alarm_ops : >> +&viortc_class_no_alarm_ops; > > Don't do this, simply clear the alarm feature. > OK (sorry, was obviously very inelegant). Best regards, Peter
Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
On 07.03.24 15:02, David Woodhouse wrote: > On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote: >> RFC v3 updates >> -- >> >> This series implements a driver for a virtio-rtc device conforming to spec >> RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to >> the PTP clock driver already present before. >> >> This patch series depends on the patch series "treewide: Use clocksource id >> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined >> series on top of mainline. >> >> Overview >> >> >> This patch series adds the virtio_rtc module, and related bugfixes. The >> virtio_rtc module implements a driver compatible with the proposed Virtio >> RTC device specification [1]. The Virtio RTC (Real Time Clock) device >> provides information about current time. The device can provide different >> clocks, e.g. for the UTC or TAI time standards, or for physical time >> elapsed since some past epoch. > > Hm, should we allow UTC? If you tell me the time in UTC, then > (sometimes) I still don't actually know what the time is, because some > UTC seconds occur twice. UTC only makes sense if you provide the TAI > offset, surely? Should the virtio_rtc specification make it mandatory > to provide such? > > Otherwise you're just designing it to allow crappy hypervisors to > expose incomplete information. > Hi David, (adding virtio-comm...@lists.oasis-open.org for spec discussion), thank you for your insightful comments. I think I take a broadly similar view. The reason why the current spec and driver is like this is that I took a pragmatic approach at first and only included features which work out-of-the-box for the current Linux ecosystem. The current virtio_rtc features work similar to ptp_kvm, and therefore can work out-of-the-box with time sync daemons such as chrony. As of RFC spec v3, UTC clock only is allowed. If mandating a TAI clock as well, I am afraid that - in some (embedded) scenarios, the TAI clock may not be available - crappy hypervisors will pass off the UTC clock as the TAI clock. For the same reasons, I am also not sure about adding a *mandatory* TAI offset to each readout. I don't know user-space software which would leverage this already (at least not through the PTP clock interface). And why would such software not go straight for the TAI clock instead? How about adding a requirement to the spec that the virtio-rtc device SHOULD expose the TAI clock whenever it is available - would this address your concerns? >> PTP clock interface >> --- >> >> virtio_rtc exposes clocks as PTP clocks to userspace, similar to ptp_kvm. >> If both the Virtio RTC device and this driver have special support for the >> current clocksource, time synchronization programs can use >> cross-timestamping using ioctl PTP_SYS_OFFSET_PRECISE2 aka >> PTP_SYS_OFFSET_PRECISE. Similar to ptp_kvm, system time synchronization >> with single-digit ns precision is possible with a quiescent reference clock >> (from the Virtio RTC device). This works even when the Virtio device >> response is slow compared to ptp_kvm hypercalls. > > Is PTP the right mechanism for this? As I understand it, PTP is a way > to precisely synchronize one clock with another. But in the case of > virt guests synchronizing against the host, it isn't really *another* > clock. It really is the *same* underlying clock. As the host clock > varies with temperature, for example, so does the guest clock. The only > difference is an offset and (on x86 perhaps) a mathematical scaling of > the frequency. > > I was looking at this another way, when I came across this virtio-rtc > work. > > My idea was just for the hypervisor to expose its own timekeeping > information — the counter/TSC value and TAI time at a given moment, > frequency of the counter, and the precision of both that frequency > (±PPM) and the TAI timestamp (±µs). > > By putting that in a host/guest shared data structure with a seqcount > for lockless updates, we can update it as time synchronization on the > host is refined, and we can even cleanly handle live migration where > the guest ends up on a completely different host. It allows for use > cases which *really* care (e.g. timestamping financial transactions) to > ensure that there is never even a moment of getting *wrong* timestamps > if they haven't yet resynced after a migration. I considered a similar approach as well, but integrating that with the kernel timekeeping seemed too much effort for the first step. However, reading the clock from user space would be much simpler. > > Now I'm trying to work out if I should attempt t
[RFC PATCH v3 7/7] virtio_rtc: Add RTC class driver
Expose the virtio-rtc UTC clock as an RTC clock to userspace, if it is present. Support RTC alarm if the virtio-rtc alarm feature is present. The virtio-rtc device signals an alarm by marking an alarmq buffer as used. Peculiarities - A virtio-rtc clock is a bit special for an RTC clock in that - the clock may step (also backwards) autonomously at any time and - the device, and its notification mechanism, will be reset during boot or resume from sleep. The virtio-rtc device avoids that the driver might miss an alarm. The device signals an alarm whenever the clock has reached or passed the alarm time, and also when the device is reset (on boot or resume from sleep), if the alarm time is in the past. Open Issue -- The CLOCK_BOOTTIME_ALARM will use the RTC clock to wake up from sleep, and implicitly assumes that no RTC clock steps will occur during sleep. The RTC class driver does not know whether the current alarm is a real-time alarm or a boot-time alarm. Perhaps this might be handled by the driver also setting a virtio-rtc monotonic alarm (which uses a clock similar to CLOCK_BOOTTIME_ALARM). The virtio-rtc monotonic alarm would just be used to wake up in case it was a CLOCK_BOOTTIME_ALARM alarm. Otherwise, the behavior should not differ from other RTC class drivers. Signed-off-by: Peter Hilber --- Notes: Added in v3. drivers/virtio/Kconfig | 21 +- drivers/virtio/Makefile | 1 + drivers/virtio/virtio_rtc_class.c| 269 +++ drivers/virtio/virtio_rtc_driver.c | 477 ++- drivers/virtio/virtio_rtc_internal.h | 53 +++ include/uapi/linux/virtio_rtc.h | 88 - 6 files changed, 902 insertions(+), 7 deletions(-) create mode 100644 drivers/virtio/virtio_rtc_class.c diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index d35c728778d2..e97bb2e9eca1 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -180,7 +180,8 @@ config VIRTIO_RTC help This driver provides current time from a Virtio RTC device. The driver provides the time through one or more clocks. The Virtio RTC PTP -clocks must be enabled to expose the clocks to userspace. +clocks and/or the Real Time Clock driver for Virtio RTC must be +enabled to expose the clocks to userspace. To compile this code as a module, choose M here: the module will be called virtio_rtc. @@ -189,8 +190,8 @@ config VIRTIO_RTC if VIRTIO_RTC -comment "WARNING: Consider enabling VIRTIO_RTC_PTP." - depends on !VIRTIO_RTC_PTP +comment "WARNING: Consider enabling VIRTIO_RTC_PTP and/or VIRTIO_RTC_CLASS." + depends on !VIRTIO_RTC_PTP && !VIRTIO_RTC_CLASS comment "Enable PTP_1588_CLOCK in order to enable VIRTIO_RTC_PTP." depends on PTP_1588_CLOCK=n @@ -218,6 +219,20 @@ config VIRTIO_RTC_ARM If unsure, say Y. +comment "Enable RTC_CLASS in order to enable VIRTIO_RTC_CLASS." + depends on RTC_CLASS=n + +config VIRTIO_RTC_CLASS + bool "Real Time Clock driver for Virtio RTC" + default y + depends on RTC_CLASS + help +This exposes the Virtio RTC UTC clock as a Linux Real Time Clock. It +only has an effect if the Virtio RTC device has a UTC clock. The Real +Time Clock is read-only, and may support setting an alarm. + +If unsure, say Y. + endif # VIRTIO_RTC endif # VIRTIO_MENU diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 781dff9f8822..6c26bad777db 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -16,3 +16,4 @@ obj-$(CONFIG_VIRTIO_RTC) += virtio_rtc.o virtio_rtc-y := virtio_rtc_driver.o virtio_rtc-$(CONFIG_VIRTIO_RTC_PTP) += virtio_rtc_ptp.o virtio_rtc-$(CONFIG_VIRTIO_RTC_ARM) += virtio_rtc_arm.o +virtio_rtc-$(CONFIG_VIRTIO_RTC_CLASS) += virtio_rtc_class.o diff --git a/drivers/virtio/virtio_rtc_class.c b/drivers/virtio/virtio_rtc_class.c new file mode 100644 index ..fcb694f0f9a0 --- /dev/null +++ b/drivers/virtio/virtio_rtc_class.c @@ -0,0 +1,269 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * virtio_rtc RTC class driver + * + * Copyright (C) 2023 OpenSynergy GmbH + */ + +#include +#include +#include + +#include + +#include "virtio_rtc_internal.h" + +/** + * struct viortc_class - RTC class wrapper + * @viortc: virtio_rtc device data + * @rtc: RTC device + * @vio_clk_id: virtio_rtc clock id + * @stopped: Whether RTC ops are disallowed. Access protected by rtc_lock(). + */ +struct viortc_class { + struct viortc_dev *viortc; + struct rtc_device *rtc; + u16 vio_clk_id; + bool stopped; +}; + +/** + * viortc_class_get_locked() - get RTC class wrapper, if ops allowed + * @dev: virtio device + * + * Gets the RTC class wrapper from the virtio device, if it is available and + * ops are allowed. + * + * Context: Caller must
[RFC PATCH v3 4/7] virtio_rtc: Add module and driver core
Add the virtio_rtc module and driver core. The virtio_rtc module implements a driver compatible with the proposed Virtio RTC device specification. The Virtio RTC (Real Time Clock) device provides information about current time. The device can provide different clocks, e.g. for the UTC or TAI time standards, or for physical time elapsed since some past epoch. The driver can read the clocks with simple or more accurate methods. Implement the core, which interacts with the Virtio RTC device. Apart from this, the core does not expose functionality outside of the virtio_rtc module. A follow-up patch will expose PTP clocks. Provide synchronous messaging, which is enough for the expected time synchronization use cases through PTP clocks (similar to ptp_kvm) or RTC Class driver. Signed-off-by: Peter Hilber --- Notes: v3: - merge readq and controlq into a single requestq (spec v3) - don't guard cross-timestamping with feature bit (spec v3) - pad message headers to 64 bits (spec v3) - reduce clock id to 16 bits (spec v3) - change Virtio status codes (spec v3) - use 'VIRTIO_RTC_REQ_' prefix for request messages (spec v3) MAINTAINERS | 7 + drivers/virtio/Kconfig | 13 + drivers/virtio/Makefile | 2 + drivers/virtio/virtio_rtc_driver.c | 761 +++ drivers/virtio/virtio_rtc_internal.h | 23 + include/uapi/linux/virtio_rtc.h | 144 + 6 files changed, 950 insertions(+) create mode 100644 drivers/virtio/virtio_rtc_driver.c create mode 100644 drivers/virtio/virtio_rtc_internal.h create mode 100644 include/uapi/linux/virtio_rtc.h diff --git a/MAINTAINERS b/MAINTAINERS index b589218605b4..0c157a19bbfd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23200,6 +23200,13 @@ S: Maintained F: drivers/nvdimm/nd_virtio.c F: drivers/nvdimm/virtio_pmem.c +VIRTIO RTC DRIVER +M: Peter Hilber +L: virtualizat...@lists.linux.dev +S: Maintained +F: drivers/virtio/virtio_rtc_* +F: include/uapi/linux/virtio_rtc.h + VIRTIO SOUND DRIVER M: Anton Yakovlev M: "Michael S. Tsirkin" diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 0a53a61231c2..834dd14bc070 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -173,4 +173,17 @@ config VIRTIO_DMA_SHARED_BUFFER This option adds a flavor of dma buffers that are backed by virtio resources. +config VIRTIO_RTC + tristate "Virtio RTC driver" + depends on VIRTIO + depends on PTP_1588_CLOCK_OPTIONAL + help +This driver provides current time from a Virtio RTC device. The driver +provides the time through one or more clocks. + +To compile this code as a module, choose M here: the module will be +called virtio_rtc. + +If unsure, say M. + endif # VIRTIO_MENU diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 8e98d24917cc..f760414ed6ab 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -12,3 +12,5 @@ obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o +obj-$(CONFIG_VIRTIO_RTC) += virtio_rtc.o +virtio_rtc-y := virtio_rtc_driver.o diff --git a/drivers/virtio/virtio_rtc_driver.c b/drivers/virtio/virtio_rtc_driver.c new file mode 100644 index ..ef1ea14b3bec --- /dev/null +++ b/drivers/virtio/virtio_rtc_driver.c @@ -0,0 +1,761 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * virtio_rtc driver core + * + * Copyright (C) 2022-2023 OpenSynergy GmbH + */ + +#include +#include +#include +#include +#include + +#include + +#include "virtio_rtc_internal.h" + +/* virtqueue order */ +enum { + VIORTC_REQUESTQ, + VIORTC_MAX_NR_QUEUES, +}; + +/** + * struct viortc_vq - virtqueue abstraction + * @vq: virtqueue + * @lock: protects access to vq + */ +struct viortc_vq { + struct virtqueue *vq; + spinlock_t lock; +}; + +/** + * struct viortc_dev - virtio_rtc device data + * @vdev: virtio device + * @vqs: virtqueues + * @num_clocks: # of virtio_rtc clocks + */ +struct viortc_dev { + struct virtio_device *vdev; + struct viortc_vq vqs[VIORTC_MAX_NR_QUEUES]; + u16 num_clocks; +}; + +/** + * struct viortc_msg - Message requested by driver, responded by device. + * @viortc: device data + * @req: request buffer + * @resp: response buffer + * @responded: vqueue callback signals response reception + * @refcnt: Message reference count, message and buffers will be deallocated + * once 0. refcnt is decremented in the vqueue callback and in the + * thread waiting on the responded completion. + * If a message response wait function times out, the message will be + * freed upon late reception (refcnt will rea
[RFC PATCH v3 5/7] virtio_rtc: Add PTP clocks
Expose the virtio_rtc clocks as PTP clocks to userspace, similar to ptp_kvm. virtio_rtc can expose multiple clocks, e.g. a UTC clock and a monotonic clock. Userspace should distinguish different clocks through the name assigned by the driver. A udev rule such as the following can be used to get a symlink /dev/ptp_virtio to the UTC clock: SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP UTC", SYMLINK += "ptp_virtio" The preferred PTP clock reading method is ioctl PTP_SYS_OFFSET_PRECISE2, through the ptp_clock_info.getcrosststamp() op. For now, PTP_SYS_OFFSET_PRECISE2 will return -EOPNOTSUPP through a weak function. PTP_SYS_OFFSET_PRECISE2 requires cross-timestamping support for specific clocksources, which will be added in the following. If the clocksource specific code is enabled, check that the Virtio RTC device supports the respective HW counter before obtaining an actual cross-timestamp from the Virtio device. The Virtio RTC device response time may be higher than the timekeeper seqcount increment interval. Therefore, obtain the cross-timestamp before calling get_device_system_crosststamp(). As a fallback, support the ioctl PTP_SYS_OFFSET_EXTENDED2 for all platforms. Assume that concurrency issues during PTP clock removal are avoided by the posix_clock framework. Kconfig recursive dependencies prevent virtio_rtc from implicitly enabling PTP_1588_CLOCK, therefore just warn the user if PTP_1588_CLOCK is not available. Since virtio_rtc should in the future also expose clocks as RTC class devices, do not have VIRTIO_RTC depend on PTP_1588_CLOCK. Signed-off-by: Peter Hilber --- Notes: v3: - don't guard cross-timestamping with feature bit (spec v3) - reduce clock id to 16 bits (spec v3) v2: - Depend on prerequisite patch series "treewide: Use clocksource id for get_device_system_crosststamp()". - Check clocksource id before sending crosststamp message to device. - Do not support multiple hardware counters at runtime any more, since distinction of Arm physical and virtual counter appears unneeded after discussion with Marc Zyngier. drivers/virtio/Kconfig | 23 +- drivers/virtio/Makefile | 1 + drivers/virtio/virtio_rtc_driver.c | 131 +- drivers/virtio/virtio_rtc_internal.h | 46 drivers/virtio/virtio_rtc_ptp.c | 342 +++ 5 files changed, 539 insertions(+), 4 deletions(-) create mode 100644 drivers/virtio/virtio_rtc_ptp.c diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 834dd14bc070..8542b2f20201 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -179,11 +179,32 @@ config VIRTIO_RTC depends on PTP_1588_CLOCK_OPTIONAL help This driver provides current time from a Virtio RTC device. The driver -provides the time through one or more clocks. +provides the time through one or more clocks. The Virtio RTC PTP +clocks must be enabled to expose the clocks to userspace. To compile this code as a module, choose M here: the module will be called virtio_rtc. If unsure, say M. +if VIRTIO_RTC + +comment "WARNING: Consider enabling VIRTIO_RTC_PTP." + depends on !VIRTIO_RTC_PTP + +comment "Enable PTP_1588_CLOCK in order to enable VIRTIO_RTC_PTP." + depends on PTP_1588_CLOCK=n + +config VIRTIO_RTC_PTP + bool "Virtio RTC PTP clocks" + default y + depends on PTP_1588_CLOCK + help +This exposes any Virtio RTC clocks as PTP Hardware Clocks (PHCs) to +userspace. + +If unsure, say Y. + +endif # VIRTIO_RTC + endif # VIRTIO_MENU diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index f760414ed6ab..4d48cbcae6bb 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -14,3 +14,4 @@ obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o obj-$(CONFIG_VIRTIO_RTC) += virtio_rtc.o virtio_rtc-y := virtio_rtc_driver.o +virtio_rtc-$(CONFIG_VIRTIO_RTC_PTP) += virtio_rtc_ptp.o diff --git a/drivers/virtio/virtio_rtc_driver.c b/drivers/virtio/virtio_rtc_driver.c index ef1ea14b3bec..c331b7383285 100644 --- a/drivers/virtio/virtio_rtc_driver.c +++ b/drivers/virtio/virtio_rtc_driver.c @@ -35,11 +35,16 @@ struct viortc_vq { * struct viortc_dev - virtio_rtc device data * @vdev: virtio device * @vqs: virtqueues + * @clocks_to_unregister: Clock references, which are only used during device + *removal. + * For other uses, there would be a race between device + * creation and setting the pointers here. * @num_clocks: # of virtio_rtc clocks */ struct viortc_dev { struct virtio_device *vdev; struct viortc_vq vqs[VIORTC_MAX_NR_QUEUES]; + struct viortc_ptp_clock **cl
[RFC PATCH v3 6/7] virtio_rtc: Add Arm Generic Timer cross-timestamping
Add PTP_SYS_OFFSET_PRECISE2 support on platforms using the Arm Generic Timer. Always report the CP15 virtual counter as the HW counter in use by arm_arch_timer, since the Linux kernel's usage of the Arm Generic Timer should always be compatible with this. Signed-off-by: Peter Hilber --- Notes: v2: - Depend on prerequisite patch series "treewide: Use clocksource id for get_device_system_crosststamp()". - Return clocksource id instead of calling dropped arm_arch_timer helpers. - Always report the CP15 virtual counter to be in use by arm_arch_timer, since distinction of Arm physical and virtual counter appears unneeded after discussion with Marc Zyngier. drivers/virtio/Kconfig | 13 + drivers/virtio/Makefile | 1 + drivers/virtio/virtio_rtc_arm.c | 22 ++ 3 files changed, 36 insertions(+) create mode 100644 drivers/virtio/virtio_rtc_arm.c diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 8542b2f20201..d35c728778d2 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -205,6 +205,19 @@ config VIRTIO_RTC_PTP If unsure, say Y. +config VIRTIO_RTC_ARM + bool "Virtio RTC cross-timestamping using Arm Generic Timer" + default y + depends on VIRTIO_RTC_PTP && ARM_ARCH_TIMER + help +This enables Virtio RTC cross-timestamping using the Arm Generic Timer. +It only has an effect if the Virtio RTC device also supports this. The +cross-timestamp is available through the PTP clock driver precise +cross-timestamp ioctl (PTP_SYS_OFFSET_PRECISE2 or +PTP_SYS_OFFSET_PRECISE). + +If unsure, say Y. + endif # VIRTIO_RTC endif # VIRTIO_MENU diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 4d48cbcae6bb..781dff9f8822 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -15,3 +15,4 @@ obj-$(CONFIG_VIRTIO_DMA_SHARED_BUFFER) += virtio_dma_buf.o obj-$(CONFIG_VIRTIO_RTC) += virtio_rtc.o virtio_rtc-y := virtio_rtc_driver.o virtio_rtc-$(CONFIG_VIRTIO_RTC_PTP) += virtio_rtc_ptp.o +virtio_rtc-$(CONFIG_VIRTIO_RTC_ARM) += virtio_rtc_arm.o diff --git a/drivers/virtio/virtio_rtc_arm.c b/drivers/virtio/virtio_rtc_arm.c new file mode 100644 index ..5185b130b3f1 --- /dev/null +++ b/drivers/virtio/virtio_rtc_arm.c @@ -0,0 +1,22 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Provides cross-timestamp params for Arm. + * + * Copyright (C) 2022-2023 OpenSynergy GmbH + */ + +#include + +#include + +#include "virtio_rtc_internal.h" + +/* see header for doc */ + +int viortc_hw_xtstamp_params(u16 *hw_counter, enum clocksource_ids *cs_id) +{ + *hw_counter = VIRTIO_RTC_COUNTER_ARM_VIRT; + *cs_id = CSID_ARM_ARCH_COUNTER; + + return 0; +} -- 2.40.1
[RFC PATCH v3 0/7] Add virtio_rtc module and related changes
== Date (UTC) Time Refid DP L P Raw offset Cooked offset Disp. === 2023-06-28 14:11:26.697782 PHCV0 N 0 3.318200e-05 3.450891e-05 4.611e-06 2023-06-28 14:11:26.697782 PHCV- N - -3.450891e-05 4.611e-06 ioctl(5, PTP_SYS_OFFSET_EXTENDED, {n_samples=10, ts=OMITTED}) = 0 2023-06-28 14:11:27.208763 PHCV0 N 0 -3.792800e-05 -4.023965e-05 4.611e-06 2023-06-28 14:11:27.208763 PHCV- N - - -4.023965e-05 4.611e-06 ioctl(5, PTP_SYS_OFFSET_EXTENDED, {n_samples=10, ts=OMITTED}) = 0 2023-06-28 14:11:27.722818 PHCV0 N 0 -3.328600e-05 -3.134404e-05 4.611e-06 2023-06-28 14:11:27.722818 PHCV- N - - -3.134404e-05 4.611e-06 ioctl(5, PTP_SYS_OFFSET_EXTENDED, {n_samples=10, ts=OMITTED}) = 0 2023-06-28 14:11:28.233572 PHCV0 N 0 -4.966900e-05 -4.584331e-05 4.611e-06 2023-06-28 14:11:28.233572 PHCV- N - - -4.584331e-05 4.611e-06 ioctl(5, PTP_SYS_OFFSET_EXTENDED, {n_samples=10, ts=OMITTED}) = 0 2023-06-28 14:11:28.742737 PHCV0 N 0 4.902700e-05 5.361388e-05 4.611e-06 2023-06-28 14:11:28.742737 PHCV- N - -5.361388e-05 4.611e-06 PTP clock setup --- The following udev rule can be used to get a symlink /dev/ptp_virtio to the UTC clock: SUBSYSTEM=="ptp", ATTR{clock_name}=="Virtio PTP UTC", SYMLINK += "ptp_virtio" The following chrony configuration directive can then be added in /etc/chrony/chrony.conf to synchronize to the Virtio UTC clock: refclock PHC /dev/ptp_virtio refid PHCV poll -1 dpoll -1 RTC interface - This patch series adds virtio_rtc as a generic Virtio driver, including both a PTP clock driver and an RTC class driver. Feedback is greatly appreciated. [1] https://lore.kernel.org/virtio-comment/20231218064253.9734-1-peter.hil...@opensynergy.com/ [2] https://chrony.tuxfamily.org/ [3] https://lore.kernel.org/lkml/20231215220612.173603-1-peter.hil...@opensynergy.com/ [4] https://github.com/OpenSynergy/linux.git virtio-rtc-v3-on-master v3: - Update to conform to virtio spec RFC v3 (no significant behavioral changes). - Add RTC class driver with alarm according to virtio spec RFC v3. - For cross-timestamp corner case fix, switch back to v1 style closed interval test (Thomas Gleixner). v2: - Depend on patch series "treewide: Use clocksource id for get_device_system_crosststamp()" to avoid requiring a clocksource pointer with get_device_system_crosststamp(). - Assume Arm Generic Timer will use CP15 virtual counter. Drop arm_arch_timer helper functions (Marc Zyngier). - Improve cross-timestamp fixes problem description and implementation (John Stultz). Peter Hilber (7): timekeeping: Fix cross-timestamp interpolation on counter wrap timekeeping: Fix cross-timestamp interpolation corner case decision timekeeping: Fix cross-timestamp interpolation for non-x86 virtio_rtc: Add module and driver core virtio_rtc: Add PTP clocks virtio_rtc: Add Arm Generic Timer cross-timestamping virtio_rtc: Add RTC class driver MAINTAINERS |7 + drivers/virtio/Kconfig | 62 ++ drivers/virtio/Makefile |5 + drivers/virtio/virtio_rtc_arm.c | 22 + drivers/virtio/virtio_rtc_class.c| 269 + drivers/virtio/virtio_rtc_driver.c | 1357 ++ drivers/virtio/virtio_rtc_internal.h | 122 +++ drivers/virtio/virtio_rtc_ptp.c | 342 +++ include/uapi/linux/virtio_rtc.h | 230 + kernel/time/timekeeping.c| 24 +- 10 files changed, 2428 insertions(+), 12 deletions(-) create mode 100644 drivers/virtio/virtio_rtc_arm.c create mode 100644 drivers/virtio/virtio_rtc_class.c create mode 100644 drivers/virtio/virtio_rtc_driver.c create mode 100644 drivers/virtio/virtio_rtc_internal.h create mode 100644 drivers/virtio/virtio_rtc_ptp.c create mode 100644 include/uapi/linux/virtio_rtc.h base-commit: 2c41b211d72c1eb350c7629a8c85234fef0d12c1 -- 2.40.1
Re: [PATCH v7 1/1] iio/scmi: Adding support for IIO SCMI Based Sensors
On 10.03.21 18:19, Jyoti Bhayana wrote: > Hi Peter, > > As already discussed with ARM, the spec clearly mentions that it has > to be uppercase and not case insensitive. So this patch is consistent > with the specs and changing it with means that the spec would need to > change as well. Therefore, there is no need for another version of the > patch > > "A NULL terminated UTF-8 format string with the sensor axis name, of > up to 16 bytes. It is recommended that the name ends with ‘_’ followed > by the axis of the sensor in uppercase. > For example, the name for the x-axis of a triaxial accelerometer could > be “acc_X” or “_X" > My understanding of the part of the spec quoted above is different: The spec only makes a recommendation and does allow additional or even contradictory sensor axis naming schemes. So the change to the implementation only would in principle be possible (disregarding integration timeline constraints). Best regards, Peter > Thanks, > Jyoti > > On Wed, Mar 10, 2021 at 3:16 AM Peter Hilber > wrote: >> >> On 10.03.21 00:12, Jyoti Bhayana wrote: >>> This change provides ARM SCMI Protocol based IIO device. >>> This driver provides support for Accelerometer and Gyroscope using >>> SCMI Sensor Protocol extensions added in the SCMIv3.0 ARM specification >>> >> >> [snip] >> >>> + >>> +static int scmi_iio_get_chan_modifier(const char *name, >>> + enum iio_modifier *modifier) >>> +{ >>> + char *pch, mod; >>> + >>> + if (!name) >>> + return -EINVAL; >>> + >>> + pch = strrchr(name, '_'); >>> + if (!pch) >>> + return -EINVAL; >>> + >>> + mod = *(pch + 1); >>> + switch (mod) { >>> + case 'X': >>> + *modifier = IIO_MOD_X; >>> + return 0; >>> + case 'Y': >>> + *modifier = IIO_MOD_Y; >>> + return 0; >>> + case 'Z': >>> + *modifier = IIO_MOD_Z; >>> + return 0; >>> + default: >>> + return -EINVAL; >>> + } >>> +} >>> + >> >> Hi Jyoti, >> >> could you still change the above code to also accept lower case 'x', >> 'y', 'z'? >> >> Supporting lower case as well would establish compatibility with the >> lower case naming conventions used for IIO channels. By this change, >> channels could be forwarded without name changes (as long as they fit >> into the name field). I'm sorry to notice this only now. >> >> Best regards, >> >> Peter >> >
Re: [PATCH v7 1/1] iio/scmi: Adding support for IIO SCMI Based Sensors
On 10.03.21 00:12, Jyoti Bhayana wrote: > This change provides ARM SCMI Protocol based IIO device. > This driver provides support for Accelerometer and Gyroscope using > SCMI Sensor Protocol extensions added in the SCMIv3.0 ARM specification > [snip] > + > +static int scmi_iio_get_chan_modifier(const char *name, > + enum iio_modifier *modifier) > +{ > + char *pch, mod; > + > + if (!name) > + return -EINVAL; > + > + pch = strrchr(name, '_'); > + if (!pch) > + return -EINVAL; > + > + mod = *(pch + 1); > + switch (mod) { > + case 'X': > + *modifier = IIO_MOD_X; > + return 0; > + case 'Y': > + *modifier = IIO_MOD_Y; > + return 0; > + case 'Z': > + *modifier = IIO_MOD_Z; > + return 0; > + default: > + return -EINVAL; > + } > +} > + Hi Jyoti, could you still change the above code to also accept lower case 'x', 'y', 'z'? Supporting lower case as well would establish compatibility with the lower case naming conventions used for IIO channels. By this change, channels could be forwarded without name changes (as long as they fit into the name field). I'm sorry to notice this only now. Best regards, Peter
Re: [RFC PATCH v3 1/1] iio/scmi: Adding support for IIO SCMI Based Sensors
On 22.01.21 00:21, Jyoti Bhayana wrote: > + > +static int scmi_iio_get_sensor_max_range(struct iio_dev *iio_dev, int *val, > + int *val2) > +{ > + struct scmi_iio_priv *sensor = iio_priv(iio_dev); > + int max_range_high, max_range_low; > + long long max_range; > + > + /* > + * All the axes are supposed to have the same value for max range. > + * We are just using the values from the Axis 0 here. > + */ > + if (sensor->sensor_info->axis[0].extended_attrs) { > + max_range = sensor->sensor_info->axis[0].attrs.max_range; > + max_range_high = H32(max_range); > + max_range_low = L32(max_range); > + > + /* > + * As IIO Val types have no provision for 64 bit values, > + * and currently there are no known sensors using 64 bit > + * for the range, this driver only supports sensor with > + * 32 bit range value. > + */ This comment and the corresponding one in scmi_iio_get_sensor_min_range() seem to be misleading to me. The extrema will probably exceed 32 bits even for physical sensors which do have less than 32 bits of resolution. The reason is that the SCMI sensor management protocol does not transmit a `scale' value as used by IIO. Instead, SCMI transmits an exponent with base ten. So, an SCMI sensor with a unit/LSB value which is not a power of ten will have its unit/LSB value split into an exponent (with base ten) and a mantissa. The SCMI platform (which exposes the physical sensor) will have to incorporate the mantissa into the sensor value. The simplest approach is to just multiply the mantissa with the raw physical sensor value, which will use more bits than the raw physical sensor value, depending on the unit/LSB value (and on the split of the unit/LSB value into exponent and mantissa). So I think the comment should at least make clear that the overflow may also happen for physical sensors with less than 32 bit of resolution, since it cannot be assumed that SCMI platforms will, without any apparent need, restrict the values to 32 bits, when that would mean additional complexity and potential loss of accuracy. (And in the long term this driver could IMHO try to handle a greater value range by adjusting the `scale' value accordingly.) Best regards, Peter > + if (max_range_high != 0) > + return -EINVAL; > + > + *val = max_range_low; > + *val2 = 1; > + } > + return 0; > +} > + > +static void scmi_iio_get_sensor_resolution(struct iio_dev *iio_dev, int *val, > +int *val2) > +{ > + struct scmi_iio_priv *sensor = iio_priv(iio_dev); > + > + /* > + * All the axes are supposed to have the same value for resolution > + * and exponent. We are just using the values from the Axis 0 here. > + */ > + if (sensor->sensor_info->axis[0].extended_attrs) { > + uint resolution = sensor->sensor_info->axis[0].resolution; > + s8 exponent = sensor->sensor_info->axis[0].exponent; > + s8 scale = sensor->sensor_info->axis[0].scale; > + > + /* > + * To provide the raw value for the resolution to the userspace, > + * need to divide the resolution exponent by the sensor scale > + */ > + exponent = exponent - scale; > + if (exponent >= 0) { > + *val = resolution * int_pow(10, exponent); > + *val2 = 1; > + } else { > + *val = resolution; > + *val2 = int_pow(10, abs(exponent)); > + } > + } > +} > + > +static int scmi_iio_get_sensor_min_range(struct iio_dev *iio_dev, int *val, > + int *val2) > +{ > + struct scmi_iio_priv *sensor = iio_priv(iio_dev); > + int min_range_high, min_range_low; > + long long min_range; > + > + /* > + * All the axes are supposed to have the same value for min range. > + * We are just using the values from the Axis 0 here. > + */ > + if (sensor->sensor_info->axis[0].extended_attrs) { > + min_range = sensor->sensor_info->axis[0].attrs.min_range; > + min_range_high = H32(min_range); > + min_range_low = L32(min_range); > + > + /* > + * As IIO Val types have no provision for 64 bit values, > + * and currently there are no known sensors using 64 bit > + * for the range, this driver only supports sensor with > + * 32 bit range value. > + */ > + if (min_range_high != 0x) > + return -EINVAL; > + > + *val = min_range_low; > + *val2 = 1; > + } > + return 0; > +} > + > +static int scmi_iio_set_sensor_range_avail(struct iio_dev *iio_dev) > +{ > +
Re: [RFC PATCH v2 10/10] firmware: arm_scmi: Add virtio transport
On 10.11.20 22:32, Cristian Marussi wrote: > Hi Peter/Igor, > > I went through this series while trying to grasp a bit more of all the > virtio inner workings and needs and I'll leave a few detailed comments > down below. > > In short at first, I think I can say that there are a couple of places > where I noticed you had to play all sort of tricks to fit into the current > SCMI transport layer frameowrk or to avoid adding DT references. > > It's clear that we'll have to somehow extend/fix/abstract the SCMI > transport layer (in my opinion), because I doubt that some of the tricks > you played would be well received for upstream ever (...but it's worth > noting it's not up to me saying the last...) > > So in these days I'm trying to play with this series and the SCMI > stack to see if I can extend it in a more sensible way to fit some of > the observations I make down below (now transport core layer is pretty > much shmem/mailbox oriented)...still nothing to share anyway as of now. Hi Cristian, thanks for your review. I agree that some changes to the patch series are needed to move it beyond RFC state. Please see individual responses below. Best regards, Peter > > > On Thu, Nov 05, 2020 at 10:21:16PM +0100, Peter Hilber wrote: >> From: Igor Skalkin >> >> This transport enables accessing an SCMI platform as a virtio device. >> >> Implement an SCMI virtio driver according to the virtio SCMI device spec >> patch v5 [1]. Virtio device id 32 has been reserved for the SCMI device >> [2]. >> >> The virtio transport has one tx channel (virtio cmdq, A2P channel) and >> at most one rx channel (virtio eventq, P2A channel). >> >> The following feature bit defined in [1] is not implemented: >> VIRTIO_SCMI_F_SHARED_MEMORY. >> >> After the preparatory patches, implement the virtio transport as >> paraphrased: >> >> Only support a single arm-scmi device (which is consistent with the SCMI >> spec). Call scmi-virtio init from arm-scmi module init. During the >> arm-scmi probing, link to the first probed scmi-virtio device. Defer >> arm-scmi probing if no scmi-virtio device is bound yet. >> >> Use the scmi_xfer tx/rx buffers for data exchange with the virtio device >> in order to avoid redundant maintenance of additional buffers. Allocate >> the buffers in the SCMI transport, and prepend room for a small header >> used by the virtio transport to the tx/rx buffers. >> >> For simplicity, restrict the number of messages which can be pending >> simultaneously according to the virtqueue capacity. (The virtqueue sizes >> are negotiated with the virtio device.) >> >> As soon as rx channel message buffers are allocated or have been read >> out by the arm-scmi driver, feed them to the virtio device. >> >> Since some virtio devices may not have the short response time exhibited >> by SCMI platforms using other transports, set a generous response >> timeout. >> >> Limitations: >> >> Do not adjust the other SCMI timeouts for delayed response and polling >> for now, since these timeouts are only relevant in special cases which >> are not yet deemed relevant for this transport. >> >> To do (as discussed in the cover letter): >> >> - Avoid re-use of buffers still being used by the virtio device on >> timeouts. >> >> - Avoid race conditions on receiving messages during/after channel free >> on driver probe failure or remove. >> >> [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html >> [2] https://www.oasis-open.org/committees/ballot.php?id=3496 >> >> Co-developed-by: Peter Hilber >> Signed-off-by: Peter Hilber >> Signed-off-by: Igor Skalkin >> --- >> MAINTAINERS| 1 + >> drivers/firmware/Kconfig | 12 +- >> drivers/firmware/arm_scmi/Makefile | 1 + >> drivers/firmware/arm_scmi/common.h | 14 + >> drivers/firmware/arm_scmi/driver.c | 11 + >> drivers/firmware/arm_scmi/virtio.c | 493 + >> include/uapi/linux/virtio_ids.h| 1 + >> include/uapi/linux/virtio_scmi.h | 41 +++ >> 8 files changed, 573 insertions(+), 1 deletion(-) >> create mode 100644 drivers/firmware/arm_scmi/virtio.c >> create mode 100644 include/uapi/linux/virtio_scmi.h >> >> diff --git a/MAINTAINERS b/MAINTAINERS >> index deaafb617361..8df73d6ddfc1 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -16772,6 +16772,7 @@ F:drivers/firmware/arm_scpi.c >> F: drivers/reset/reset-scmi.c >
Re: [PATCH v2 2/6] firmware: arm_scmi: add SCMIv3.0 Sensors descriptors extensions
Hi Cristian, sorry, I mistakenly used the wrong sender ("Mailing Lists") for the original comment mail. Please see below for my reply. On 10.11.20 18:21, Cristian Marussi wrote: > On Tue, Nov 10, 2020 at 05:00:05PM +0100, Mailing Lists wrote: >> On 26.10.20 21:10, Cristian Marussi wrote: >>> Add support for new SCMIv3.0 Sensors extensions related to new sensors' >>> features, like multiple axis and update intervals, while keeping >>> compatibility with SCMIv2.0 features. >>> While at that, refactor and simplify all the internal helpers macros and >>> move struct scmi_sensor_info to use only non-fixed-size typing. >>> >>> Signed-off-by: Cristian Marussi >>> --- >>> v1 --> v2 >>> - restrict segmented intervals descriptors to single triplet >>> - add proper usage of scmi_reset_rx_to_maxsz >>> --- >>> drivers/firmware/arm_scmi/sensors.c | 391 ++-- >>> include/linux/scmi_protocol.h | 219 +++- >>> 2 files changed, 584 insertions(+), 26 deletions(-) >>> >>> diff --git a/drivers/firmware/arm_scmi/sensors.c >>> b/drivers/firmware/arm_scmi/sensors.c >>> index 6aaff478d032..5a18f8c84bef 100644 >>> --- a/drivers/firmware/arm_scmi/sensors.c >>> +++ b/drivers/firmware/arm_scmi/sensors.c >>> @@ -7,16 +7,21 @@ >>> >>> #define pr_fmt(fmt) "SCMI Notifications SENSOR - " fmt >>> >>> +#include >>> #include >>> >>> #include "common.h" >>> #include "notify.h" >>> >>> +#define SCMI_MAX_NUM_SENSOR_AXIS 64 >> >> IMHO the related 6 bit wide fields, like SENSOR_DESCRIPTION_GET "Number >> of axes", should determine the maximum value, so 64 -> 63. >> > > Yes, bits [21:16] 'Number of Axes' in sensor_attributes_high, but this > #define was meant to represent the maximum number of sensor axis (64...ranging > from 0 to 63) not the highest possible numbered (63). > But in my understanding the actual maximum number of sensor axes is 63 due to the maximum value 63 of 'Number of Axes', 64 would overflow already. The ids would range from 0 to 62. That said, in my understanding there is no problem with retaining a higher value ATM. Best regards, Peter
Re: [PATCH v2 4/6] firmware: arm_scmi: add SCMIv3.0 Sensors timestamped reads
On 10.11.20 18:04, Cristian Marussi wrote: > Hi Peter > > thanks for the review. > > On Tue, Nov 10, 2020 at 05:01:26PM +0100, Peter Hilber wrote: >> On 26.10.20 21:10, Cristian Marussi wrote: >>> Add new .reading_get_timestamped() method to sensor_ops to support SCMIv3.0 >>> timestamped reads. >>> >>> Signed-off-by: Cristian Marussi >>> --- >>> drivers/firmware/arm_scmi/sensors.c | 134 ++-- >>> include/linux/scmi_protocol.h | 22 + >>> 2 files changed, 151 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/firmware/arm_scmi/sensors.c >>> b/drivers/firmware/arm_scmi/sensors.c >>> index 5a18f8c84bef..bdb0ed0df683 100644 >>> --- a/drivers/firmware/arm_scmi/sensors.c >>> +++ b/drivers/firmware/arm_scmi/sensors.c >>> @@ -156,6 +156,27 @@ struct scmi_msg_sensor_reading_get { >>> #define SENSOR_READ_ASYNC BIT(0) >>> }; >>> >>> +struct scmi_resp_sensor_reading_get { >>> + __le64 readings; >>> +}; >>> + >>> +struct scmi_resp_sensor_reading_complete { >>> + __le32 id; >>> + __le64 readings; >>> +}; >> >> In my understanding the id field is not present in the spec. The >> implementation seems to have introduced it already before this patch. >> > > Well, it is indeed defined in 4.7.3.1 "SENSOR_READING_COMPLETE" both in > SCMI V3.0 and in V2.0: it is the async delayed response and this 'id' > represents the sensor_id: in fact it is used only the in the async path > in the reading funcs; the sync version uses directly sensor_reading_le. > (which has no id n it) You are right, sorry for the noise. >>> +/** >>> + * scmi_sensor_reading_get - Read scalar sensor value >>> + * @handle: Platform handle >>> + * @sensor_id: Sensor ID >>> + * @value: The 64bit value sensor reading >>> + * >>> + * This function returns a single 64 bit reading value representing the >>> sensor >>> + * value; if the platform SCMI Protocol implementation and the sensor >>> support >>> + * multiple axis and timestamped-reads, this just returns the first axis >>> while >>> + * dropping the timestamp value. >>> + * Use instead the @scmi_sensor_reading_get_timestamped to retrieve the >>> array of >>> + * timestamped multi-axis values. >>> + * >>> + * Return: 0 on Success >>> + */ >>> static int scmi_sensor_reading_get(const struct scmi_handle *handle, >>>u32 sensor_id, u64 *value) >>> { >>> @@ -593,18 +629,105 @@ static int scmi_sensor_reading_get(const struct >>> scmi_handle *handle, >> >> How about changing the scmi_xfer_get_init() rx_size to 0 (in the >> immediately preceding, not shown lines)? An SCMI platform might not >> expect to just have room for the first reading, excluding the timestamp. >> > > Ah right, because this is the old v2.0 interface which I kept unchanged but > now internally the same v3.0 SENSOR_READING_GET message on a v3.0 platform > could return multiple per-axis timestamped values even if I just return > the first u64 without timestamp. Is this that you mean ? Yes. Best regards, Peter
Re: [PATCH v2 6/6] firmware: arm_scmi: add SCMIv3.0 Sensor notifications
On 26.10.20 21:10, Cristian Marussi wrote: > Add support for new SCMIv3.0 SENSOR_UPDATE notification. > > Signed-off-by: Cristian Marussi > --- > drivers/firmware/arm_scmi/sensors.c | 124 > include/linux/scmi_protocol.h | 9 ++ > 2 files changed, 116 insertions(+), 17 deletions(-) > > diff --git a/drivers/firmware/arm_scmi/sensors.c > b/drivers/firmware/arm_scmi/sensors.c > index 372a3592e99b..51921e279c9f 100644 > --- a/drivers/firmware/arm_scmi/sensors.c > +++ b/drivers/firmware/arm_scmi/sensors.c > @@ -24,6 +24,7 @@ enum scmi_sensor_protocol_cmd { > SENSOR_LIST_UPDATE_INTERVALS = 0x8, > SENSOR_CONFIG_GET = 0x9, > SENSOR_CONFIG_SET = 0xA, > + SENSOR_CONTINUOUS_UPDATE_NOTIFY = 0xB, > }; > > struct scmi_msg_resp_sensor_attributes { > @@ -133,10 +134,10 @@ struct scmi_msg_resp_sensor_list_update_intervals { > __le32 intervals[]; > }; > > -struct scmi_msg_sensor_trip_point_notify { > +struct scmi_msg_sensor_request_notify { > __le32 id; > __le32 event_control; > -#define SENSOR_TP_NOTIFY_ALL BIT(0) > +#define SENSOR_NOTIFY_ALLBIT(0) > }; > > struct scmi_msg_set_sensor_trip_point { > @@ -198,6 +199,17 @@ struct scmi_sensor_trip_notify_payld { > __le32 trip_point_desc; > }; > > +struct scmi_msg_sensor_continuous_update_notify { > + __le32 id; > + __le32 event_control; > +}; This struct appears unused and redundant to struct scmi_msg_sensor_request_notify. [...] > @@ -850,20 +892,58 @@ static void *scmi_sensor_fill_custom_report(const > struct scmi_handle *handle, > const void *payld, size_t payld_sz, > void *report, u32 *src_id) > { > + void *rep = NULL; > const struct scmi_sensor_trip_notify_payld *p = payld; > struct scmi_sensor_trip_point_report *r = report; Above two variables should be moved into the first case block. Best regards, Peter > > - if (evt_id != SCMI_EVENT_SENSOR_TRIP_POINT_EVENT || > - sizeof(*p) != payld_sz) > - return NULL; > + switch (evt_id) { > + case SCMI_EVENT_SENSOR_TRIP_POINT_EVENT: > + { > + if (sizeof(*p) != payld_sz) > + break; > > - r->timestamp = timestamp; > - r->agent_id = le32_to_cpu(p->agent_id); > - r->sensor_id = le32_to_cpu(p->sensor_id); > - r->trip_point_desc = le32_to_cpu(p->trip_point_desc); > - *src_id = r->sensor_id; > + r->timestamp = timestamp; > + r->agent_id = le32_to_cpu(p->agent_id); > + r->sensor_id = le32_to_cpu(p->sensor_id); > + r->trip_point_desc = le32_to_cpu(p->trip_point_desc); > + *src_id = r->sensor_id; > + rep = r; > + break; > + } > + case SCMI_EVENT_SENSOR_UPDATE: > + { > + int i; > + struct scmi_sensor_info *s; > + const struct scmi_sensor_update_notify_payld *p = payld; > + struct scmi_sensor_update_report *r = report; > + struct sensors_info *sinfo = handle->sensor_priv; > + > + /* payld_sz is variable for this event */ > + r->sensor_id = le32_to_cpu(p->sensor_id); > + if (r->sensor_id >= sinfo->num_sensors) > + break; > + r->timestamp = timestamp; > + r->agent_id = le32_to_cpu(p->agent_id); > + s = &sinfo->sensors[r->sensor_id]; > + /* > + * The generated report r (@struct scmi_sensor_update_report) > + * was pre-allocated to contain up to SCMI_MAX_NUM_SENSOR_AXIS > + * readings: here it is filled with the effective @num_axis > + * readings defined for this sensor or 1 for scalar sensors. > + */ > + r->readings_count = s->num_axis ?: 1; > + for (i = 0; i < r->readings_count; i++) > + scmi_parse_sensor_readings(&r->readings[i], > +&p->readings[i]); > + *src_id = r->sensor_id; > + rep = r; > + break; > + } > + default: > + break; > + } > > - return r; > + return rep; > }
Re: [PATCH v2 4/6] firmware: arm_scmi: add SCMIv3.0 Sensors timestamped reads
On 26.10.20 21:10, Cristian Marussi wrote: > Add new .reading_get_timestamped() method to sensor_ops to support SCMIv3.0 > timestamped reads. > > Signed-off-by: Cristian Marussi > --- > drivers/firmware/arm_scmi/sensors.c | 134 ++-- > include/linux/scmi_protocol.h | 22 + > 2 files changed, 151 insertions(+), 5 deletions(-) > > diff --git a/drivers/firmware/arm_scmi/sensors.c > b/drivers/firmware/arm_scmi/sensors.c > index 5a18f8c84bef..bdb0ed0df683 100644 > --- a/drivers/firmware/arm_scmi/sensors.c > +++ b/drivers/firmware/arm_scmi/sensors.c > @@ -156,6 +156,27 @@ struct scmi_msg_sensor_reading_get { > #define SENSOR_READ_ASYNCBIT(0) > }; > > +struct scmi_resp_sensor_reading_get { > + __le64 readings; > +}; > + > +struct scmi_resp_sensor_reading_complete { > + __le32 id; > + __le64 readings; > +}; In my understanding the id field is not present in the spec. The implementation seems to have introduced it already before this patch. > + > +struct scmi_sensor_reading_le { > + __le32 sensor_value_low; > + __le32 sensor_value_high; > + __le32 timestamp_low; > + __le32 timestamp_high; > +}; > + > +struct scmi_resp_sensor_reading_complete_v3 { > + __le32 id; > + struct scmi_sensor_reading_le readings[]; > +}; As above, IMHO the id field is not present in the spec. > + > struct scmi_sensor_trip_notify_payld { > __le32 agent_id; > __le32 sensor_id; > @@ -576,6 +597,21 @@ scmi_sensor_trip_point_config(const struct scmi_handle > *handle, u32 sensor_id, > return ret; > } > > +/** > + * scmi_sensor_reading_get - Read scalar sensor value > + * @handle: Platform handle > + * @sensor_id: Sensor ID > + * @value: The 64bit value sensor reading > + * > + * This function returns a single 64 bit reading value representing the > sensor > + * value; if the platform SCMI Protocol implementation and the sensor support > + * multiple axis and timestamped-reads, this just returns the first axis > while > + * dropping the timestamp value. > + * Use instead the @scmi_sensor_reading_get_timestamped to retrieve the > array of > + * timestamped multi-axis values. > + * > + * Return: 0 on Success > + */ > static int scmi_sensor_reading_get(const struct scmi_handle *handle, > u32 sensor_id, u64 *value) > { > @@ -593,18 +629,105 @@ static int scmi_sensor_reading_get(const struct > scmi_handle *handle, How about changing the scmi_xfer_get_init() rx_size to 0 (in the immediately preceding, not shown lines)? An SCMI platform might not expect to just have room for the first reading, excluding the timestamp. > > sensor = t->tx.buf; > sensor->id = cpu_to_le32(sensor_id); > + if (s->async) { > + sensor->flags = cpu_to_le32(SENSOR_READ_ASYNC); > + ret = scmi_do_xfer_with_response(handle, t); > + if (!ret) { > + struct scmi_resp_sensor_reading_complete *resp; > + > + resp = t->rx.buf; > + if (le32_to_cpu(resp->id) == sensor_id) > + *value = le64_to_cpu(resp->readings); Maybe this le64_to_cpu() and the one a few lines below should be replaced by get_unaligned_le64()? > + else > + ret = -EPROTO; > + } > + } else { > + sensor->flags = cpu_to_le32(0); > + ret = scmi_do_xfer(handle, t); > + if (!ret) { > + struct scmi_resp_sensor_reading_get *resp; > + > + resp = t->rx.buf; > + *value = le64_to_cpu(resp->readings); > + } > + } > > + scmi_xfer_put(handle, t); > + return ret; > +} > + [...] > + > /** > * scmi_range_attrs - specifies a sensor or axis values' range > * @min_range: The minimum value which can be represented by the sensor/axis. > @@ -387,6 +401,11 @@ enum scmi_sensor_class { > * @info_get: get the information of the specified sensor > * @trip_point_config: selects and configures a trip-point of interest > * @reading_get: gets the current value of the sensor > + * @reading_get_timestamped: gets the current value and timestamp, when > + *available, of the sensor. (as of v2.1 spec) Nitpicking: v2.1 -> v3.0 > + *Supports multi-axis sensors for sensors which > + *supports it and if the @reading array size of > + *@count entry equals the sensor num_axis > */ > struct scmi_sensor_ops { > int (*count_get)(const struct scmi_handle *handle); > @@ -396,6 +415,9 @@ struct scmi_sensor_ops { >u32 sensor_id, u8 trip_id, u64 trip_value); > int (*reading_get)(const struct scmi_handle *handle, u32 sensor_id, > u64 *value); > + int (*reading_get_timestamped)(const struct scmi_handle *handle, > +
[RFC PATCH v2 05/10] firmware: arm_scmi: Add xfer_init_buffers transport op
From: Igor Skalkin The virtio transport in this patch series can be simplified by using the scmi_xfer tx/rx buffers for data exchange with the virtio device, and for saving the message state. But the virtio transport requires prepending a transport-specific header. Also, for data exchange using virtqueues, the tx and rx buffers should not overlap. After the previous patch, this is the second and final step to enable the virtio transport to use the scmi_xfer buffers for data exchange. Add an optional op through which the transport can allocate the tx/rx buffers along with room for the prepended transport-specific headers. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 3 +++ drivers/firmware/arm_scmi/driver.c | 21 +++-- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index c998ec29018e..ae5db602e45d 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -208,6 +208,7 @@ struct scmi_chan_info { * @get_max_msg: Optional callback to provide max_msg dynamically * @max_msg: Maximum number of messages for the channel type (tx or rx) * that can be pending simultaneously in the system + * @xfer_init_buffers: Callback to initialize buffers for scmi_xfer * @send_message: Callback to send a message * @mark_txdone: Callback to mark tx as done * @fetch_response: Callback to fetch response @@ -222,6 +223,8 @@ struct scmi_transport_ops { int (*chan_free)(int id, void *p, void *data); int (*get_max_msg)(bool tx, struct scmi_chan_info *base_cinfo, int *max_msg); + int (*xfer_init_buffers)(struct scmi_chan_info *cinfo, +struct scmi_xfer *xfer, int max_msg_size); int (*send_message)(struct scmi_chan_info *cinfo, struct scmi_xfer *xfer); void (*mark_txdone)(struct scmi_chan_info *cinfo, int ret); diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index e2faa526dfce..d7ad5a77b9dc 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -640,12 +640,21 @@ static int __scmi_xfer_info_init(struct scmi_info *sinfo, /* Pre-initialize the buffer pointer to pre-allocated buffers */ for (i = 0, xfer = info->xfer_block; i < info->max_msg; i++, xfer++) { - xfer->rx.buf = devm_kcalloc(dev, sizeof(u8), desc->max_msg_size, - GFP_KERNEL); - if (!xfer->rx.buf) - return -ENOMEM; - - xfer->tx.buf = xfer->rx.buf; + if (desc->ops->xfer_init_buffers) { + int ret = desc->ops->xfer_init_buffers( + base_cinfo, xfer, desc->max_msg_size); + + if (ret) + return ret; + } else { + xfer->rx.buf = devm_kcalloc(dev, sizeof(u8), + desc->max_msg_size, + GFP_KERNEL); + if (!xfer->rx.buf) + return -ENOMEM; + + xfer->tx.buf = xfer->rx.buf; + } init_completion(&xfer->done); } -- 2.25.1
[RFC PATCH v2 01/10] firmware: arm_scmi, smccc, mailbox: Make shmem based transports optional
From: Igor Skalkin Upon adding the virtio transport in this patch series, SCMI will also work without shared memory based transports. Also, the mailbox transport may not be needed if the smc transport is used. - Compile shmem.c only if a shmem based transport is available. - Remove hard dependency of SCMI on mailbox. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/Kconfig | 9 - drivers/firmware/arm_scmi/Makefile | 2 +- drivers/firmware/arm_scmi/common.h | 2 ++ drivers/firmware/arm_scmi/driver.c | 2 ++ drivers/firmware/smccc/Kconfig | 1 + drivers/mailbox/Kconfig| 1 + 6 files changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index afdbebba628a..bdde51adb267 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -9,7 +9,7 @@ menu "Firmware Drivers" config ARM_SCMI_PROTOCOL tristate "ARM System Control and Management Interface (SCMI) Message Protocol" depends on ARM || ARM64 || COMPILE_TEST - depends on MAILBOX + depends on ARM_SCMI_HAVE_SHMEM help ARM System Control and Management Interface (SCMI) protocol is a set of operating system-independent software interfaces that are @@ -27,6 +27,13 @@ config ARM_SCMI_PROTOCOL This protocol library provides interface for all the client drivers making use of the features offered by the SCMI. +config ARM_SCMI_HAVE_SHMEM + bool + default n + help + This declares whether a shared memory based transport for SCMI is + available. + config ARM_SCMI_POWER_DOMAIN tristate "SCMI power domain driver" depends on ARM_SCMI_PROTOCOL || (COMPILE_TEST && OF) diff --git a/drivers/firmware/arm_scmi/Makefile b/drivers/firmware/arm_scmi/Makefile index bc0d54f8e861..3cc7fa40a464 100644 --- a/drivers/firmware/arm_scmi/Makefile +++ b/drivers/firmware/arm_scmi/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only scmi-bus-y = bus.o scmi-driver-y = driver.o notify.o -scmi-transport-y = shmem.o +scmi-transport-$(CONFIG_ARM_SCMI_HAVE_SHMEM) = shmem.o scmi-transport-$(CONFIG_MAILBOX) += mailbox.o scmi-transport-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smc.o scmi-protocols-y = base.o clock.o perf.o power.o reset.o sensors.o system.o diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 65063fa948d4..aed192238177 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -242,7 +242,9 @@ struct scmi_desc { int max_msg_size; }; +#ifdef CONFIG_MAILBOX extern const struct scmi_desc scmi_mailbox_desc; +#endif #ifdef CONFIG_HAVE_ARM_SMCCC extern const struct scmi_desc scmi_smc_desc; #endif diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index 3dfd8b6a0ebf..9e2f36127793 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -918,7 +918,9 @@ ATTRIBUTE_GROUPS(versions); /* Each compatible listed below must have descriptor associated with it */ static const struct of_device_id scmi_of_match[] = { +#ifdef CONFIG_MAILBOX { .compatible = "arm,scmi", .data = &scmi_mailbox_desc }, +#endif #ifdef CONFIG_HAVE_ARM_SMCCC_DISCOVERY { .compatible = "arm,scmi-smc", .data = &scmi_smc_desc}, #endif diff --git a/drivers/firmware/smccc/Kconfig b/drivers/firmware/smccc/Kconfig index 15e7466179a6..69c4d6cabf62 100644 --- a/drivers/firmware/smccc/Kconfig +++ b/drivers/firmware/smccc/Kconfig @@ -9,6 +9,7 @@ config HAVE_ARM_SMCCC_DISCOVERY bool depends on ARM_PSCI_FW default y + select ARM_SCMI_HAVE_SHMEM help SMCCC v1.0 lacked discoverability and hence PSCI v1.0 was updated to add SMCCC discovery mechanism though the PSCI firmware diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig index 05b1009e2820..5ffe1ab0c869 100644 --- a/drivers/mailbox/Kconfig +++ b/drivers/mailbox/Kconfig @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only menuconfig MAILBOX bool "Mailbox Hardware Support" + select ARM_SCMI_HAVE_SHMEM help Mailbox is a framework to control hardware communication between on-chip processors through queued messages and interrupt driven -- 2.25.1
[RFC PATCH v2 10/10] firmware: arm_scmi: Add virtio transport
From: Igor Skalkin This transport enables accessing an SCMI platform as a virtio device. Implement an SCMI virtio driver according to the virtio SCMI device spec patch v5 [1]. Virtio device id 32 has been reserved for the SCMI device [2]. The virtio transport has one tx channel (virtio cmdq, A2P channel) and at most one rx channel (virtio eventq, P2A channel). The following feature bit defined in [1] is not implemented: VIRTIO_SCMI_F_SHARED_MEMORY. After the preparatory patches, implement the virtio transport as paraphrased: Only support a single arm-scmi device (which is consistent with the SCMI spec). Call scmi-virtio init from arm-scmi module init. During the arm-scmi probing, link to the first probed scmi-virtio device. Defer arm-scmi probing if no scmi-virtio device is bound yet. Use the scmi_xfer tx/rx buffers for data exchange with the virtio device in order to avoid redundant maintenance of additional buffers. Allocate the buffers in the SCMI transport, and prepend room for a small header used by the virtio transport to the tx/rx buffers. For simplicity, restrict the number of messages which can be pending simultaneously according to the virtqueue capacity. (The virtqueue sizes are negotiated with the virtio device.) As soon as rx channel message buffers are allocated or have been read out by the arm-scmi driver, feed them to the virtio device. Since some virtio devices may not have the short response time exhibited by SCMI platforms using other transports, set a generous response timeout. Limitations: Do not adjust the other SCMI timeouts for delayed response and polling for now, since these timeouts are only relevant in special cases which are not yet deemed relevant for this transport. To do (as discussed in the cover letter): - Avoid re-use of buffers still being used by the virtio device on timeouts. - Avoid race conditions on receiving messages during/after channel free on driver probe failure or remove. [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html [2] https://www.oasis-open.org/committees/ballot.php?id=3496 Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- MAINTAINERS| 1 + drivers/firmware/Kconfig | 12 +- drivers/firmware/arm_scmi/Makefile | 1 + drivers/firmware/arm_scmi/common.h | 14 + drivers/firmware/arm_scmi/driver.c | 11 + drivers/firmware/arm_scmi/virtio.c | 493 + include/uapi/linux/virtio_ids.h| 1 + include/uapi/linux/virtio_scmi.h | 41 +++ 8 files changed, 573 insertions(+), 1 deletion(-) create mode 100644 drivers/firmware/arm_scmi/virtio.c create mode 100644 include/uapi/linux/virtio_scmi.h diff --git a/MAINTAINERS b/MAINTAINERS index deaafb617361..8df73d6ddfc1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16772,6 +16772,7 @@ F: drivers/firmware/arm_scpi.c F: drivers/reset/reset-scmi.c F: include/linux/sc[mp]i_protocol.h F: include/trace/events/scmi.h +F: include/uapi/linux/virtio_scmi.h SYSTEM RESET/SHUTDOWN DRIVERS M: Sebastian Reichel diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index bdde51adb267..c4bdd84f7405 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -9,7 +9,7 @@ menu "Firmware Drivers" config ARM_SCMI_PROTOCOL tristate "ARM System Control and Management Interface (SCMI) Message Protocol" depends on ARM || ARM64 || COMPILE_TEST - depends on ARM_SCMI_HAVE_SHMEM + depends on ARM_SCMI_HAVE_SHMEM || VIRTIO_SCMI help ARM System Control and Management Interface (SCMI) protocol is a set of operating system-independent software interfaces that are @@ -34,6 +34,16 @@ config ARM_SCMI_HAVE_SHMEM This declares whether a shared memory based transport for SCMI is available. +config VIRTIO_SCMI + bool "Virtio transport for SCMI" + default n + depends on VIRTIO + help + This enables the virtio based transport for SCMI. + + If you want to use the ARM SCMI protocol between the virtio guest and + a host providing a virtio SCMI device, answer Y. + config ARM_SCMI_POWER_DOMAIN tristate "SCMI power domain driver" depends on ARM_SCMI_PROTOCOL || (COMPILE_TEST && OF) diff --git a/drivers/firmware/arm_scmi/Makefile b/drivers/firmware/arm_scmi/Makefile index 3cc7fa40a464..25caea5e1969 100644 --- a/drivers/firmware/arm_scmi/Makefile +++ b/drivers/firmware/arm_scmi/Makefile @@ -4,6 +4,7 @@ scmi-driver-y = driver.o notify.o scmi-transport-$(CONFIG_ARM_SCMI_HAVE_SHMEM) = shmem.o scmi-transport-$(CONFIG_MAILBOX) += mailbox.o scmi-transport-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smc.o +scmi-transport-$(CONFIG_VIRTIO_SCMI) += virtio.o scmi-protocols-y = base.o clock.o perf.o power.o reset.o sensors.o system.o scmi-module-objs := $
[RFC PATCH v2 04/10] firmware: arm_scmi: Add per message transport data
From: Igor Skalkin The virtio transport in this patch series can be simplified by using the scmi_xfer tx/rx buffers for data exchange with the virtio device, and for saving the message state. But the virtio transport requires prepending a transport-specific header. Also, for data exchange using virtqueues, the tx and rx buffers should not overlap. The first step to solve the aforementioned issues is to add a transport-specific data pointer to scmi_xfer. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 9a8359ecd220..c998ec29018e 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -131,6 +131,7 @@ struct scmi_msg { * buffer for the rx path as we use for the tx path. * @done: command message transmit completion event * @async_done: pointer to delayed response message received event completion + * @extra_data: Transport-specific private data pointer */ struct scmi_xfer { int transfer_id; @@ -139,6 +140,7 @@ struct scmi_xfer { struct scmi_msg rx; struct completion done; struct completion *async_done; + void *extra_data; }; void scmi_xfer_put(const struct scmi_handle *h, struct scmi_xfer *xfer); -- 2.25.1
[RFC PATCH v2 08/10] firmware: arm_scmi: Add is_scmi_protocol_device()
The scmi-virtio transport driver will need to distinguish SCMI protocol devices from the SCMI instance device in the chan_setup() and chan_free() ops. Add this internal helper to be able to distinguish the two. Signed-off-by: Peter Hilber --- drivers/firmware/arm_scmi/bus.c| 5 + drivers/firmware/arm_scmi/common.h | 2 ++ 2 files changed, 7 insertions(+) diff --git a/drivers/firmware/arm_scmi/bus.c b/drivers/firmware/arm_scmi/bus.c index 1377ec76a45d..4f19faafb2c5 100644 --- a/drivers/firmware/arm_scmi/bus.c +++ b/drivers/firmware/arm_scmi/bus.c @@ -108,6 +108,11 @@ static struct bus_type scmi_bus_type = { .remove = scmi_dev_remove, }; +bool is_scmi_protocol_device(struct device *dev) +{ + return dev->bus == &scmi_bus_type; +} + int scmi_driver_register(struct scmi_driver *driver, struct module *owner, const char *mod_name) { diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index ec9fd7fce3c7..13c9ac176b23 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -158,6 +158,8 @@ int scmi_version_get(const struct scmi_handle *h, u8 protocol, u32 *version); void scmi_setup_protocol_implemented(const struct scmi_handle *handle, u8 *prot_imp); +bool is_scmi_protocol_device(struct device *dev); + int scmi_base_protocol_init(struct scmi_handle *h); int __init scmi_bus_init(void); -- 2.25.1
[RFC PATCH v2 07/10] firmware: arm_scmi: Add per-device transport private info
The scmi-virtio transport will link a supplier device to the arm-scmi device in the link_supplier() op. The transport should then save a pointer to the linked device. To enable this, add a transport private info to the scmi_info. (The scmi_info is already reachable through the arm-scmi device driver_data.) Signed-off-by: Peter Hilber --- drivers/firmware/arm_scmi/common.h | 2 ++ drivers/firmware/arm_scmi/driver.c | 35 ++ 2 files changed, 37 insertions(+) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 2f55ac71555a..ec9fd7fce3c7 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -262,6 +262,8 @@ extern const struct scmi_desc scmi_mailbox_desc; extern const struct scmi_desc scmi_smc_desc; #endif +int scmi_set_transport_info(struct device *dev, void *transport_info); +void *scmi_get_transport_info(struct device *dev); void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr); void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, int id); diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index dedc9b3f3e97..244141e45e88 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -84,6 +84,7 @@ struct scmi_xfers_info { * @rx_idr: IDR object to map protocol id to Rx channel info pointer * @protocols_imp: List of protocols implemented, currently maximum of * MAX_PROTOCOLS_IMP elements allocated by the base protocol + * @transport_info: Transport private info * @node: List head * @users: Number of users of this instance */ @@ -97,6 +98,7 @@ struct scmi_info { struct idr tx_idr; struct idr rx_idr; u8 *protocols_imp; + void *transport_info; struct list_head node; int users; }; @@ -315,6 +317,39 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr) } } +/** + * scmi_set_transport_info() - Set transport private info + * + * @dev: SCMI instance device + * @transport_info: transport private info + * + * Return: 0 on success, otherwise error. + */ +int scmi_set_transport_info(struct device *dev, void *transport_info) +{ + struct scmi_info *info = dev_get_drvdata(dev); + + if (!info) + return -EBADR; + + info->transport_info = transport_info; + return 0; +} + +/** + * scmi_get_transport_info() - Get transport private info + * + * @dev: SCMI instance device + * + * Return: transport private info on success, otherwise NULL. + */ +void *scmi_get_transport_info(struct device *dev) +{ + struct scmi_info *info = dev_get_drvdata(dev); + + return info ? info->transport_info : NULL; +} + /** * scmi_xfer_put() - Release a transmit message * -- 2.25.1
[RFC PATCH v2 06/10] firmware: arm_scmi: Add optional link_supplier() transport op
For the scmi-virtio transport, it might not be possible to refer to the proper virtio device at device tree build time. Therefore, add an op which will allow scmi-virtio to dynamically link to the proper virtio device during probe. Signed-off-by: Peter Hilber --- drivers/firmware/arm_scmi/common.h | 2 ++ drivers/firmware/arm_scmi/driver.c | 6 ++ 2 files changed, 8 insertions(+) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index ae5db602e45d..2f55ac71555a 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -202,6 +202,7 @@ struct scmi_chan_info { /** * struct scmi_transport_ops - Structure representing a SCMI transport ops * + * @link_supplier: Optional callback to add link to a supplier device * @chan_available: Callback to check if channel is available or not * @chan_setup: Callback to allocate and setup a channel * @chan_free: Callback to free a channel @@ -217,6 +218,7 @@ struct scmi_chan_info { * @poll_done: Callback to poll transfer status */ struct scmi_transport_ops { + int (*link_supplier)(struct device *dev); bool (*chan_available)(struct device *dev, int idx); int (*chan_setup)(struct scmi_chan_info *cinfo, struct device *dev, bool tx); diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index d7ad5a77b9dc..dedc9b3f3e97 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -826,6 +826,12 @@ static int scmi_probe(struct platform_device *pdev) handle->dev = info->dev; handle->version = &info->version; + if (desc->ops->link_supplier) { + ret = desc->ops->link_supplier(dev); + if (ret) + return ret; + } + ret = scmi_txrx_setup(info, dev, SCMI_PROTOCOL_BASE); if (ret) return ret; -- 2.25.1
[RFC PATCH v2 02/10] firmware: arm_scmi: Document that max_msg is a per channel type limit
From: Igor Skalkin struct scmi_desc.max_msg specifies a limit for the pending messages. This limit is a per SCMI channel type (tx, rx) limit. State that explicitly in the inline documentation. The following patch will add an op to override the limit per channel type. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index aed192238177..38e6aabbe3dd 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -231,8 +231,8 @@ struct scmi_transport_ops { * * @ops: Pointer to the transport specific ops structure * @max_rx_timeout_ms: Timeout for communication with SoC (in Milliseconds) - * @max_msg: Maximum number of messages that can be pending - * simultaneously in the system + * @max_msg: Maximum number of messages for a channel type (tx or rx) that can + * be pending simultaneously in the system * @max_msg_size: Maximum size of data per message that can be handled. */ struct scmi_desc { -- 2.25.1
[RFC PATCH v2 03/10] firmware: arm_scmi: Add op to override max message #
From: Igor Skalkin The number of messages that the upcoming scmi-virtio transport can support depends on the virtio device (SCMI platform) and can differ for each channel. (The scmi-virtio transport does only have one tx and at most 1 rx channel.) Add an optional transport op so that scmi-virtio can report the actual max message # for each channel type. Respect these new limits. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 8 - drivers/firmware/arm_scmi/driver.c | 49 ++ 2 files changed, 43 insertions(+), 14 deletions(-) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 38e6aabbe3dd..9a8359ecd220 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -203,6 +203,9 @@ struct scmi_chan_info { * @chan_available: Callback to check if channel is available or not * @chan_setup: Callback to allocate and setup a channel * @chan_free: Callback to free a channel + * @get_max_msg: Optional callback to provide max_msg dynamically + * @max_msg: Maximum number of messages for the channel type (tx or rx) + * that can be pending simultaneously in the system * @send_message: Callback to send a message * @mark_txdone: Callback to mark tx as done * @fetch_response: Callback to fetch response @@ -215,6 +218,8 @@ struct scmi_transport_ops { int (*chan_setup)(struct scmi_chan_info *cinfo, struct device *dev, bool tx); int (*chan_free)(int id, void *p, void *data); + int (*get_max_msg)(bool tx, struct scmi_chan_info *base_cinfo, + int *max_msg); int (*send_message)(struct scmi_chan_info *cinfo, struct scmi_xfer *xfer); void (*mark_txdone)(struct scmi_chan_info *cinfo, int ret); @@ -232,7 +237,8 @@ struct scmi_transport_ops { * @ops: Pointer to the transport specific ops structure * @max_rx_timeout_ms: Timeout for communication with SoC (in Milliseconds) * @max_msg: Maximum number of messages for a channel type (tx or rx) that can - * be pending simultaneously in the system + * be pending simultaneously in the system. May be overridden by the + * get_max_msg op. * @max_msg_size: Maximum size of data per message that can be handled. */ struct scmi_desc { diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index 9e2f36127793..e2faa526dfce 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -61,11 +61,13 @@ static atomic_t transfer_last_id; * Index of this bitmap table is also used for message * sequence identifier. * @xfer_lock: Protection for message allocation + * @max_msg: Maximum number of messages that can be pending */ struct scmi_xfers_info { struct scmi_xfer *xfer_block; unsigned long *xfer_alloc_table; spinlock_t xfer_lock; + int max_msg; }; /** @@ -157,13 +159,11 @@ static struct scmi_xfer *scmi_xfer_get(const struct scmi_handle *handle, u16 xfer_id; struct scmi_xfer *xfer; unsigned long flags, bit_pos; - struct scmi_info *info = handle_to_scmi_info(handle); /* Keep the locked section as small as possible */ spin_lock_irqsave(&minfo->xfer_lock, flags); - bit_pos = find_first_zero_bit(minfo->xfer_alloc_table, - info->desc->max_msg); - if (bit_pos == info->desc->max_msg) { + bit_pos = find_first_zero_bit(minfo->xfer_alloc_table, minfo->max_msg); + if (bit_pos == minfo->max_msg) { spin_unlock_irqrestore(&minfo->xfer_lock, flags); return ERR_PTR(-ENOMEM); } @@ -602,32 +602,44 @@ int scmi_handle_put(const struct scmi_handle *handle) } static int __scmi_xfer_info_init(struct scmi_info *sinfo, -struct scmi_xfers_info *info) +struct scmi_xfers_info *info, +bool tx, +struct scmi_chan_info *base_cinfo) { int i; struct scmi_xfer *xfer; struct device *dev = sinfo->dev; const struct scmi_desc *desc = sinfo->desc; + info->max_msg = desc->max_msg; + + if (desc->ops->get_max_msg) { + int ret = + desc->ops->get_max_msg(tx, base_cinfo, &info->max_msg); + + if (ret) + return ret; + } + /* Pre-allocated messages, no more than what hdr.seq can support */ - if (WARN_ON(desc->max_msg >= MSG_TOKEN_MAX)) { + if (WARN_ON(info->max_msg >= MSG_TOKEN_MAX)) { dev_err(dev, "Maximum message of %d exceeds supported %ld\n", -
[RFC PATCH v2 09/10] dt-bindings: arm: Add virtio transport for SCMI
From: Igor Skalkin Document the properties for arm,scmi-virtio compatible nodes. The backing virtio SCMI device is described in patch [1]. [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- .../devicetree/bindings/arm/arm,scmi.txt | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/arm/arm,scmi.txt b/Documentation/devicetree/bindings/arm/arm,scmi.txt index 55deb68230eb..6ded49d82773 100644 --- a/Documentation/devicetree/bindings/arm/arm,scmi.txt +++ b/Documentation/devicetree/bindings/arm/arm,scmi.txt @@ -13,6 +13,9 @@ the device tree. Required properties: The scmi node with the following properties shall be under the /firmware/ node. +Some properties are specific to a transport type. + +shmem-based transports (mailbox, smc/hvc): - compatible : shall be "arm,scmi" or "arm,scmi-smc" for smc/hvc transports - mboxes: List of phandle and mailbox channel specifiers. It should contain @@ -21,6 +24,15 @@ The scmi node with the following properties shall be under the /firmware/ node. supported. - shmem : List of phandle pointing to the shared memory(SHM) area as per generic mailbox client binding. + +Virtio transport: + +- compatible : shall be "arm,scmi-virtio". + +The virtio transport only supports a single device. + +Additional required properties: + - #address-cells : should be '1' if the device has sub-nodes, maps to protocol identifier for a given sub-node. - #size-cells : should be '0' as 'reg' property doesn't have any size @@ -42,7 +54,8 @@ Each protocol supported shall have a sub-node with corresponding compatible as described in the following sections. If the platform supports dedicated communication channel for a particular protocol, the 3 properties namely: mboxes, mbox-names and shmem shall be present in the sub-node corresponding -to that protocol. +to that protocol. The virtio transport does not support dedicated communication +channels. Clock/Performance bindings for the clocks/OPPs based on SCMI Message Protocol @@ -106,7 +119,8 @@ Required sub-node properties: [4] Documentation/devicetree/bindings/sram/sram.yaml [5] Documentation/devicetree/bindings/reset/reset.txt -Example: +Example (mailbox transport): + sram@5000 { compatible = "mmio-sram"; @@ -195,3 +209,20 @@ thermal-zones { ... }; }; + +Example (virtio transport): +--- + +virtio_mmio@4b001000 { + compatible = "virtio,mmio"; + ... +}; + +firmware { + ... + scmi { + compatible = "arm,scmi-virtio"; + ... + +The rest is similar to the mailbox transport example, when omitting the +mailbox/shmem-specific properties. -- 2.25.1
[RFC PATCH v2 00/10] firmware: arm_scmi: Add virtio transport
This series implements an SCMI virtio driver according to the virtio SCMI device spec patch v5 [1], after simple preparatory changes to the existing arm-scmi driver. The virtio transport differs in some respects from the existing shared-memory based SCMI transports. Message timeouts can be a problem if the virtio device (SCMI platform) does not have real-time properties. I set a high message rx timeout value which should work for non real-time virtio devices as well. There are other timeouts for delayed response and polling which were not addressed yet. Delayed responses are not really needed since, with the virtio transport, message responses may be transmitted out of order. Polling doesn't make sense at least for virtio devices without real-time behavior, in my understanding. There are some known issues which will be resolved before removing the RFC tag: - Further work is needed on the scmi_xfer management. Unlike shmem based transports, the virtio transport is actually exchanging messages with the SCMI agent through the scmi_xfer tx and rx buffers. In case of a message rx timeout, the arm-scmi driver could try to re-use the scmi_xfer, while that might still be used by the virtio device. I think part of the scmi_xfers_info bookkeeping could be optionally outsourced to the transport to remediate this. - After arm-scmi driver probe failure, or after remove, the scmi-virtio driver may still try to process and forward message responses from the virtio device. - We must be sure that the virtio transport option (such as virtio over MMIO) is available when the virtio SCMI device is probed. The series is based on for-next/scmi [2], on top of commit b9ceca6be432 ("firmware: arm_scmi: Fix duplicate workqueue name") The series was actually tested with a v5.4 based kernel, with the Base protocol and Sensor management protocol. The virtio SCMI device used was a proprietary implementation by OpenSynergy. Delayed responses were not tested. Changes in RFC v2: - Remove the DT virtio_transport phandle, since the SCMI virtio device may not be known in advance. Instead, use the first suitable probed device. Change due to Rob Herring's comment. Any comments are very welcome. [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git Igor Skalkin (7): firmware: arm_scmi, smccc, mailbox: Make shmem based transports optional firmware: arm_scmi: Document that max_msg is a per channel type limit firmware: arm_scmi: Add op to override max message # firmware: arm_scmi: Add per message transport data firmware: arm_scmi: Add xfer_init_buffers transport op dt-bindings: arm: Add virtio transport for SCMI firmware: arm_scmi: Add virtio transport Peter Hilber (3): firmware: arm_scmi: Add optional link_supplier() transport op firmware: arm_scmi: Add per-device transport private info firmware: arm_scmi: Add is_scmi_protocol_device() .../devicetree/bindings/arm/arm,scmi.txt | 35 +- MAINTAINERS | 1 + drivers/firmware/Kconfig | 19 +- drivers/firmware/arm_scmi/Makefile| 3 +- drivers/firmware/arm_scmi/bus.c | 5 + drivers/firmware/arm_scmi/common.h| 37 +- drivers/firmware/arm_scmi/driver.c| 124 - drivers/firmware/arm_scmi/virtio.c| 493 ++ drivers/firmware/smccc/Kconfig| 1 + drivers/mailbox/Kconfig | 1 + include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_scmi.h | 41 ++ 12 files changed, 736 insertions(+), 25 deletions(-) create mode 100644 drivers/firmware/arm_scmi/virtio.c create mode 100644 include/uapi/linux/virtio_scmi.h base-commit: b9ceca6be43233845be70792be9b5ab315d2e010 -- 2.25.1
Re: [RFC PATCH 6/7] dt-bindings: arm: Add virtio transport for SCMI
On 23.09.20 22:54, Rob Herring wrote: > On Fri, Sep 18, 2020 at 06:55:58PM +0200, Peter Hilber wrote: >> From: Igor Skalkin >> >> Document the properties for arm,scmi-virtio compatible nodes. The >> backing virtio SCMI device is described in patch [1]. >> >> [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html >> >> Co-developed-by: Peter Hilber >> Signed-off-by: Peter Hilber >> Signed-off-by: Igor Skalkin >> --- >> .../devicetree/bindings/arm/arm,scmi.txt | 38 ++- >> 1 file changed, 36 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/devicetree/bindings/arm/arm,scmi.txt >> b/Documentation/devicetree/bindings/arm/arm,scmi.txt >> index 55deb68230eb..844ff3c40a49 100644 >> --- a/Documentation/devicetree/bindings/arm/arm,scmi.txt >> +++ b/Documentation/devicetree/bindings/arm/arm,scmi.txt >> @@ -13,6 +13,9 @@ the device tree. >> Required properties: >> >> The scmi node with the following properties shall be under the /firmware/ >> node. >> +Some properties are specific to a transport type. >> + >> +shmem-based transports (mailbox, smc/hvc): >> >> - compatible : shall be "arm,scmi" or "arm,scmi-smc" for smc/hvc transports >> - mboxes: List of phandle and mailbox channel specifiers. It should contain >> @@ -21,6 +24,17 @@ The scmi node with the following properties shall be >> under the /firmware/ node. >>supported. >> - shmem : List of phandle pointing to the shared memory(SHM) area as per >>generic mailbox client binding. >> + >> +Virtio transport: >> + >> +- compatible : shall be "arm,scmi-virtio". >> +- virtio_transport : phandle of the virtio device. This device must support >> one >> + virtqueue for transmitting commands ("tx", cmdq), and, >> + optionally, one more virtqueue for receiving notifications >> + and delayed responses ("rx", eventq). > > Isn't what the virtio device provides discoverable? We don't have virtio > protocols in DT for anything else. Why is SCMI special? > > Rob > Does your comment refer to the presence of the `virtio_transport' phandle, or to the entire "arm,scmi-virtio" node? The protocol child nodes of "arm,scmi-virtio" can be clock providers, power domain providers, etc. The author and me are currently looking into how to replace the `virtio_transport' phandle. Best regards, Peter
[RFC PATCH 7/7] firmware: arm_scmi: Add virtio transport
From: Igor Skalkin This transport enables accessing an SCMI platform as a virtio device. Implement an SCMI virtio driver according to the virtio SCMI device spec patch v5 [1]. Virtio device id 32 has been reserved for the SCMI device [2]. The virtio transport has one tx channel (virtio cmdq, A2P channel) and at most one rx channel (virtio eventq, P2A channel). The following feature bit defined in [1] is not implemented: VIRTIO_SCMI_F_SHARED_MEMORY. After the preparatory patches, implement the virtio transport as paraphrased: Call scmi-virtio init from arm-scmi to probe for the virtio device before arm-scmi will call any transport ops. Use the scmi_xfer tx/rx buffers for data exchange with the virtio device in order to avoid redundant maintenance of additional buffers. Allocate the buffers in the SCMI transport, and prepend room for a small header used by the virtio transport to the tx/rx buffers. For simplicity, restrict the number of messages which can be pending simultaneously according to the virtqueue capacity. (The virtqueue sizes are negotiated with the virtio device.) As soon as rx channel message buffers are allocated or have been read out by the arm-scmi driver, feed them to the virtio device. Since some virtio devices may not have the short response time exhibited by SCMI platforms using other transports, set a generous response timeout. Limitations: Do not adjust the other SCMI timeouts for delayed response and polling for now, since these timeouts are only relevant in special cases which are not yet deemed relevant for this transport. To do (as discussed in the cover letter): - Avoid re-use of buffers still being used by the virtio device on timeouts. - Avoid race conditions on receiving messages during/after channel free on driver probe failure or remove. [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html [2] https://www.oasis-open.org/committees/ballot.php?id=3496 Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- MAINTAINERS| 1 + drivers/firmware/Kconfig | 12 +- drivers/firmware/arm_scmi/Makefile | 1 + drivers/firmware/arm_scmi/common.h | 14 + drivers/firmware/arm_scmi/driver.c | 11 + drivers/firmware/arm_scmi/virtio.c | 470 + include/uapi/linux/virtio_ids.h| 1 + include/uapi/linux/virtio_scmi.h | 41 +++ 8 files changed, 550 insertions(+), 1 deletion(-) create mode 100644 drivers/firmware/arm_scmi/virtio.c create mode 100644 include/uapi/linux/virtio_scmi.h diff --git a/MAINTAINERS b/MAINTAINERS index deaafb617361..8df73d6ddfc1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16772,6 +16772,7 @@ F: drivers/firmware/arm_scpi.c F: drivers/reset/reset-scmi.c F: include/linux/sc[mp]i_protocol.h F: include/trace/events/scmi.h +F: include/uapi/linux/virtio_scmi.h SYSTEM RESET/SHUTDOWN DRIVERS M: Sebastian Reichel diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index bdde51adb267..c4bdd84f7405 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -9,7 +9,7 @@ menu "Firmware Drivers" config ARM_SCMI_PROTOCOL tristate "ARM System Control and Management Interface (SCMI) Message Protocol" depends on ARM || ARM64 || COMPILE_TEST - depends on ARM_SCMI_HAVE_SHMEM + depends on ARM_SCMI_HAVE_SHMEM || VIRTIO_SCMI help ARM System Control and Management Interface (SCMI) protocol is a set of operating system-independent software interfaces that are @@ -34,6 +34,16 @@ config ARM_SCMI_HAVE_SHMEM This declares whether a shared memory based transport for SCMI is available. +config VIRTIO_SCMI + bool "Virtio transport for SCMI" + default n + depends on VIRTIO + help + This enables the virtio based transport for SCMI. + + If you want to use the ARM SCMI protocol between the virtio guest and + a host providing a virtio SCMI device, answer Y. + config ARM_SCMI_POWER_DOMAIN tristate "SCMI power domain driver" depends on ARM_SCMI_PROTOCOL || (COMPILE_TEST && OF) diff --git a/drivers/firmware/arm_scmi/Makefile b/drivers/firmware/arm_scmi/Makefile index 3cc7fa40a464..25caea5e1969 100644 --- a/drivers/firmware/arm_scmi/Makefile +++ b/drivers/firmware/arm_scmi/Makefile @@ -4,6 +4,7 @@ scmi-driver-y = driver.o notify.o scmi-transport-$(CONFIG_ARM_SCMI_HAVE_SHMEM) = shmem.o scmi-transport-$(CONFIG_MAILBOX) += mailbox.o scmi-transport-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smc.o +scmi-transport-$(CONFIG_VIRTIO_SCMI) += virtio.o scmi-protocols-y = base.o clock.o perf.o power.o reset.o sensors.o system.o scmi-module-objs := $(scmi-bus-y) $(scmi-driver-y) $(scmi-protocols-y) \ $(scmi-transport-y) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/
[RFC PATCH 6/7] dt-bindings: arm: Add virtio transport for SCMI
From: Igor Skalkin Document the properties for arm,scmi-virtio compatible nodes. The backing virtio SCMI device is described in patch [1]. [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- .../devicetree/bindings/arm/arm,scmi.txt | 38 ++- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/arm/arm,scmi.txt b/Documentation/devicetree/bindings/arm/arm,scmi.txt index 55deb68230eb..844ff3c40a49 100644 --- a/Documentation/devicetree/bindings/arm/arm,scmi.txt +++ b/Documentation/devicetree/bindings/arm/arm,scmi.txt @@ -13,6 +13,9 @@ the device tree. Required properties: The scmi node with the following properties shall be under the /firmware/ node. +Some properties are specific to a transport type. + +shmem-based transports (mailbox, smc/hvc): - compatible : shall be "arm,scmi" or "arm,scmi-smc" for smc/hvc transports - mboxes: List of phandle and mailbox channel specifiers. It should contain @@ -21,6 +24,17 @@ The scmi node with the following properties shall be under the /firmware/ node. supported. - shmem : List of phandle pointing to the shared memory(SHM) area as per generic mailbox client binding. + +Virtio transport: + +- compatible : shall be "arm,scmi-virtio". +- virtio_transport : phandle of the virtio device. This device must support one + virtqueue for transmitting commands ("tx", cmdq), and, +optionally, one more virtqueue for receiving notifications +and delayed responses ("rx", eventq). + +Additional required properties: + - #address-cells : should be '1' if the device has sub-nodes, maps to protocol identifier for a given sub-node. - #size-cells : should be '0' as 'reg' property doesn't have any size @@ -42,7 +56,8 @@ Each protocol supported shall have a sub-node with corresponding compatible as described in the following sections. If the platform supports dedicated communication channel for a particular protocol, the 3 properties namely: mboxes, mbox-names and shmem shall be present in the sub-node corresponding -to that protocol. +to that protocol. The virtio transport does not support dedicated communication +channels. Clock/Performance bindings for the clocks/OPPs based on SCMI Message Protocol @@ -106,7 +121,8 @@ Required sub-node properties: [4] Documentation/devicetree/bindings/sram/sram.yaml [5] Documentation/devicetree/bindings/reset/reset.txt -Example: +Example (mailbox transport): + sram@5000 { compatible = "mmio-sram"; @@ -195,3 +211,21 @@ thermal-zones { ... }; }; + +Example (virtio transport): +--- + +virtio_mmio_scmi: virtio_mmio@4b001000 { + compatible = "virtio,mmio"; + ... +}; + +firmware { + ... + scmi { + compatible = "arm,scmi-virtio"; + virtio_transport = <&virtio_mmio_scmi>; + ... + +The rest is similar to the mailbox transport example, when omitting the +mailbox/shmem-specific properties. -- 2.25.1
[RFC PATCH 5/7] firmware: arm_scmi: Add xfer_init_buffers transport op
From: Igor Skalkin The virtio transport in this patch series can be simplified by using the scmi_xfer tx/rx buffers for data exchange with the virtio device, and for saving the message state. But the virtio transport requires prepending a transport-specific header. Also, for data exchange using virtqueues, the tx and rx buffers should not overlap. After the previous patch, this is the second and final step to enable the virtio transport to use the scmi_xfer buffers for data exchange. Add an optional op through which the transport can allocate the tx/rx buffers along with room for the prepended transport-specific headers. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 3 +++ drivers/firmware/arm_scmi/driver.c | 21 +++-- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index e3ccec954e93..c6071ffe4346 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -206,6 +206,7 @@ struct scmi_chan_info { * @get_max_msg: Optional callback to provide max_msg dynamically * @max_msg: Maximum number of messages for the channel type (tx or rx) * that can be pending simultaneously in the system + * @xfer_init_buffers: Callback to initialize buffers for scmi_xfer * @send_message: Callback to send a message * @mark_txdone: Callback to mark tx as done * @fetch_response: Callback to fetch response @@ -220,6 +221,8 @@ struct scmi_transport_ops { int (*chan_free)(int id, void *p, void *data); int (*get_max_msg)(bool tx, struct scmi_chan_info *base_cinfo, int *max_msg); + int (*xfer_init_buffers)(struct scmi_chan_info *cinfo, +struct scmi_xfer *xfer, int max_msg_size); int (*send_message)(struct scmi_chan_info *cinfo, struct scmi_xfer *xfer); void (*mark_txdone)(struct scmi_chan_info *cinfo, int ret); diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index d66f19b27c44..2e25f6f068f5 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -632,12 +632,21 @@ static int __scmi_xfer_info_init(struct scmi_info *sinfo, /* Pre-initialize the buffer pointer to pre-allocated buffers */ for (i = 0, xfer = info->xfer_block; i < info->max_msg; i++, xfer++) { - xfer->rx.buf = devm_kcalloc(dev, sizeof(u8), desc->max_msg_size, - GFP_KERNEL); - if (!xfer->rx.buf) - return -ENOMEM; - - xfer->tx.buf = xfer->rx.buf; + if (desc->ops->xfer_init_buffers) { + int ret = desc->ops->xfer_init_buffers( + base_cinfo, xfer, desc->max_msg_size); + + if (ret) + return ret; + } else { + xfer->rx.buf = devm_kcalloc(dev, sizeof(u8), + desc->max_msg_size, + GFP_KERNEL); + if (!xfer->rx.buf) + return -ENOMEM; + + xfer->tx.buf = xfer->rx.buf; + } init_completion(&xfer->done); } -- 2.25.1
[RFC PATCH 4/7] firmware: arm_scmi: Add per message transport data
From: Igor Skalkin The virtio transport in this patch series can be simplified by using the scmi_xfer tx/rx buffers for data exchange with the virtio device, and for saving the message state. But the virtio transport requires prepending a transport-specific header. Also, for data exchange using virtqueues, the tx and rx buffers should not overlap. The first step to solve the aforementioned issues is to add a transport-specific data pointer to scmi_xfer. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 0f1540cb2d84..e3ccec954e93 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -131,6 +131,7 @@ struct scmi_msg { * buffer for the rx path as we use for the tx path. * @done: command message transmit completion event * @async_done: pointer to delayed response message received event completion + * @extra_data: Transport-specific private data pointer */ struct scmi_xfer { int transfer_id; @@ -139,6 +140,7 @@ struct scmi_xfer { struct scmi_msg rx; struct completion done; struct completion *async_done; + void *extra_data; }; void scmi_xfer_put(const struct scmi_handle *h, struct scmi_xfer *xfer); -- 2.25.1
[RFC PATCH 3/7] firmware: arm_scmi: Add op to override max message #
From: Igor Skalkin The number of messages that the upcoming scmi-virtio transport can support depends on the virtio device (SCMI platform) and can differ for each channel. (The scmi-virtio transport does only have one tx and at most 1 rx channel.) Add an optional transport op so that scmi-virtio can report the actual max message # for each channel type. Respect these new limits. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 8 - drivers/firmware/arm_scmi/driver.c | 49 ++ 2 files changed, 43 insertions(+), 14 deletions(-) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 4cc78eb27f14..0f1540cb2d84 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -201,6 +201,9 @@ struct scmi_chan_info { * @chan_available: Callback to check if channel is available or not * @chan_setup: Callback to allocate and setup a channel * @chan_free: Callback to free a channel + * @get_max_msg: Optional callback to provide max_msg dynamically + * @max_msg: Maximum number of messages for the channel type (tx or rx) + * that can be pending simultaneously in the system * @send_message: Callback to send a message * @mark_txdone: Callback to mark tx as done * @fetch_response: Callback to fetch response @@ -213,6 +216,8 @@ struct scmi_transport_ops { int (*chan_setup)(struct scmi_chan_info *cinfo, struct device *dev, bool tx); int (*chan_free)(int id, void *p, void *data); + int (*get_max_msg)(bool tx, struct scmi_chan_info *base_cinfo, + int *max_msg); int (*send_message)(struct scmi_chan_info *cinfo, struct scmi_xfer *xfer); void (*mark_txdone)(struct scmi_chan_info *cinfo, int ret); @@ -230,7 +235,8 @@ struct scmi_transport_ops { * @ops: Pointer to the transport specific ops structure * @max_rx_timeout_ms: Timeout for communication with SoC (in Milliseconds) * @max_msg: Maximum number of messages for a channel type (tx or rx) that can - * be pending simultaneously in the system + * be pending simultaneously in the system. May be overridden by the + * get_max_msg op. * @max_msg_size: Maximum size of data per message that can be handled. */ struct scmi_desc { diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index d070687cf2f6..d66f19b27c44 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -61,11 +61,13 @@ static atomic_t transfer_last_id; * Index of this bitmap table is also used for message * sequence identifier. * @xfer_lock: Protection for message allocation + * @max_msg: Maximum number of messages that can be pending */ struct scmi_xfers_info { struct scmi_xfer *xfer_block; unsigned long *xfer_alloc_table; spinlock_t xfer_lock; + int max_msg; }; /** @@ -157,13 +159,11 @@ static struct scmi_xfer *scmi_xfer_get(const struct scmi_handle *handle, u16 xfer_id; struct scmi_xfer *xfer; unsigned long flags, bit_pos; - struct scmi_info *info = handle_to_scmi_info(handle); /* Keep the locked section as small as possible */ spin_lock_irqsave(&minfo->xfer_lock, flags); - bit_pos = find_first_zero_bit(minfo->xfer_alloc_table, - info->desc->max_msg); - if (bit_pos == info->desc->max_msg) { + bit_pos = find_first_zero_bit(minfo->xfer_alloc_table, minfo->max_msg); + if (bit_pos == minfo->max_msg) { spin_unlock_irqrestore(&minfo->xfer_lock, flags); return ERR_PTR(-ENOMEM); } @@ -594,32 +594,44 @@ int scmi_handle_put(const struct scmi_handle *handle) } static int __scmi_xfer_info_init(struct scmi_info *sinfo, -struct scmi_xfers_info *info) +struct scmi_xfers_info *info, +bool tx, +struct scmi_chan_info *base_cinfo) { int i; struct scmi_xfer *xfer; struct device *dev = sinfo->dev; const struct scmi_desc *desc = sinfo->desc; + info->max_msg = desc->max_msg; + + if (desc->ops->get_max_msg) { + int ret = + desc->ops->get_max_msg(tx, base_cinfo, &info->max_msg); + + if (ret) + return ret; + } + /* Pre-allocated messages, no more than what hdr.seq can support */ - if (WARN_ON(desc->max_msg >= MSG_TOKEN_MAX)) { + if (WARN_ON(info->max_msg >= MSG_TOKEN_MAX)) { dev_err(dev, "Maximum message of %d exceeds supported %ld\n", -
[RFC PATCH 2/7] firmware: arm_scmi: Document that max_msg is a per channel type limit
From: Igor Skalkin struct scmi_desc.max_msg specifies a limit for the pending messages. This limit is a per SCMI channel type (tx, rx) limit. State that explicitly in the inline documentation. The following patch will add an op to override the limit per channel type. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/arm_scmi/common.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 36c38334a045..4cc78eb27f14 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -229,8 +229,8 @@ struct scmi_transport_ops { * * @ops: Pointer to the transport specific ops structure * @max_rx_timeout_ms: Timeout for communication with SoC (in Milliseconds) - * @max_msg: Maximum number of messages that can be pending - * simultaneously in the system + * @max_msg: Maximum number of messages for a channel type (tx or rx) that can + * be pending simultaneously in the system * @max_msg_size: Maximum size of data per message that can be handled. */ struct scmi_desc { -- 2.25.1
[RFC PATCH 1/7] firmware: arm_scmi, smccc, mailbox: Make shmem based transports optional
From: Igor Skalkin Upon adding the virtio transport in this patch series, SCMI will also work without shared memory based transports. Also, the mailbox transport may not be needed if the smc transport is used. - Compile shmem.c only if a shmem based transport is available. - Remove hard dependency of SCMI on mailbox. Co-developed-by: Peter Hilber Signed-off-by: Peter Hilber Signed-off-by: Igor Skalkin --- drivers/firmware/Kconfig | 9 - drivers/firmware/arm_scmi/Makefile | 2 +- drivers/firmware/arm_scmi/common.h | 2 ++ drivers/firmware/arm_scmi/driver.c | 2 ++ drivers/firmware/smccc/Kconfig | 1 + drivers/mailbox/Kconfig| 1 + 6 files changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index afdbebba628a..bdde51adb267 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -9,7 +9,7 @@ menu "Firmware Drivers" config ARM_SCMI_PROTOCOL tristate "ARM System Control and Management Interface (SCMI) Message Protocol" depends on ARM || ARM64 || COMPILE_TEST - depends on MAILBOX + depends on ARM_SCMI_HAVE_SHMEM help ARM System Control and Management Interface (SCMI) protocol is a set of operating system-independent software interfaces that are @@ -27,6 +27,13 @@ config ARM_SCMI_PROTOCOL This protocol library provides interface for all the client drivers making use of the features offered by the SCMI. +config ARM_SCMI_HAVE_SHMEM + bool + default n + help + This declares whether a shared memory based transport for SCMI is + available. + config ARM_SCMI_POWER_DOMAIN tristate "SCMI power domain driver" depends on ARM_SCMI_PROTOCOL || (COMPILE_TEST && OF) diff --git a/drivers/firmware/arm_scmi/Makefile b/drivers/firmware/arm_scmi/Makefile index bc0d54f8e861..3cc7fa40a464 100644 --- a/drivers/firmware/arm_scmi/Makefile +++ b/drivers/firmware/arm_scmi/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only scmi-bus-y = bus.o scmi-driver-y = driver.o notify.o -scmi-transport-y = shmem.o +scmi-transport-$(CONFIG_ARM_SCMI_HAVE_SHMEM) = shmem.o scmi-transport-$(CONFIG_MAILBOX) += mailbox.o scmi-transport-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smc.o scmi-protocols-y = base.o clock.o perf.o power.o reset.o sensors.o system.o diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h index 37fb583f1bf5..36c38334a045 100644 --- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -240,7 +240,9 @@ struct scmi_desc { int max_msg_size; }; +#ifdef CONFIG_MAILBOX extern const struct scmi_desc scmi_mailbox_desc; +#endif #ifdef CONFIG_HAVE_ARM_SMCCC extern const struct scmi_desc scmi_smc_desc; #endif diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c index c5dea87edf8f..d070687cf2f6 100644 --- a/drivers/firmware/arm_scmi/driver.c +++ b/drivers/firmware/arm_scmi/driver.c @@ -910,7 +910,9 @@ ATTRIBUTE_GROUPS(versions); /* Each compatible listed below must have descriptor associated with it */ static const struct of_device_id scmi_of_match[] = { +#ifdef CONFIG_MAILBOX { .compatible = "arm,scmi", .data = &scmi_mailbox_desc }, +#endif #ifdef CONFIG_HAVE_ARM_SMCCC_DISCOVERY { .compatible = "arm,scmi-smc", .data = &scmi_smc_desc}, #endif diff --git a/drivers/firmware/smccc/Kconfig b/drivers/firmware/smccc/Kconfig index 15e7466179a6..69c4d6cabf62 100644 --- a/drivers/firmware/smccc/Kconfig +++ b/drivers/firmware/smccc/Kconfig @@ -9,6 +9,7 @@ config HAVE_ARM_SMCCC_DISCOVERY bool depends on ARM_PSCI_FW default y + select ARM_SCMI_HAVE_SHMEM help SMCCC v1.0 lacked discoverability and hence PSCI v1.0 was updated to add SMCCC discovery mechanism though the PSCI firmware diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig index 05b1009e2820..5ffe1ab0c869 100644 --- a/drivers/mailbox/Kconfig +++ b/drivers/mailbox/Kconfig @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only menuconfig MAILBOX bool "Mailbox Hardware Support" + select ARM_SCMI_HAVE_SHMEM help Mailbox is a framework to control hardware communication between on-chip processors through queued messages and interrupt driven -- 2.25.1
[RFC PATCH 0/7] firmware: arm_scmi: Add virtio transport
This series implements an SCMI virtio driver according to the virtio SCMI device spec patch v5 [1], after simple preparatory changes to the existing arm-scmi driver. The virtio transport differs in some respects from the existing shared-memory based SCMI transports. Message timeouts can be a problem if the virtio device (SCMI platform) does not have real-time properties. I set a high message rx timeout value which should work for non real-time virtio devices as well. There are other timeouts for delayed response and polling which were not addressed yet. Delayed responses are not really needed since, with the virtio transport, message responses may be transmitted out of order. Polling doesn't make sense at least for virtio devices without real-time behavior, in my understanding. There are some known issues which will be resolved before removing the RFC tag: - Further work is needed on the scmi_xfer management. Unlike shmem based transports, the virtio transport is actually exchanging messages with the SCMI agent through the scmi_xfer tx and rx buffers. In case of a message rx timeout, the arm-scmi driver could try to re-use the scmi_xfer, while that might still be used by the virtio device. I think part of the scmi_xfers_info bookkeeping could be optionally outsourced to the transport to remediate this. - After arm-scmi driver probe failure, or after remove, the scmi-virtio driver may still try to process and forward message responses from the virtio device. - We must be sure that the virtio transport option (such as virtio over MMIO) is available when the virtio SCMI device is probed. The series is based on for-next/scmi [2], on top of commit 66d90f6ecee7 ("firmware: arm_scmi: Enable building as a single module") The series was actually tested with a v5.4 based kernel, with the Base protocol and Sensor management protocol. The virtio SCMI device used was a proprietary implementation by OpenSynergy. Delayed responses were not tested. Any comments are very welcome. [1] https://lists.oasis-open.org/archives/virtio-comment/202005/msg00096.html [2] https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git Igor Skalkin (7): firmware: arm_scmi, smccc, mailbox: Make shmem based transports optional firmware: arm_scmi: Document that max_msg is a per channel type limit firmware: arm_scmi: Add op to override max message # firmware: arm_scmi: Add per message transport data firmware: arm_scmi: Add xfer_init_buffers transport op dt-bindings: arm: Add virtio transport for SCMI firmware: arm_scmi: Add virtio transport .../devicetree/bindings/arm/arm,scmi.txt | 38 +- MAINTAINERS | 1 + drivers/firmware/Kconfig | 19 +- drivers/firmware/arm_scmi/Makefile| 3 +- drivers/firmware/arm_scmi/common.h| 31 +- drivers/firmware/arm_scmi/driver.c| 83 +++- drivers/firmware/arm_scmi/virtio.c| 470 ++ drivers/firmware/smccc/Kconfig| 1 + drivers/mailbox/Kconfig | 1 + include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_scmi.h | 41 ++ 11 files changed, 664 insertions(+), 25 deletions(-) create mode 100644 drivers/firmware/arm_scmi/virtio.c create mode 100644 include/uapi/linux/virtio_scmi.h base-commit: 66d90f6ecee755e9c19a119c9255e80091165498 -- 2.25.1