On Wed Feb 11, 2026 at 9:04 AM JST, John Hubbard wrote:
> Nova was setting the outer GSP_MSG_QUEUE_ELEMENT.seqNum to an
> incrementing counter for all sends, but leaving the inner
> rpc_message_header_v.sequence at zero for all sends. Open RM sets the
> inner sequence to the counter for sync (command/response) calls and to
> zero for async (fire-and-forget) calls.
>
> Split GspMsgElement::init() into separate transport_seq (outer) and
> rpc_seq (inner) parameters. The outer seqNum always increments as a
> unique transport-level ID for every message. The inner rpc.sequence is
> set to 0 for async commands and to the transport counter for sync
> commands, matching Open RM behavior.
>
> Add an IS_ASYNC const to the CommandToGsp trait and set it to true for
> SetSystemInfo and SetRegistry, the two fire-and-forget RPCs.
>
> Add MsgFunction::is_event() to classify received messages as
> GSP-initiated async events vs command responses. This will be used by
> the next commit to improve debug logging.
>
> GSP does not yet include SetSystemInfo and SetRegistry in its sequence
> counting, but a future GSP firmware update will fold them in. A comment
> is added to note this.
>
> Cc: Maneet Singh <[email protected]>
> Signed-off-by: John Hubbard <[email protected]>
> ---
>  drivers/gpu/nova-core/gsp/boot.rs     |  6 ++++-
>  drivers/gpu/nova-core/gsp/cmdq.rs     | 17 +++++++++++--
>  drivers/gpu/nova-core/gsp/commands.rs |  2 ++
>  drivers/gpu/nova-core/gsp/fw.rs       | 36 +++++++++++++++++++++++----
>  4 files changed, 53 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs 
> b/drivers/gpu/nova-core/gsp/boot.rs
> index 02eec2961b5f..f769e234dae6 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -403,7 +403,11 @@ pub(crate) fn boot(
>  
>          dev_dbg!(dev, "RISC-V active? {}\n", 
> gsp_falcon.is_riscv_active(bar));
>  
> -        // Now that GSP is active, send system info and registry
> +        // Now that GSP is active, send system info and registry.
> +        //
> +        // These are async (fire-and-forget) RPCs: no response comes back 
> from GSP.
> +        // GSP does not include them in its sequence number counting today, 
> but a
> +        // future GSP firmware update will fold them into the normal 
> sequence space.

>From the point of view of the caller this is a superfluous
implementation detail. This comment is actually confusing as it mentions
a sequence number that the caller does not need to provide, so I'd just
remove it altogether.

>          self.cmdq
>              .send_command(bar, commands::SetSystemInfo::new(pdev, chipset))?;
>          self.cmdq.send_command(bar, commands::SetRegistry::new())?;
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs 
> b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 16895f5281b7..7d6d7d81287c 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -58,6 +58,13 @@ pub(crate) trait CommandToGsp {
>      /// Function identifying this command to the GSP.
>      const FUNCTION: MsgFunction;
>  
> +    /// Whether this command is async (fire-and-forget), meaning no response 
> is expected from GSP.
> +    ///
> +    /// Async commands get inner `rpc.sequence` set to 0. Sync commands get 
> inner `rpc.sequence`
> +    /// set to the transport counter, matching Open RM. The outer `seqNum` 
> always increments
> +    /// regardless.

This here is too much detail as well IMHO, even more if you follow the
last suggestion in this review. :)

> +    const IS_ASYNC: bool = false;
> +
>      /// Type generated by [`CommandToGsp::init`], to be written into the 
> command queue buffer.
>      type Command: FromBytes + AsBytes;
>  
> @@ -439,7 +446,8 @@ struct GspMessage<'a> {
>  pub(crate) struct Cmdq {
>      /// Device this command queue belongs to.
>      dev: ARef<device::Device>,
> -    /// Current command sequence number.
> +    /// Transport-level sequence number, incremented for every send. Used 
> for the outer
> +    /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence 
> for sync commands.

Note that these types (especially `GSP_MSG_QUEUE_ELEMENT`) are never
visible in this module, so mentioning them here is assuming context that
the reader can't easily get.

Instead, how about just "Used for [`GspMsgElement`]'s sequence counter"?

>      seq: u32,
>      /// Memory area shared with the GSP for communicating commands and 
> messages.
>      gsp_mem: DmaGspMem,
> @@ -514,8 +522,13 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, 
> command: M) -> Result
>          // Extract area for the command itself.
>          let (cmd, payload_1) = 
> M::Command::from_bytes_mut_prefix(dst.contents.0).ok_or(EIO)?;
>  
> +        // The outer seqNum always increments (transport-level, unique per 
> message).
> +        // The inner rpc.sequence is 0 for async (fire-and-forget) commands, 
> or the
> +        // sync counter for command/response pairs, matching Open RM 
> behavior.
> +        let rpc_seq = if M::IS_ASYNC { 0 } else { self.seq };
> +
>          // Fill the header and command in-place.
> -        let msg_element = GspMsgElement::init(self.seq, command_size, 
> M::FUNCTION);
> +        let msg_element = GspMsgElement::init(self.seq, rpc_seq, 
> command_size, M::FUNCTION);
>          // SAFETY: `msg_header` and `cmd` are valid references, and not 
> touched if the initializer
>          // fails.
>          unsafe {
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs 
> b/drivers/gpu/nova-core/gsp/commands.rs
> index e6a9a1fc6296..c8a73bd30051 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -50,6 +50,7 @@ pub(crate) fn new(pdev: &'a pci::Device<device::Bound>, 
> chipset: Chipset) -> Sel
>  
>  impl<'a> CommandToGsp for SetSystemInfo<'a> {
>      const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
> +    const IS_ASYNC: bool = true;
>      type Command = GspSetSystemInfo;
>      type InitError = Error;
>  
> @@ -101,6 +102,7 @@ pub(crate) fn new() -> Self {
>  
>  impl CommandToGsp for SetRegistry {
>      const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
> +    const IS_ASYNC: bool = true;
>      type Command = PackedRegistryTable;
>      type InitError = Infallible;
>  
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 927bcee6a5a5..e417ed58419f 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -260,6 +260,26 @@ pub(crate) enum MsgFunction {
>      UcodeLibOsPrint = bindings::NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
>  }
>  
> +impl MsgFunction {
> +    /// Returns true if this is a GSP-initiated async event 
> (NV_VGPU_MSG_EVENT_*), as opposed to
> +    /// a command response (NV_VGPU_MSG_FUNCTION_*).
> +    #[expect(dead_code)]
> +    pub(crate) fn is_event(&self) -> bool {
> +        matches!(
> +            self,
> +            Self::GspInitDone
> +                | Self::GspRunCpuSequencer
> +                | Self::PostEvent
> +                | Self::RcTriggered
> +                | Self::MmuFaultQueued
> +                | Self::OsErrorLog
> +                | Self::GspPostNoCat
> +                | Self::GspLockdownNotice
> +                | Self::UcodeLibOsPrint //
> +        )
> +    }

Mmm using a method for this is quite fragile, as we are always at risk
of forgetting to handle a newly-added function. I wanted to eventually
split `MsgFunction` between its command and events constituants, maybe
that would be a good opportunity to do so.

Also, this is marked with `dead_code`? Looks like it belongs to the next
patch in this case.

> +}
> +
>  impl fmt::Display for MsgFunction {
>      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
>          match self {
> @@ -816,7 +836,7 @@ fn new() -> Self {
>  }
>  
>  impl bindings::rpc_message_header_v {
> -    fn init(cmd_size: usize, function: MsgFunction) -> impl Init<Self, 
> Error> {
> +    fn init(cmd_size: usize, function: MsgFunction, sequence: u32) -> impl 
> Init<Self, Error> {
>          type RpcMessageHeader = bindings::rpc_message_header_v;
>  
>          try_init!(RpcMessageHeader {
> @@ -829,6 +849,7 @@ fn init(cmd_size: usize, function: MsgFunction) -> impl 
> Init<Self, Error> {
>                  .and_then(|v| v.try_into().map_err(|_| EINVAL))?,
>              rpc_result: 0xffffffff,
>              rpc_result_private: 0xffffffff,
> +            sequence,
>              ..Zeroable::init_zeroed()
>          })
>      }
> @@ -847,26 +868,31 @@ impl GspMsgElement {
>      ///
>      /// # Arguments
>      ///
> -    /// * `sequence` - Sequence number of the message.
> +    /// * `transport_seq` - Transport-level sequence number for the outer 
> message header
> +    ///   (`GSP_MSG_QUEUE_ELEMENT.seqNum`). Must be unique per message.
> +    /// * `rpc_seq` - RPC-level sequence number for the inner RPC header
> +    ///   (`rpc_message_header_v.sequence`). Set to 0 for async 
> (fire-and-forget) commands,
> +    ///   or to the sync counter for command/response pairs.
>      /// * `cmd_size` - Size of the command (not including the message 
> element), in bytes.
>      /// * `function` - Function of the message.
>      #[allow(non_snake_case)]
>      pub(crate) fn init(
> -        sequence: u32,
> +        transport_seq: u32,
> +        rpc_seq: u32,

The caller site of `init` mentions that `rpc_seq` will always either be
equal to `transport_seq`, or be 0, depending on whether this is an async
command or not, yet this constructor allows any value to be used. I'd
like to enforce this invariant by making it impossible to build invalid
combinations of sequences.

As a matter of fact we already have this information in
`CommandToGsp::IS_ASYNC`, so the simpler way would be to pass *that* as
an extra argument to `init`, and let it handle the sequencing
internally. It also has the benefit of not making this message-layer
detail leak into the command queue code, making the discussion about
this detail even more irrelevant outside of the `fw` module.

An even better solution would be to to pass the `CommandToGsp` as a
generic argument - that way `init` could extract both the `FUNCTION` and
`IS_ASYNC`. Maybe use a public generic proxy method that calls into a
non-generic private one to limit monomorphization.

Reply via email to