On Fri Mar 6, 2026 at 12:09 AM JST, Alexandre Courbot wrote:
> On Mon Mar 2, 2026 at 4:22 PM JST, Eliot Courtney wrote:
>> On Sun Mar 1, 2026 at 11:03 PM JST, Alexandre Courbot wrote:
>>> From: Timur Tabi <[email protected]>
>>> +impl FwsecFirmwareWithBl {
>>> + /// Loads the bootloader firmware for `dev` and `chipset`, and wrap
>>> `firmware` so it can be
>>> + /// loaded using it.
>>> + pub(crate) fn new(
>>> + firmware: FwsecFirmware,
>>> + dev: &Device<device::Bound>,
>>> + chipset: Chipset,
>>> + ) -> Result<Self> {
>>> + let fw = request_firmware(dev, chipset, "gen_bootloader",
>>> FIRMWARE_VERSION)?;
>>> + let hdr = fw
>>> + .data()
>>> + .get(0..size_of::<BinHdr>())
>>> + .and_then(BinHdr::from_bytes_copy)
>>> + .ok_or(EINVAL)?;
>>> +
>>> + let desc = {
>>> + let desc_offset = usize::from_safe_cast(hdr.header_offset);
>>> +
>>> + fw.data()
>>> + .get(desc_offset..)
>>> + .and_then(BootloaderDesc::from_bytes_copy_prefix)
>>> + .ok_or(EINVAL)?
>>> + .0
>>> + };
>>> +
>>> + let ucode = {
>>> + let ucode_start = usize::from_safe_cast(hdr.data_offset);
>>> + let code_size = usize::from_safe_cast(desc.code_size);
>>> + // Align to falcon block size (256 bytes).
>>> + let aligned_code_size = code_size
>>> + .align_up(Alignment::new::<{ falcon::MEM_BLOCK_ALIGNMENT
>>> }>())
>>> + .ok_or(EINVAL)?;
>>> +
>>> + let mut ucode = KVec::with_capacity(aligned_code_size,
>>> GFP_KERNEL)?;
>>> + ucode.extend_from_slice(
>>> + fw.data()
>>> + .get(ucode_start..ucode_start + code_size)
>>> + .ok_or(EINVAL)?,
>>> + GFP_KERNEL,
>>> + )?;
>>> + ucode.resize(aligned_code_size, 0, GFP_KERNEL)?;
>>> +
>>> + ucode
>>> + };
>>> +
>>> + let firmware_dma = DmaObject::from_data(dev, &firmware.ucode.0)?;
>>> +
>>> + let dmem_desc = {
>>> + let imem_sec =
>>> FalconDmaLoadable::imem_sec_load_params(&firmware);
>>> + let imem_ns =
>>> FalconDmaLoadable::imem_ns_load_params(&firmware).ok_or(EINVAL)?;
>>> + let dmem = FalconDmaLoadable::dmem_load_params(&firmware);
>>> +
>>> + BootloaderDmemDescV2 {
>>> + reserved: [0; 4],
>>> + signature: [0; 4],
>>> + ctx_dma: 4, // FALCON_DMAIDX_PHYS_SYS_NCOH
>>> + code_dma_base: firmware_dma.dma_handle(),
>>> + non_sec_code_off: imem_ns.dst_start,
>>> + non_sec_code_size: imem_ns.len,
>>> + sec_code_off: imem_sec.dst_start,
>>
>> Is it correct here to use `dst_start` for `non_sec_code_off` and
>> `sec_code_off`?
>> AFAICT, these are meant to be offsets from the base of the DMA memory and
>> it's meant to copy into the falcon. But the documentation on `dst_start`
>> says `Offset from the start of the destination memory to copy into.`
>
> OpenRM does the following:
>
> pUcode->imemNsPa = pDescV2->IMEMPhysBase;
> ...
> blDmemDesc.nonSecureCodeOff = pUcode->imemNsPa;
>
> and
>
> pUcode->imemSecPa = pDescV2->IMEMSecBase - pDescV2->IMEMVirtBase +
> pDescV2->IMEMPhysBase;
> ...
> blDmemDesc.secureCodeOff = pUcode->imemSecPa;
>
> So it *does* set IMEM addresses (i.e. destination) into these fields as
> well. And their documentation is the same as Nova, which is all the more
> intriguing.
>
> But the reason for this became clear after I figured out that the FWSEC
> firmware was a *mirror* image of what is expected in IMEM. The IMEM
> destination offsets are also the offsets from the start of the source
> DMA object, hence their use in `BootloaderDmemDescV2` - for the
> bootloader, they are the offsets from both the source AND the
> destination!
>
> I've found a couple of differences while reviewing the code though.
> Nova-core did not do the `- pDescV2->IMEMVirtBase +
> pDescV2->IMEMPhysBase` part, and it did not pad the FWSEC start image
> with zeroes up to `IMEMPhysBase` like OpenRM does (to have this
> source/destination symmetry). This happened to work because these values
> are all zero in practice, but we want to align the code to do the
> correct thing otherwise we have a theoretical risk of mismatch.
Thanks for looking into this - the mirrored firmware layout thing helps
explain a lot why `dst_start` is correct to use here and `src_start`
is incorect.
>
>>
>> Also `ucode_start` is defined as `hdr.data_offset`
>> but doesn't add `code_off` from `BootloaderDesc` and `code_off` appears
>> unused. So does `data_off` and `dmem_load_off` for that matter.
>> I find it hard to follow what's actually required but it seems odd that
>> none of these are used.
>>
>> At least for `code_off` should it not be part of the computation of
>> `ucode_start`?
>> `let ucode_start = usize::from_safe_cast(hdr.data_offset);`
>
> These two appear to be correct, at least if we follow OpenRM. For IMEM:
>
> // Copy bootloader to top of IMEM
> virtAddr = pBlUcDesc->blStartTag << 8;
> NV_ASSERT_OK_OR_RETURN(
> s_imemCopyTo_TU102(pGpu, pKernelFlcn,
> imemDstBlk << FALCON_IMEM_BLKSIZE2,
> pBlImg, // <-- start of image, no `code_off`.
> blSize,
> NV_FALSE,
> virtAddr));
>
> The `BootloaderDesc` is only used for `code_size` and `start_tag`. And
> it's not strange considering that it just contains the code to load into
> IMEM - the data being the `BootloaderDmemDescV2` that we construct from
> the actual firmware. Since there is only one segment in that binary, it
> makes sense that it starts at offset zero.
I looked into this some more and I believe we probably should include
`code_off` here, even though currently it doesn't matter. First, here is
what I found out about the firmware layout, which might be useful
background so let me record it here for posterity:
In OpenRM, the gen_bootloader firmware in [0], and it's split out into:
- ksec2BinArchiveBlUcode_TU102_ucode_image_data
- ksec2BinArchiveBlUcode_TU102_ucode_desc_data
Which are the image data and descriptor of the firmware.
This script creates the gen_bootloader binary that nova uses [1].
The script writes gen_bootloader.bin here[2]. It creates the
`nvfw_bin_hdr` which IIUC corresponds to `BinHdr` in nova. That would
make `BinHdr::data_offset` equal to `24 + descriptor_size` = 48 and
the .bin size `816` (this corresponds exactly to the data
offset of 0x200+data size of 0x100 + the 48 bytes of header, and to the
file size of the binary firmware file, and to the uncompressed size of
the ucode_image + 48 bytes of header). After that it puts in the
descriptor, which corresponds to
ksec2BinArchiveBlUcode_TU102_ucode_desc_data (which is 24 bytes, type is
`RM_FLCN_BL_DESC` in openrm) and includes the code offset field (which
is zero):
```
typedef struct _def_rm_flcn_bl_img_header
{
/*!
* Offset of code section in the image.
*/
NvU32 blCodeOffset;
/*!
* Size of code section in the image.
*/
NvU32 blCodeSize;
/*!
* Offset of data section in the image.
*/
NvU32 blDataOffset;
/*!
* Size of data section in the image.
*/
NvU32 blDataSize;
} RM_FLCN_BL_IMG_HEADER, *PRM_FLCN_BL_IMG_HEADER;
```
So when we read into the firmware in nova, we're looking at a firmware
that is like:
BinHdr
Descriptor (which includes code offset field)
Code Data
Data Data (this is unused)
But, in OpenRM, in
`src/nvidia/src/kernel/gpu/gsp/arch/turing/kernel_gsp_falcon_tu102.c` in
`s_prepareHsFalconWithLoader` it calls `ksec2GetGenericBlUcode_HAL`
which writes into `pBlUcDesc` which is of type `RM_FLCN_BL_DESC*` - that
goes to `ksec2BinArchiveBlUcode_TU102_ucode_desc_data`. It also writes
into `pBlImg`, but that actually points directly to
`ksec2BinArchiveBlUcode_TU102_ucode_image_data`.
In `ksec2GetGenericBlUcode_TU102` it writes just the code section:
`*ppImg = pKernelSec2->pGenericBlUcodeImg;`. That comes from
`bindataArchiveGetStorage` which grabs "ucode_image" in
`__ksec2GetBinArchiveBlUcode_TU102`. That corresponds directly to
`ksec2BinArchiveBlUcode_TU102_ucode_image_data`.
So when it uses `pBlImg` in `s_prepareHsFalconWithLoader`, it actually
doesn't include the header or the descriptor, it just includes the code
data.
[0] src/nvidia/generated/g_bindata_ksec2GetBinArchiveBlUcode_TU102.c
[1]
https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/nouveau/extract-firmware-nouveau.py
[2]
https://github.com/NVIDIA/open-gpu-kernel-modules/blob/df1c9a3de2be4b130f4fc7680f7198193d8ac6ad/nouveau/extract-firmware-nouveau.py#L181
With that out of the way, if we just consider the ucode image part, I
think OpenRM code is wrong to not add the code offset, and nouveau code
is correct to. Even though it is currently 0 in the embedded firmware
in openrm. I think this is less of a problem for openrm because the
hardcoded zero is versioned with the actual firmware which is embedded
in the driver, so there's no way it could get broken later.
So yeah, it doesn't matter too much but to be maximally safe and
correct, I think it should have `code_off` added.
>
> And here is what OpenRM does for DMEM:
>
> // Copy dmem desc to DMEM offset 0
> NV_ASSERT_OK_OR_RETURN(
> s_dmemCopyTo_TU102(pGpu, pKernelFlcn,
> 0,
> (NvU8 *) &blDmemDesc,
> sizeof(RM_FLCN_BL_DMEM_DESC)));
>
> Here too it makes sense that we load at offset 0, since there is only
> one data segment (the `BootloaderDmemDescV2`) that we build ourselves.
> There is no other data to place anywhere.
>
> I suspect `BootloaderDesc` (or `RM_FLCN_BL_IMG_HEADER` in OpenRM) was
> recycled from some older (pre-OpenRM), more complex firmware and
> inherited its documentation only serves a limited purpose here.
>
> The comments on `BootloaderDesc` are incorrect for us though. I'll amend
> them.
Thanks. And for dmem_load_offset, it's not used by nouveau either.
I think this one is ok with just a comment and marking it as unused
using _.
>
>>
>> Overall I find the dst/srcs here confusing cos it feels hard to keep
>> track of which set of memory things are in. So, sorry if these comments
>> are nonsense.
>
> The documentation clearly sends us on the wrong path, so good thing we
> fixed it.