On Tue Mar 10, 2026 at 10:51 AM JST, Gary Guo wrote:
> On Tue Mar 10, 2026 at 1:40 AM GMT, Alexandre Courbot wrote:
>> On Tue Mar 10, 2026 at 1:34 AM JST, Tim Kovalenko via B4 Relay wrote:
>>> From: Tim Kovalenko <[email protected]>
>>>
>>> The `Cmdq::new` function was allocating a `PteArray` struct on the stack
>>> and was causing a stack overflow with 8216 bytes.
>>>
>>> Modify the `PteArray` to calculate and write the Page Table Entries
>>> directly into the coherent DMA buffer one-by-one. This reduces the stack
>>> usage quite a lot.
>>>
>>> Signed-off-by: Tim Kovalenko <[email protected]>
>>> ---
>>>  drivers/gpu/nova-core/gsp.rs      | 34 +++++++++++++++++++---------------
>>>  drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
>>>  2 files changed, 33 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
>>> index 
>>> 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38
>>>  100644
>>> --- a/drivers/gpu/nova-core/gsp.rs
>>> +++ b/drivers/gpu/nova-core/gsp.rs
>>> @@ -47,16 +47,11 @@
>>>  unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
>>>  
>>>  impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
>>> -    /// Creates a new page table array mapping `NUM_PAGES` GSP pages 
>>> starting at address `start`.
>>> -    fn new(start: DmaAddress) -> Result<Self> {
>>> -        let mut ptes = [0u64; NUM_PAGES];
>>> -        for (i, pte) in ptes.iter_mut().enumerate() {
>>> -            *pte = start
>>> -                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>>> -                .ok_or(EOVERFLOW)?;
>>> -        }
>>> -
>>> -        Ok(Self(ptes))
>>> +    /// Returns the page table entry for `index`, for a mapping starting 
>>> at `start` DmaAddress.
>>> +    fn entry(start: DmaAddress, index: usize) -> Result<u64> {
>>> +        start
>>> +            .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
>>> +            .ok_or(EOVERFLOW)
>>>      }
>>>  }
>>>  
>>> @@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> 
>>> Result<Self> {
>>>              NUM_PAGES * GSP_PAGE_SIZE,
>>>              GFP_KERNEL | __GFP_ZERO,
>>>          )?);
>>> -        let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
>>> +
>>> +        let start_addr = obj.0.dma_handle();
>>>  
>>>          // SAFETY: `obj` has just been created and we are its sole user.
>>> -        unsafe {
>>> -            // Copy the self-mapping PTE at the expected location.
>>> +        let pte_region = unsafe {
>>>              obj.0
>>> -                .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
>>> -                .copy_from_slice(ptes.as_bytes())
>>> +                .as_slice_mut(size_of::<u64>(), NUM_PAGES * 
>>> size_of::<u64>())?
>>>          };
>>>  
>>> +        // This is a  one by one GSP Page write to the memory
>>> +        // to avoid stack overflow when allocating the whole array at once.
>>> +        for (i, chunk) in 
>>> pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
>>> +            let pte_value = start_addr
>>> +                .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
>>> +                .ok_or(EOVERFLOW)?;
>>> +
>>> +            chunk.copy_from_slice(&pte_value.to_ne_bytes());
>>> +        }
>>> +
>>>          Ok(obj)
>>>      }
>>>  }
>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs 
>>> b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> index 
>>> 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d
>>>  100644
>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> @@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> 
>>> Result<Self> {
>>>  
>>>          let gsp_mem =
>>>              CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, 
>>> GFP_KERNEL | __GFP_ZERO)?;
>>> -        dma_write!(gsp_mem, [0]?.ptes, 
>>> PteArray::new(gsp_mem.dma_handle())?);
>>> +
>>> +        const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
>>> +
>>> +        let start = gsp_mem.dma_handle();
>>> +        // One by one GSP Page write to the memory to avoid stack overflow 
>>> when allocating
>>> +        // the whole array at once.
>>> +        for i in 0..NUM_PTES {
>>> +            dma_write!(
>>> +                gsp_mem,
>>> +                [0]?.ptes.0[i],
>>> +                PteArray::<NUM_PTES>::entry(start, i)?
>>
>> Does `::<NUM_PTES>` need to be mentioned here, or is the compiler able
>> to infer it?
>
> The function signature doesn't mention NUM_PTES at all, so no. In fact, 
> perhaps
> the `entry` shouldn't be an associated method at all (even if is, it probably
> should be of `PteArray::<0>` or something.

I had that thought as well - this calls for a redesign of the `PteArray`
business - but also didn't want to interfere too much as this fix is
very much (and quickly) needed. We will probably re-write this once we
have access to the new I/O code anyway.

Reply via email to