On Sun, 14 Jun 2026 at 20:08, Michael S. Tsirkin <[email protected]> wrote:
>
> From: Alejandro Jimenez <[email protected]>
>
> DTE[Mode] and PTE NextLevel encode page table levels as 1-based values, but
> fetch_pte() currently uses a 0-based level counter, making the logic
> harder to follow and requiring conversions between DTE mode and level.
>
> Switch the page table walk logic to use 1-based level accounting in
> fetch_pte() and the relevant macro helpers. To further simplify the page
> walking loop, split the root page table access from the walk i.e. rework
> fetch_pte() to follow the DTE Page Table Root Pointer and retrieve the top
> level pagetable entry before entering the loop, then iterate only over the
> PDE/PTE entries.
>
> The reworked algorithm fixes a page walk bug where the page size was
> calculated for the next level before checking if the current PTE was already
> a leaf/hugepage. That caused hugepage mappings to be reported as 4K pages,
> leading to performance degradation and failures in some setups.
Hi; Coverity flags up an issue with this commit. It won't have any
practical consequences, but it does show that there's something we
could clean up.
> +/*
> + * Validate DTE fields and extract permissions and top level data required to
> + * initiate the page table walk.
> + *
> + * On success, returns 0 and stores:
> + * - top_level: highest page-table level encoded in DTE[Mode]
> + * - dte_perms: effective permissions from the DTE
> + *
> + * On failure, returns -AMDVI_FR_PT_ROOT_INV. This includes cases where:
> + * - DTE permissions disallow read AND write
> + * - DTE[Mode] is invalid for translation
> + * - IOVA exceeds the address width supported by DTE[Mode]
> + * In all such cases a page walk must be aborted.
> + */
> +static uint64_t amdvi_get_top_pt_level_and_perms(hwaddr address, uint64_t
> dte,
> + uint8_t *top_level,
> + IOMMUAccessFlags *dte_perms)
This function returns 0 on success, or a small negative number on
failure. But we have typed it as returning uint64_t.
> @@ -662,40 +708,49 @@ static uint64_t large_pte_page_size(uint64_t pte)
> static uint64_t fetch_pte(AMDVIAddressSpace *as, hwaddr address, uint64_t
> dte,
> uint64_t *pte, hwaddr *page_size)
> {
> - IOMMUAccessFlags perms = amdvi_get_perms(dte);
> -
> - uint8_t level, mode;
> uint64_t pte_addr;
> + uint8_t pt_level, next_pt_level;
> + IOMMUAccessFlags perms;
> + int ret;
>
> - *pte = dte;
> *page_size = 0;
>
> - if (perms == IOMMU_NONE) {
> + /*
> + * Verify the DTE is properly configured before page walk, and extract
> + * top pagetable level and permissions.
> + */
> + ret = amdvi_get_top_pt_level_and_perms(address, dte, &pt_level, &perms);
At the callsite we take that uint64_t return value and put it into
an "int" local variable. That potentially overflows. (In practice
none of the values actually returned will overflow.)
> + if (ret < 0) {
> + return ret;
Then we return that 'int' value from this function, which extends
it back out to uint64_t again.
> + }
Since all of fetch_pte()'s callers put its return value into an 'int'
local variable, and fetch_pte() is the only function that calls
amdvi_get_top_pt_level_and_perms(), I think the right fix here is
to make both amdvi_get_top_pt_level_and_perms() and fetch_pte()
return "int" instead of "uint64_t". Then the type matches what the
return variable is (0 on success, small negative on failure) and
also the type of the variable that all the callers put it into.
thanks
-- PMM