RE: [PATCH v5 01/15] genpt: Generic Page Table base API

Tian, Kevin Fri, 19 Sep 2025 01:11:33 -0700

> From: Jason Gunthorpe <[email protected]>
> Sent: Friday, September 19, 2025 2:07 AM
> 
> On Thu, Sep 18, 2025 at 06:49:08AM +0000, Tian, Kevin wrote:
> > > This is enough to implement the 8 initial format variations with all of
> > > their features:
> > >  * Entries comprised of contiguous blocks of IO PTEs for larger page
> > >    sizes (AMDv1, ARMv8)
> > >  * Multi-level tables, up to 6 levels. Runtime selected top level
> > >  * Runtime variable table level size (ARM's concatenated tables)
> > >  * Expandable top level (AMDv1)
> >
> > any more context about this one? how is it different from the earlier
> > "runtime selected top level"?
> 
> How about:
> 
>  * The size of the top table level can be selected at runtime (ARM's
>    concatenated tables)
>  * The number of levels in the table can optionally increase dynamically
>    during map (AMDv1)


clearer.

> > > + *  item/entry_size
> > > + *     The number of bytes of VA the table translates for.
> > > + *     If the item is a table entry then the next table covers
> > > + *     this size. If the entry is an output address then the
> >
> > s/is/translates/
> 
> Don't follow?

entry is not an address. So I meant:

"If the entry translates to an output address"

> > > +/*
> > > + * PT_WARN_ON is used for invariants that the kunit should be checking
> > > can't
> > > + * happen.
> > > + */
> > > +#if IS_ENABLED(CONFIG_DEBUG_GENERIC_PT)
> > > +#define PT_WARN_ON WARN_ON
> > > +#else
> > > +static inline bool PT_WARN_ON(bool condition)
> > > +{
> > > + return false;
> > > +}
> > > +#endif
> >
> > Then call it PT_DBG_WARN_ON() to be more explicit?
> 
> Ah, I'd rather leave it..
> 
> > btw looks there is no plain WARN_ON() used in generic-pt. Just be curious
> > about the rationale behind. Is it a new trend to contain all warnings under
> > a debug option?
> 
> It is sort of like VM_WARN_ON(), the places that got put are largely
> performance path so you don't want it enabled unless doing debugging.
> 
> Generally the idea is to use PT_WARN_ON() on performance path cases
> only, and leave normal stuff to normal WARN_ON.

Good to know

> > > +
> > > +/* If not supplied by the format then contiguous pages are not
> supported */
> > > +#ifndef pt_entry_num_contig_lg2
> > > +static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state
> > > *pts)
> > > +{
> > > + return ilog2(1);
> > > +}
> > > +
> > > +static inline unsigned short pt_contig_count_lg2(const struct pt_state
> *pts)
> > > +{
> > > + return ilog2(1);
> > > +}
> >
> > what is the difference between above two helpers?
> 
> Oh, pt_contig_count_lg2 didn't get kdocs because they are internal
> helpers to build other functions..
> 
> Like this:
> 
> /*
>  * If not supplied by the format then contiguous pages are not supported.
>  *
>  * If contiguous pages are supported then the format must also provide
>  * pt_contig_count_lg2() if it supports a single contiguous size per level,
>  * or pt_possible_sizes() if it supports multiple sizes per level.

could be simplified to require the format to always support pt_possible_sizes()
if contiguous sizes are supported, no matter being a single size or multiple.

>  */
> #ifndef pt_entry_num_contig_lg2
> static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts)
> {
>       return ilog2(1);
> }
> 
> /*
>  * Return the number of contiguous OA items forming an entry at this table
> level
>  */
> static inline unsigned short pt_contig_count_lg2(const struct pt_state *pts)
> {
>       return ilog2(1);
> }
> #endif
> 
> > It's currently not implemented by any driver so will have the default 
> > version
> > returning 0. and it is only used by default pt_possible_sizes(), which then
> > returns only one page size accordingly.
> 
> ARM and RISCV use it. AMD is the only format that support more than
> one size of contiguous per level.

hmm I didn't find ARM/RISCV defining pt_contig_count_lg2(). So in all cases
it's the default version returning ilog2(1). Then what's the point of keeping
it instead of directly using ilog2(1) in default pt_possible_sizes? sorry that 
I still didn't connect the dots here.

> > > +
> > > +/**
> > > + * pt_item_fully_covered() - Check if the item or entry is entirely
> contained
> > > + *                           within pts->range
> >
> > when using pts it's more accurate to call it pt_entry_fully_covered()
> 
> Not so much related to PTS, as pts could be either, but it makes more
> sense to refer to table poitner as an entry than does it to refer to a
> contiguous entry an item.

make sense.

> 
> > > + PT_FEAT_FLUSH_RANGE,
> > > + /**
> > > +  * @PT_FEAT_FLUSH_RANGE_NO_GAPS: Like PT_FEAT_FLUSH_RANGE
> > > except that
> > > +  * the optimization objective is to only flush IOVA that has been
> > > +  * changed. This mode is suitable for cases like hypervisor shadowing
> > > +  * where flushing unchanged ranges may cause the hypervisor to
> > > reparse
> > > +  * significant amount of page table.
> > > +  */
> > > + PT_FEAT_FLUSH_RANGE_NO_GAPS,
> >
> > FLUSH_RANGE and FLUSH_RANGE_NO_GAPS are mutually exclusive but
> > one format must select one? then we could just keep one flag (NO_GAP)
> > then feature off means FLUSH_RANGE.
> 
> I think at least one or two more flushing modes will be needed, and
> they would be mutually exclusive. It is setup with this one hot
> encoding because of how the pt_feature function compile optimization
> works.
> 

clear now.

RE: [PATCH v5 01/15] genpt: Generic Page Table base API

Reply via email to