> From: Jason Gunthorpe <[email protected]>
> Sent: Friday, September 19, 2025 2:07 AM
>
> On Thu, Sep 18, 2025 at 06:49:08AM +0000, Tian, Kevin wrote:
> > > This is enough to implement the 8 initial format variations with all of
> > > their features:
> > > * Entries comprised of contiguous blocks of IO PTEs for larger page
> > > sizes (AMDv1, ARMv8)
> > > * Multi-level tables, up to 6 levels. Runtime selected top level
> > > * Runtime variable table level size (ARM's concatenated tables)
> > > * Expandable top level (AMDv1)
> >
> > any more context about this one? how is it different from the earlier
> > "runtime selected top level"?
>
> How about:
>
> * The size of the top table level can be selected at runtime (ARM's
> concatenated tables)
> * The number of levels in the table can optionally increase dynamically
> during map (AMDv1)
clearer.
> > > + * item/entry_size
> > > + * The number of bytes of VA the table translates for.
> > > + * If the item is a table entry then the next table covers
> > > + * this size. If the entry is an output address then the
> >
> > s/is/translates/
>
> Don't follow?
entry is not an address. So I meant:
"If the entry translates to an output address"
> > > +/*
> > > + * PT_WARN_ON is used for invariants that the kunit should be checking
> > > can't
> > > + * happen.
> > > + */
> > > +#if IS_ENABLED(CONFIG_DEBUG_GENERIC_PT)
> > > +#define PT_WARN_ON WARN_ON
> > > +#else
> > > +static inline bool PT_WARN_ON(bool condition)
> > > +{
> > > + return false;
> > > +}
> > > +#endif
> >
> > Then call it PT_DBG_WARN_ON() to be more explicit?
>
> Ah, I'd rather leave it..
>
> > btw looks there is no plain WARN_ON() used in generic-pt. Just be curious
> > about the rationale behind. Is it a new trend to contain all warnings under
> > a debug option?
>
> It is sort of like VM_WARN_ON(), the places that got put are largely
> performance path so you don't want it enabled unless doing debugging.
>
> Generally the idea is to use PT_WARN_ON() on performance path cases
> only, and leave normal stuff to normal WARN_ON.
Good to know
> > > +
> > > +/* If not supplied by the format then contiguous pages are not
> supported */
> > > +#ifndef pt_entry_num_contig_lg2
> > > +static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state
> > > *pts)
> > > +{
> > > + return ilog2(1);
> > > +}
> > > +
> > > +static inline unsigned short pt_contig_count_lg2(const struct pt_state
> *pts)
> > > +{
> > > + return ilog2(1);
> > > +}
> >
> > what is the difference between above two helpers?
>
> Oh, pt_contig_count_lg2 didn't get kdocs because they are internal
> helpers to build other functions..
>
> Like this:
>
> /*
> * If not supplied by the format then contiguous pages are not supported.
> *
> * If contiguous pages are supported then the format must also provide
> * pt_contig_count_lg2() if it supports a single contiguous size per level,
> * or pt_possible_sizes() if it supports multiple sizes per level.
could be simplified to require the format to always support pt_possible_sizes()
if contiguous sizes are supported, no matter being a single size or multiple.
> */
> #ifndef pt_entry_num_contig_lg2
> static inline unsigned int pt_entry_num_contig_lg2(const struct pt_state *pts)
> {
> return ilog2(1);
> }
>
> /*
> * Return the number of contiguous OA items forming an entry at this table
> level
> */
> static inline unsigned short pt_contig_count_lg2(const struct pt_state *pts)
> {
> return ilog2(1);
> }
> #endif
>
> > It's currently not implemented by any driver so will have the default
> > version
> > returning 0. and it is only used by default pt_possible_sizes(), which then
> > returns only one page size accordingly.
>
> ARM and RISCV use it. AMD is the only format that support more than
> one size of contiguous per level.
hmm I didn't find ARM/RISCV defining pt_contig_count_lg2(). So in all cases
it's the default version returning ilog2(1). Then what's the point of keeping
it instead of directly using ilog2(1) in default pt_possible_sizes? sorry that
I still didn't connect the dots here.
> > > +
> > > +/**
> > > + * pt_item_fully_covered() - Check if the item or entry is entirely
> contained
> > > + * within pts->range
> >
> > when using pts it's more accurate to call it pt_entry_fully_covered()
>
> Not so much related to PTS, as pts could be either, but it makes more
> sense to refer to table poitner as an entry than does it to refer to a
> contiguous entry an item.
make sense.
>
> > > + PT_FEAT_FLUSH_RANGE,
> > > + /**
> > > + * @PT_FEAT_FLUSH_RANGE_NO_GAPS: Like PT_FEAT_FLUSH_RANGE
> > > except that
> > > + * the optimization objective is to only flush IOVA that has been
> > > + * changed. This mode is suitable for cases like hypervisor shadowing
> > > + * where flushing unchanged ranges may cause the hypervisor to
> > > reparse
> > > + * significant amount of page table.
> > > + */
> > > + PT_FEAT_FLUSH_RANGE_NO_GAPS,
> >
> > FLUSH_RANGE and FLUSH_RANGE_NO_GAPS are mutually exclusive but
> > one format must select one? then we could just keep one flag (NO_GAP)
> > then feature off means FLUSH_RANGE.
>
> I think at least one or two more flushing modes will be needed, and
> they would be mutually exclusive. It is setup with this one hot
> encoding because of how the pt_feature function compile optimization
> works.
>
clear now.