On Fri, Mar 05, 2021 at 04:57:15PM -0800, Andrew Morton wrote:
> On Thu, 18 Feb 2021 15:57:44 -0800 Saravanan D <saravan...@fb.com> wrote:
> 
> > To help with debugging the sluggishness caused by TLB miss/reload,
> > we introduce monotonic hugepage [direct mapped] split event counts since
> > system state: SYSTEM_RUNNING to be displayed as part of
> > /proc/vmstat in x86 servers
> >
> > ...
> >
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -120,6 +120,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  #ifdef CONFIG_SWAP
> >             SWAP_RA,
> >             SWAP_RA_HIT,
> > +#endif
> > +#ifdef CONFIG_X86
> > +           DIRECT_MAP_LEVEL2_SPLIT,
> > +           DIRECT_MAP_LEVEL3_SPLIT,
> >  #endif
> >             NR_VM_EVENT_ITEMS
> >  };
> 
> This is the first appearance of arch-specific fields in /proc/vmstat.
> 
> I don't really see a problem with this - vmstat is basically a dumping
> ground of random developer stuff.  But was this the best place in which
> to present this data?

IMO it's a big plus for discoverability.

One of the first things I tend to do when triaging mysterious memory
issues is going to /proc/vmstat and seeing if anything looks abnormal.
There is value in making that file comprehensive for all things that
could indicate memory-related pathologies.

The impetus for adding these is a real-world tlb regression caused by
kprobes chewing up the direct mapping that took longer to debug than
necessary. We have the /proc/meminfo lines on the DirectMap, but those
are more useful when you already have a theory - they simply don't
make problems immediately stand out the same way.

Reply via email to